asf_search is an open source Python package for searching SAR data archived at ASF. This notebook demonstrates how to search for NISAR GCOV products with asf_search, stream them directly over HTTPS or from S3 storage with s3fs, and load them into xarray data structures.
NISAR data products can be very large, so it may be helpful to access them directly from their S3 storage. This allows you to lazily load data into xarray data structures, subset and operate on them with xarray, and keep only the results you need in memory. This avoids downloading many large products to a storage volume only to subset them and delete most of the data. The caveat is that you must have enough RAM to hold your final subset and run your workflow.
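In xarray terms, the pattern looks roughly like this. A minimal illustrative sketch with placeholder names — remote_file stands in for an opened remote file object, and the variable and dimension names mirror the GCOV examples later in this notebook:
import xarray as xr
# Lazily open a remote HDF5 file: no pixel data is transferred yet
ds = xr.open_dataset(remote_file, engine="h5netcdf", chunks="auto")
# Subsetting is also lazy; only metadata is touched at this point
subset = ds["HHHH"].isel(xCoordinates=slice(0, 512), yCoordinates=slice(0, 512))
# Only .compute() reads the bytes the subset actually needs over the network
result = subset.mean().compute()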
Overview¶
1. Prerequisites¶
| Prerequisite | Importance | Notes |
|---|---|---|
| The software environment for this cookbook must be installed | Necessary | |
Rough Notebook Time Estimate: 10 minutes
2. Search for data¶
Use asf_search to find GCOV data.
2a. Perform an asf_search.search() to identify your desired product URLs¶
import warnings
from datetime import datetime

import asf_search as asf

warnings.filterwarnings(
    "ignore",
    message="Parsing dates involving a day of month without a year specified",
)

session = asf.ASFSession()
start_date = datetime(2025, 11, 22)
end_date = datetime(2026, 3, 5)
area_of_interest = "POLYGON((40.9131 12.3904,41.8891 12.3904,41.8891 13.2454,40.9131 13.2454,40.9131 12.3904))"  # POINT or POLYGON as WKT (well-known text)
pattern = r'^(?!.*QA_STATS).*'  # negative lookahead: exclude QA_STATS products
opts = asf.ASFSearchOptions(**{
    "maxResults": 250,
    "intersectsWith": area_of_interest,
    "flightDirection": "ASCENDING",
    "start": start_date,
    "end": end_date,
    "processingLevel": ["GCOV"],
    "dataset": ["NISAR"],
    "productionConfiguration": ["PR"],
    "session": session,
})
response = asf.search(opts=opts)
hdf5_urls = response.find_urls(extension='.h5', pattern=pattern, directAccess=False)
print(f"Found {len(hdf5_urls)} GCOV products:")
hdf5_urls
Found 7 GCOV products:
['https://nisar.asf.earthdatacloud.nasa.gov/NISAR/NISAR_L2_GCOV_BETA_V1/NISAR_L2_PR_GCOV_005_172_A_008_2005_DHDH_A_20251122T024618_20251122T024652_X05007_N_F_J_001/NISAR_L2_PR_GCOV_005_172_A_008_2005_DHDH_A_20251122T024618_20251122T024652_X05007_N_F_J_001.h5',
'https://nisar.asf.earthdatacloud.nasa.gov/NISAR/NISAR_L2_GCOV_BETA_V1/NISAR_L2_PR_GCOV_005_172_A_008_2005_DHDH_A_20251122T024618_20251122T024652_X05009_N_F_J_001/NISAR_L2_PR_GCOV_005_172_A_008_2005_DHDH_A_20251122T024618_20251122T024652_X05009_N_F_J_001.h5',
'https://nisar.asf.earthdatacloud.nasa.gov/NISAR/NISAR_L2_GCOV_BETA_V1/NISAR_L2_PR_GCOV_006_172_A_008_2005_DHDH_A_20251204T024618_20251204T024653_X05007_N_F_J_001/NISAR_L2_PR_GCOV_006_172_A_008_2005_DHDH_A_20251204T024618_20251204T024653_X05007_N_F_J_001.h5',
'https://nisar.asf.earthdatacloud.nasa.gov/NISAR/NISAR_L2_GCOV_BETA_V1/NISAR_L2_PR_GCOV_006_172_A_008_2005_DHDH_A_20251204T024618_20251204T024653_X05009_N_F_J_001/NISAR_L2_PR_GCOV_006_172_A_008_2005_DHDH_A_20251204T024618_20251204T024653_X05009_N_F_J_001.h5',
'https://nisar.asf.earthdatacloud.nasa.gov/NISAR/NISAR_L2_GCOV_BETA_V1/NISAR_L2_PR_GCOV_007_172_A_008_2005_DHDH_A_20251216T024619_20251216T024653_X05009_N_F_J_001/NISAR_L2_PR_GCOV_007_172_A_008_2005_DHDH_A_20251216T024619_20251216T024653_X05009_N_F_J_001.h5',
'https://nisar.asf.earthdatacloud.nasa.gov/NISAR/NISAR_L2_GCOV_BETA_V1/NISAR_L2_PR_GCOV_008_172_A_008_2005_DHDH_A_20251228T024619_20251228T024654_X05009_N_F_J_001/NISAR_L2_PR_GCOV_008_172_A_008_2005_DHDH_A_20251228T024619_20251228T024654_X05009_N_F_J_001.h5',
'https://nisar.asf.earthdatacloud.nasa.gov/NISAR/NISAR_L2_GCOV_BETA_V1/NISAR_L2_PR_GCOV_009_172_A_008_2005_DHDH_A_20260109T024620_20260109T024654_X05009_N_F_J_001/NISAR_L2_PR_GCOV_009_172_A_008_2005_DHDH_A_20260109T024620_20260109T024654_X05009_N_F_J_001.h5']
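Each item in the ASFSearchResults returned by asf.search() also carries product metadata in a properties dict, which is handy for sanity-checking a search. An optional peek at the first result — the exact keys available can vary by dataset, so .get() is used defensively:
first = response[0]
print(first.properties.get("fileName"))
print(first.properties.get("startTime"))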
2b. Retain only the URL for the most recent version of each product in the search results¶
Data is occasionally re-released with an updated version. Versions are recorded as a Composite Release Identifier (CRID) in a product's filename, so we can use the CRID to keep only the most recent version of each product in the list of URLs.
import re

# Capture the product identifier (the nine underscore-delimited fields after
# the product type) and the CRID (X + five digits) separately
pattern = re.compile(r"(NISAR_L2_PR_GCOV(?:_[^_]+){9})_(X\d{5})")

latest_version_dict = {}
for url in hdf5_urls:
    m = pattern.search(url)
    if not m:
        continue
    product, crid = m.groups()
    # CRIDs sort lexicographically, so a string comparison finds the latest
    if product not in latest_version_dict or crid > latest_version_dict[product][0]:
        latest_version_dict[product] = (crid, url)

hdf5_urls = [url for _, url in latest_version_dict.values()]
print(f"Retained {len(hdf5_urls)} GCOV products:")
hdf5_urls
Retained 5 GCOV products:
['https://nisar.asf.earthdatacloud.nasa.gov/NISAR/NISAR_L2_GCOV_BETA_V1/NISAR_L2_PR_GCOV_005_172_A_008_2005_DHDH_A_20251122T024618_20251122T024652_X05009_N_F_J_001/NISAR_L2_PR_GCOV_005_172_A_008_2005_DHDH_A_20251122T024618_20251122T024652_X05009_N_F_J_001.h5',
'https://nisar.asf.earthdatacloud.nasa.gov/NISAR/NISAR_L2_GCOV_BETA_V1/NISAR_L2_PR_GCOV_006_172_A_008_2005_DHDH_A_20251204T024618_20251204T024653_X05009_N_F_J_001/NISAR_L2_PR_GCOV_006_172_A_008_2005_DHDH_A_20251204T024618_20251204T024653_X05009_N_F_J_001.h5',
'https://nisar.asf.earthdatacloud.nasa.gov/NISAR/NISAR_L2_GCOV_BETA_V1/NISAR_L2_PR_GCOV_007_172_A_008_2005_DHDH_A_20251216T024619_20251216T024653_X05009_N_F_J_001/NISAR_L2_PR_GCOV_007_172_A_008_2005_DHDH_A_20251216T024619_20251216T024653_X05009_N_F_J_001.h5',
'https://nisar.asf.earthdatacloud.nasa.gov/NISAR/NISAR_L2_GCOV_BETA_V1/NISAR_L2_PR_GCOV_008_172_A_008_2005_DHDH_A_20251228T024619_20251228T024654_X05009_N_F_J_001/NISAR_L2_PR_GCOV_008_172_A_008_2005_DHDH_A_20251228T024619_20251228T024654_X05009_N_F_J_001.h5',
'https://nisar.asf.earthdatacloud.nasa.gov/NISAR/NISAR_L2_GCOV_BETA_V1/NISAR_L2_PR_GCOV_009_172_A_008_2005_DHDH_A_20260109T024620_20260109T024654_X05009_N_F_J_001/NISAR_L2_PR_GCOV_009_172_A_008_2005_DHDH_A_20260109T024620_20260109T024654_X05009_N_F_J_001.h5']
2c. Provide your Earthdata Login (EDL) Bearer Token¶
Both HTTPS and S3 access require an EDL Bearer Token.

View or generate a Bearer Token in the “Generate Token” tab of the Profile page in your Earthdata Login account: https://urs.earthdata.nasa.gov
from getpass import getpass
token = getpass("Enter your EDL Bearer Token")
Enter your EDL Bearer Token ········
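If you prefer not to paste the token interactively, it could instead come from an environment variable. A small sketch assuming a hypothetical EDL_TOKEN variable that you have exported yourself:
import os
from getpass import getpass
# Fall back to the interactive prompt when the (hypothetical) EDL_TOKEN
# environment variable is not set
token = os.environ.get("EDL_TOKEN") or getpass("Enter your EDL Bearer Token")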
3. Load a single GCOV product with HTTPS¶
This example loads the data inside a Python with statement, so the file is closed automatically when the block exits.
3a. Open a GCOV file in a with block and compute the mean of an HHHH spatial subset¶
Note that any computation on the data (in this case a subset mean) must be performed inside the with block.
%%time
import fsspec
import rioxarray
import xarray as xr

fs = fsspec.filesystem(
    "http",
    headers={"Authorization": f"Bearer {token}"},
    block_size=16 * 512 * 512,  # 4 MiB read block size
)
with fs.open(hdf5_urls[0], "rb") as f:
    dt = xr.open_datatree(
        f,
        engine="h5netcdf",
        decode_timedelta=False,
        phony_dims="access",
        chunks="auto",
    )

    ### Perform any calculations and save any computed results for future access here, inside the `with` block ###
    frequencyA = dt["/science/LSAR/GCOV/grids/frequencyA"]  # access the frequency A data
    projection = frequencyA.projection.attrs['epsg_code'].item()  # access the GCOV product's projection
    hhhh = frequencyA.HHHH  # access frequency A's HHHH band
    hhhh = hhhh.rio.write_crs(projection)  # write the projection to the HHHH data for easy lat/lon subsetting

    # subset the data
    subset_hhhh = hhhh.rio.clip_box(
        minx=40.8463, miny=13.2553,
        maxx=40.8574, maxy=13.2684,
        crs="EPSG:4326",
    )

    # save the mean of the HHHH subset for use outside of the `with` block
    subset_hhhh_mean = subset_hhhh.mean().to_numpy().item()
    print(f'subset_hhhh_mean: {subset_hhhh_mean}\n')
subset_hhhh_mean: 0.07500407099723816
CPU times: user 3.01 s, sys: 842 ms, total: 3.86 s
Wall time: 36.4 s
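If you need more than a scalar to survive the with block, you can materialize an entire subset before the file closes. A minimal sketch of a line that would go inside the block above, reusing its subset_hhhh:
# Inside the `with` block: load the clipped region into memory so its
# values remain available after the file is closed
subset_in_memory = subset_hhhh.load()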
4. Load a single GCOV product from the S3 bucket¶
This example loads the data inside a Python with statement, so the file is closed automatically when the block exits.
4a. Convert the HDF5 URLs into S3 URLs¶
from urllib.parse import urlparse

bucket = "sds-n-cumulus-prod-nisar-products"
# Drop the leading "/NISAR" path component and point at the S3 bucket instead
s3_urls = [f"s3://{bucket}/{'/'.join(urlparse(url).path.split('/')[2:])}" for url in hdf5_urls]
s3_urls
['s3://sds-n-cumulus-prod-nisar-products/NISAR_L2_GCOV_BETA_V1/NISAR_L2_PR_GCOV_005_172_A_008_2005_DHDH_A_20251122T024618_20251122T024652_X05009_N_F_J_001/NISAR_L2_PR_GCOV_005_172_A_008_2005_DHDH_A_20251122T024618_20251122T024652_X05009_N_F_J_001.h5',
's3://sds-n-cumulus-prod-nisar-products/NISAR_L2_GCOV_BETA_V1/NISAR_L2_PR_GCOV_006_172_A_008_2005_DHDH_A_20251204T024618_20251204T024653_X05009_N_F_J_001/NISAR_L2_PR_GCOV_006_172_A_008_2005_DHDH_A_20251204T024618_20251204T024653_X05009_N_F_J_001.h5',
's3://sds-n-cumulus-prod-nisar-products/NISAR_L2_GCOV_BETA_V1/NISAR_L2_PR_GCOV_007_172_A_008_2005_DHDH_A_20251216T024619_20251216T024653_X05009_N_F_J_001/NISAR_L2_PR_GCOV_007_172_A_008_2005_DHDH_A_20251216T024619_20251216T024653_X05009_N_F_J_001.h5',
's3://sds-n-cumulus-prod-nisar-products/NISAR_L2_GCOV_BETA_V1/NISAR_L2_PR_GCOV_008_172_A_008_2005_DHDH_A_20251228T024619_20251228T024654_X05009_N_F_J_001/NISAR_L2_PR_GCOV_008_172_A_008_2005_DHDH_A_20251228T024619_20251228T024654_X05009_N_F_J_001.h5',
's3://sds-n-cumulus-prod-nisar-products/NISAR_L2_GCOV_BETA_V1/NISAR_L2_PR_GCOV_009_172_A_008_2005_DHDH_A_20260109T024620_20260109T024654_X05009_N_F_J_001/NISAR_L2_PR_GCOV_009_172_A_008_2005_DHDH_A_20260109T024620_20260109T024654_X05009_N_F_J_001.h5']
4b. Use your EDL Bearer Token to get S3 bucket access credentials¶
The S3 credentials acquired below expire after 1 hour.
import json
import urllib.request

# Request temporary S3 credentials from the NISAR credentials endpoint,
# authenticating with your EDL Bearer Token
credentials_endpoint = "https://nisar.asf.earthdatacloud.nasa.gov/s3credentials"
req = urllib.request.Request(
    url=credentials_endpoint,
    headers={"Authorization": f"Bearer {token}"},
)
with urllib.request.urlopen(req) as f:
    creds = json.loads(f.read().decode())
4c. Open a GCOV file in a with block and compute the mean of an HHHH spatial subset¶
%%time
import rioxarray
import s3fs
import xarray as xr

s3_url = s3_urls[0]

fs = s3fs.S3FileSystem(
    key=creds["accessKeyId"],
    secret=creds["secretAccessKey"],
    token=creds["sessionToken"],
)

kwargs = {
    "cache_type": "background",
    "block_size": 16 * 512 * 512,  # 4 MiB read block size
}
with fs.open(s3_url, "rb", **kwargs) as f:
    dt = xr.open_datatree(
        f,
        engine="h5netcdf",
        decode_timedelta=False,
        chunks="auto",
        phony_dims="access",
    )

    ### Perform any calculations and save any computed results for future access here, inside the `with` block ###
    frequencyA = dt["/science/LSAR/GCOV/grids/frequencyA"]  # access the frequency A data
    projection = frequencyA.projection.attrs['epsg_code'].item()  # access the GCOV product's projection
    hhhh = frequencyA.HHHH  # access frequency A's HHHH band
    hhhh = hhhh.rio.write_crs(projection)  # write the projection to the HHHH data for easy lat/lon subsetting

    # subset the data
    subset_hhhh = hhhh.rio.clip_box(
        minx=40.8463, miny=13.2553,
        maxx=40.8574, maxy=13.2684,
        crs="EPSG:4326",
    )

    # save the mean of the HHHH subset for use outside of the `with` block
    subset_hhhh_mean = subset_hhhh.mean().to_numpy().item()
    print(f'subset_hhhh_mean: {subset_hhhh_mean}\n')
subset_hhhh_mean: 0.07500407099723816
CPU times: user 1.05 s, sys: 174 ms, total: 1.23 s
Wall time: 1.92 s
4d. View the DataTree¶
You can look at the DataTree outside the with block, but it contains only pointers to the data. Because the file is now closed, you can no longer access any of the data values to which it points.
dt
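One way to see that the tree now holds lazy references rather than loaded values is to look at what backs a variable. A small sketch relying on the dask-backed arrays produced by chunks="auto":
# With chunks="auto", each variable wraps a dask array whose task graph
# still points at the (now closed) file
hhhh_lazy = dt["/science/LSAR/GCOV/grids/frequencyA"].HHHH
print(type(hhhh_lazy.data))  # a dask array, not an in-memory numpy array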
4e. Try (and fail) to compute a value in the DataTree¶
The file was closed upon exiting the with block, so trying to compute a value from the DataTree will fail with a ValueError: I/O operation on closed file.
dt.compute()
However, subset_hhhh_mean was computed and saved to memory inside the with block, so we can still see its value¶
subset_hhhh_mean
0.07500407099723816
5. Load multiple GCOV Products at once¶
5a. Iterate through a list of HDF5 S3 bucket URLs, and open the /science/LSAR/GCOV/grids/frequencyA group for each¶
This leaves the files open for later use, which means you should close them manually when finished to avoid leaking open file handles (see the ExitStack sketch after this cell's output).
%%time
import rioxarray
import xarray as xr

# Explore the DataTree rendering above in Step 4 for a complete list of available groups
group_path = "/science/LSAR/GCOV/grids/frequencyA"  # change this to any GCOV HDF5 group you wish

kwargs = {
    "cache_type": "background",
    "block_size": 16 * 1024 * 1024,  # 16 MiB read block size
}

files = [fs.open(url, "rb", **kwargs) for url in s3_urls]
datatrees = [
    xr.open_datatree(
        f,
        engine="h5netcdf",
        decode_timedelta=False,
        phony_dims="access",
        chunks="auto",
        group=group_path,
    )
    for f in files
]
CPU times: user 1.05 s, sys: 485 ms, total: 1.54 s
Wall time: 5.53 s
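If you would rather not track the open handles yourself, contextlib.ExitStack can close them all automatically. A sketch of the same pattern under that approach — all computations would then need to happen inside the with block:
from contextlib import ExitStack
# Enter every file on an ExitStack so they are all closed when the block exits
with ExitStack() as stack:
    open_files = [stack.enter_context(fs.open(url, "rb", **kwargs)) for url in s3_urls]
    # ... open datatrees and perform all computations here ...
# every file has been closed at this point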
5b. Since the files remain open, we can still access them and perform computations on their contents¶
Iterate through the DataTrees, calculating subset HHHH mean values.
for tree in datatrees:
    projection = tree.projection.attrs['epsg_code'].item()
    hhhh = tree.HHHH
    hhhh = hhhh.rio.write_crs(projection)
    subset_hhhh = hhhh.rio.clip_box(
        minx=40.8463, miny=13.2553,
        maxx=40.8574, maxy=13.2684,
        crs="EPSG:4326",
    )
    subset_hhhh_mean = subset_hhhh.mean().to_numpy().item()
    print(subset_hhhh_mean)
0.07500407099723816
0.07547999173402786
0.0776161402463913
0.07449550926685333
0.07419200986623764
6. Stream a time series¶
Use the multiple files we opened in Step 5 to create a GCOV time series.
6a. Define a function to extract datetimes for a time dimension¶
import re
from datetime import datetime
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Matches the acquisition start timestamp embedded in a NISAR filename
NISAR_TS_RE = re.compile(r"_(\d{8}T\d{6})_")

def nisar_start_time_from_url(s3_url: str) -> datetime:
    path = urlparse(s3_url).path
    name = PurePosixPath(path).name
    m = NISAR_TS_RE.search(name)
    if not m:
        raise ValueError(f"No NISAR timestamp found in: {s3_url}")
    return datetime.strptime(m.group(1), "%Y%m%dT%H%M%S")

6b. Create a list of datetimes for a time dimension¶
dts = [nisar_start_time_from_url(url) for url in s3_urls]
dts
[datetime.datetime(2025, 11, 22, 2, 46, 18),
datetime.datetime(2025, 12, 4, 2, 46, 18),
datetime.datetime(2025, 12, 16, 2, 46, 19),
datetime.datetime(2025, 12, 28, 2, 46, 19),
datetime.datetime(2026, 1, 9, 2, 46, 20)]
6c. Create an xarray.Dataset with a time dimension for each open GCOV group¶
datasets = [
    tree.ds.assign_coords(time=timestamp).expand_dims(time=1)
    for timestamp, tree in zip(dts, datatrees)
]
datasets[0].time
for ds in datasets:
    print(ds.dims)
FrozenMappingWarningOnValuesAccess({'time': 1, 'yCoordinates': 16704, 'xCoordinates': 17064, 'phony_dim_0': 2})
FrozenMappingWarningOnValuesAccess({'time': 1, 'yCoordinates': 16704, 'xCoordinates': 17064, 'phony_dim_0': 2})
FrozenMappingWarningOnValuesAccess({'time': 1, 'yCoordinates': 16704, 'xCoordinates': 17064, 'phony_dim_0': 2})
FrozenMappingWarningOnValuesAccess({'time': 1, 'yCoordinates': 16704, 'xCoordinates': 17064, 'phony_dim_0': 2})
FrozenMappingWarningOnValuesAccess({'time': 1, 'yCoordinates': 16704, 'xCoordinates': 17064, 'phony_dim_0': 2})
6d. Concatenate the xarray.Datasets into a single time-series xarray.Dataset¶
ts = xr.concat(datasets, dim="time")
ts
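While the files are still open, you can reduce the time series down to just the values you need. A sketch under the same subsetting assumptions as earlier — it assumes rioxarray can infer the spatial dimensions of the concatenated HHHH array just as it did for the single products:
# Clip the lazy time series to the area of interest and compute one mean
# per acquisition date while the underlying files can still be read
ts_hhhh = ts.HHHH.rio.write_crs(projection)
subset_ts = ts_hhhh.rio.clip_box(
    minx=40.8463, miny=13.2553,
    maxx=40.8574, maxy=13.2684,
    crs="EPSG:4326",
)
mean_series = subset_ts.mean(dim=("yCoordinates", "xCoordinates")).compute()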
7. Close the open files¶
7a. Iterate through the list of open files, closing each one¶
for f in files:
    f.close()
7b. With closed files, we can no longer compute values in the xarray data structures that we have created¶
The cell below will return a ValueError: I/O operation on closed file.
ts.compute()
8. Summary¶
You now have the tools you need to search with asf_search, generate temporary S3 bucket credentials from an Earthdata Login Bearer Token, stream data from S3 with s3fs, and load it into xarray data structures.