
Search and Stream NISAR GCOV Data with asf_search


asf_search is an open source Python package for searching SAR data archived at ASF. This notebook demonstrates how to search for NISAR GCOV products with asf_search, stream them directly over HTTPS or from S3 storage with s3fs, and load them into xarray data structures.

NISAR data products can be very large, so it may be helpful to access them directly from their S3 storage. This allows you to lazily load data into xarray data structures, subset and operate on them with xarray, and keep only the results you need in memory. This avoids downloading many large products to a storage volume, only to subset and delete most of them. The caveat is that you must have enough RAM to hold your final subset and also run your workflow.


Overview

  1. Prerequisites

  2. Search for data

  3. Load a single GCOV product with HTTPS

  4. Load a single GCOV product from the S3 bucket

  5. Load multiple GCOV Products at once

  6. Stream a time series

  7. Close open files

  8. Summary

  9. Resources and references


1. Prerequisites

Prerequisite | Importance | Notes
--- | --- | ---
The software environment for this cookbook must be installed | Necessary |
  • Rough Notebook Time Estimate: 10 minutes


2. Search for data

Use asf_search to find GCOV data.

2a. Perform an asf_search.search() to identify your desired product URLs

import asf_search as asf
from datetime import datetime

import warnings
warnings.filterwarnings(
    "ignore",
    message="Parsing dates involving a day of month without a year specified",
)

session = asf.ASFSession()

start_date = datetime(2025, 11, 22)
end_date = datetime(2026, 3, 5)
area_of_interest = "POLYGON((40.9131 12.3904,41.8891 12.3904,41.8891 13.2454,40.9131 13.2454,40.9131 12.3904))" # POINT or POLYGON as WKT (well-known-text)
pattern = r'^(?!.*QA_STATS).*'

opts = asf.ASFSearchOptions(**{
    "maxResults": 250,
    "intersectsWith": area_of_interest,
    "flightDirection": "ASCENDING",
    "start": start_date,
    "end": end_date,
    "processingLevel": [
        "GCOV"
    ],
    "dataset": [
        "NISAR"
    ],
    "productionConfiguration": [
        "PR"
    ],
    'session': session,
})

response = asf.search(opts=opts)
hdf5_urls = response.find_urls(extension='.h5', pattern=pattern, directAccess=False)
print(f"Found {len(hdf5_urls)} GCOV products:")
hdf5_urls
Found 7 GCOV products:
['https://nisar.asf.earthdatacloud.nasa.gov/NISAR/NISAR_L2_GCOV_BETA_V1/NISAR_L2_PR_GCOV_005_172_A_008_2005_DHDH_A_20251122T024618_20251122T024652_X05007_N_F_J_001/NISAR_L2_PR_GCOV_005_172_A_008_2005_DHDH_A_20251122T024618_20251122T024652_X05007_N_F_J_001.h5', 'https://nisar.asf.earthdatacloud.nasa.gov/NISAR/NISAR_L2_GCOV_BETA_V1/NISAR_L2_PR_GCOV_005_172_A_008_2005_DHDH_A_20251122T024618_20251122T024652_X05009_N_F_J_001/NISAR_L2_PR_GCOV_005_172_A_008_2005_DHDH_A_20251122T024618_20251122T024652_X05009_N_F_J_001.h5', 'https://nisar.asf.earthdatacloud.nasa.gov/NISAR/NISAR_L2_GCOV_BETA_V1/NISAR_L2_PR_GCOV_006_172_A_008_2005_DHDH_A_20251204T024618_20251204T024653_X05007_N_F_J_001/NISAR_L2_PR_GCOV_006_172_A_008_2005_DHDH_A_20251204T024618_20251204T024653_X05007_N_F_J_001.h5', 'https://nisar.asf.earthdatacloud.nasa.gov/NISAR/NISAR_L2_GCOV_BETA_V1/NISAR_L2_PR_GCOV_006_172_A_008_2005_DHDH_A_20251204T024618_20251204T024653_X05009_N_F_J_001/NISAR_L2_PR_GCOV_006_172_A_008_2005_DHDH_A_20251204T024618_20251204T024653_X05009_N_F_J_001.h5', 'https://nisar.asf.earthdatacloud.nasa.gov/NISAR/NISAR_L2_GCOV_BETA_V1/NISAR_L2_PR_GCOV_007_172_A_008_2005_DHDH_A_20251216T024619_20251216T024653_X05009_N_F_J_001/NISAR_L2_PR_GCOV_007_172_A_008_2005_DHDH_A_20251216T024619_20251216T024653_X05009_N_F_J_001.h5', 'https://nisar.asf.earthdatacloud.nasa.gov/NISAR/NISAR_L2_GCOV_BETA_V1/NISAR_L2_PR_GCOV_008_172_A_008_2005_DHDH_A_20251228T024619_20251228T024654_X05009_N_F_J_001/NISAR_L2_PR_GCOV_008_172_A_008_2005_DHDH_A_20251228T024619_20251228T024654_X05009_N_F_J_001.h5', 'https://nisar.asf.earthdatacloud.nasa.gov/NISAR/NISAR_L2_GCOV_BETA_V1/NISAR_L2_PR_GCOV_009_172_A_008_2005_DHDH_A_20260109T024620_20260109T024654_X05009_N_F_J_001/NISAR_L2_PR_GCOV_009_172_A_008_2005_DHDH_A_20260109T024620_20260109T024654_X05009_N_F_J_001.h5']

2b. Retain only the URL for the most recent version of each product in the search results

Data is occasionally re-released with an updated version. Versions are recorded as a Composite Release Identifier (CRID) in a product’s filename. We can use the CRID to retain only the most recent version of each product in the list of URLs.

import re

pattern = re.compile(r"(NISAR_L2_PR_GCOV(?:_[^_]+){9})_(X\d{5})")

latest_version_dict = {}

for url in hdf5_urls:
    m = pattern.search(url)
    if not m:
        continue

    product, crid = m.groups()

    if product not in latest_version_dict or crid > latest_version_dict[product][0]:
        latest_version_dict[product] = (crid, url)

hdf5_urls = [i[1] for i in latest_version_dict.values()]
print(f"Retained {len(hdf5_urls)} GCOV products:")
hdf5_urls
Retained 5 GCOV products:
['https://nisar.asf.earthdatacloud.nasa.gov/NISAR/NISAR_L2_GCOV_BETA_V1/NISAR_L2_PR_GCOV_005_172_A_008_2005_DHDH_A_20251122T024618_20251122T024652_X05009_N_F_J_001/NISAR_L2_PR_GCOV_005_172_A_008_2005_DHDH_A_20251122T024618_20251122T024652_X05009_N_F_J_001.h5', 'https://nisar.asf.earthdatacloud.nasa.gov/NISAR/NISAR_L2_GCOV_BETA_V1/NISAR_L2_PR_GCOV_006_172_A_008_2005_DHDH_A_20251204T024618_20251204T024653_X05009_N_F_J_001/NISAR_L2_PR_GCOV_006_172_A_008_2005_DHDH_A_20251204T024618_20251204T024653_X05009_N_F_J_001.h5', 'https://nisar.asf.earthdatacloud.nasa.gov/NISAR/NISAR_L2_GCOV_BETA_V1/NISAR_L2_PR_GCOV_007_172_A_008_2005_DHDH_A_20251216T024619_20251216T024653_X05009_N_F_J_001/NISAR_L2_PR_GCOV_007_172_A_008_2005_DHDH_A_20251216T024619_20251216T024653_X05009_N_F_J_001.h5', 'https://nisar.asf.earthdatacloud.nasa.gov/NISAR/NISAR_L2_GCOV_BETA_V1/NISAR_L2_PR_GCOV_008_172_A_008_2005_DHDH_A_20251228T024619_20251228T024654_X05009_N_F_J_001/NISAR_L2_PR_GCOV_008_172_A_008_2005_DHDH_A_20251228T024619_20251228T024654_X05009_N_F_J_001.h5', 'https://nisar.asf.earthdatacloud.nasa.gov/NISAR/NISAR_L2_GCOV_BETA_V1/NISAR_L2_PR_GCOV_009_172_A_008_2005_DHDH_A_20260109T024620_20260109T024654_X05009_N_F_J_001/NISAR_L2_PR_GCOV_009_172_A_008_2005_DHDH_A_20260109T024620_20260109T024654_X05009_N_F_J_001.h5']
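The dedup above relies on the CRID strings being zero-padded, so lexicographic comparison matches numeric ordering. A minimal, self-contained sketch with synthetic filenames (hypothetical CRIDs, no network access) illustrates the logic:

```python
import re

# Same pattern as above: capture everything up to the CRID, then the CRID itself
pattern = re.compile(r"(NISAR_L2_PR_GCOV(?:_[^_]+){9})_(X\d{5})")

# Two synthetic versions of the same product, differing only in CRID
urls = [
    "NISAR_L2_PR_GCOV_005_172_A_008_2005_DHDH_A_20251122T024618_20251122T024652_X05007_N_F_J_001.h5",
    "NISAR_L2_PR_GCOV_005_172_A_008_2005_DHDH_A_20251122T024618_20251122T024652_X05009_N_F_J_001.h5",
]

latest = {}
for url in urls:
    m = pattern.search(url)
    if m:
        product, crid = m.groups()
        # zero-padded CRIDs sort correctly as strings: "X05009" > "X05007"
        if product not in latest or crid > latest[product][0]:
            latest[product] = (crid, url)

print([crid for crid, _ in latest.values()])  # ['X05009']
```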

2c. Provide your Earthdata Login (EDL) Bearer Token

Both HTTPS and S3 access require an EDL Bearer Token.

[Image: the Earthdata Login “Generate Token” web page]

View or generate a Bearer Token in the “Generate Token” tab of the Profile page in your Earthdata Login account: https://urs.earthdata.nasa.gov/profile

from getpass import getpass

token = getpass("Enter your EDL Bearer Token")
Enter your EDL Bearer Token ········
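If you run this notebook non-interactively, one option is to read the token from an environment variable and only fall back to a prompt when it is unset. The variable name EARTHDATA_TOKEN below is an arbitrary choice for this sketch, not an Earthdata convention:

```python
import os
from getpass import getpass

def get_edl_token(env_var: str = "EARTHDATA_TOKEN") -> str:
    """Return an EDL Bearer Token from the environment, else prompt for one."""
    token = os.environ.get(env_var)
    if token:
        return token
    # interactive fallback: prompt without echoing the token
    return getpass("Enter your EDL Bearer Token")
```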

3. Load a single GCOV product with HTTPS

This example loads the data in a Python with statement.

3a. Open a GCOV file in a with block and compute the mean of an HHHH spatial subset

Note that any computation on the data (in this case a subset mean) must be performed inside the with block.

%%time
import asf_search as asf
import fsspec
import rioxarray
import xarray as xr

fs = fsspec.filesystem(
    "http",
    headers = {"Authorization": f"Bearer {token}"},
    block_size = 16 * 512 * 512,
)

with fs.open(hdf5_urls[0], "rb") as f:
    dt = xr.open_datatree(
        f,
        engine="h5netcdf",
        decode_timedelta=False,
        phony_dims="access",
        chunks="auto",
    )

    ### Perform any calculations and save any computed results for future access here, inside the `with` block ###
    frequencyA = dt["/science/LSAR/GCOV/grids/frequencyA"] # access the frequency A data
    projection = frequencyA.projection.attrs['epsg_code'].item() # access the GCOV product's projection
    hhhh = frequencyA.HHHH # access frequency A's HHHH band
    hhhh = hhhh.rio.write_crs(projection) # write the projection to the HHHH data for easy lat/lon subsetting

    # subset the data
    subset_hhhh = hhhh.rio.clip_box(
        minx=40.8463, miny=13.2553,
        maxx=40.8574, maxy=13.2684, 
        crs="EPSG:4326"
    )

    # save the mean of HHHH subset for use outside of the `with` block
    subset_hhhh_mean = subset_hhhh.mean().to_numpy().item()
    print(f'subset_hhhh_mean: {subset_hhhh_mean}\n')
subset_hhhh_mean: 0.07500407099723816

CPU times: user 3.01 s, sys: 842 ms, total: 3.86 s
Wall time: 36.4 s

4. Load a single GCOV product from the S3 bucket

This example loads the data in a Python with statement.

4a. Convert the HDF5 URLs into S3 URLs

from urllib.parse import urlparse

bucket = "sds-n-cumulus-prod-nisar-products"

s3_urls = [f"s3://{bucket}/{'/'.join(urlparse(url).path.split('/')[2:])}" for url in hdf5_urls]
s3_urls
['s3://sds-n-cumulus-prod-nisar-products/NISAR_L2_GCOV_BETA_V1/NISAR_L2_PR_GCOV_005_172_A_008_2005_DHDH_A_20251122T024618_20251122T024652_X05009_N_F_J_001/NISAR_L2_PR_GCOV_005_172_A_008_2005_DHDH_A_20251122T024618_20251122T024652_X05009_N_F_J_001.h5', 's3://sds-n-cumulus-prod-nisar-products/NISAR_L2_GCOV_BETA_V1/NISAR_L2_PR_GCOV_006_172_A_008_2005_DHDH_A_20251204T024618_20251204T024653_X05009_N_F_J_001/NISAR_L2_PR_GCOV_006_172_A_008_2005_DHDH_A_20251204T024618_20251204T024653_X05009_N_F_J_001.h5', 's3://sds-n-cumulus-prod-nisar-products/NISAR_L2_GCOV_BETA_V1/NISAR_L2_PR_GCOV_007_172_A_008_2005_DHDH_A_20251216T024619_20251216T024653_X05009_N_F_J_001/NISAR_L2_PR_GCOV_007_172_A_008_2005_DHDH_A_20251216T024619_20251216T024653_X05009_N_F_J_001.h5', 's3://sds-n-cumulus-prod-nisar-products/NISAR_L2_GCOV_BETA_V1/NISAR_L2_PR_GCOV_008_172_A_008_2005_DHDH_A_20251228T024619_20251228T024654_X05009_N_F_J_001/NISAR_L2_PR_GCOV_008_172_A_008_2005_DHDH_A_20251228T024619_20251228T024654_X05009_N_F_J_001.h5', 's3://sds-n-cumulus-prod-nisar-products/NISAR_L2_GCOV_BETA_V1/NISAR_L2_PR_GCOV_009_172_A_008_2005_DHDH_A_20260109T024620_20260109T024654_X05009_N_F_J_001/NISAR_L2_PR_GCOV_009_172_A_008_2005_DHDH_A_20260109T024620_20260109T024654_X05009_N_F_J_001.h5']

4b. Use your EDL Bearer Token to get S3 bucket access credentials

The S3 credentials acquired below expire after 1 hour.

import json
import s3fs
import urllib

prefix = "NISAR_L2_GCOV_BETA_V1"

event = {
    "CredentialsEndpoint": "https://nisar.asf.earthdatacloud.nasa.gov/s3credentials",
    "BearerToken": token,
    "Bucket": bucket,
    "Prefix": prefix,
}

# Get temporary download credentials
tea_url = event["CredentialsEndpoint"]
bearer_token = event["BearerToken"]
req = urllib.request.Request(
    url=tea_url,
    headers={"Authorization": f"Bearer {bearer_token}"}
)
with urllib.request.urlopen(req) as f:
    creds = json.loads(f.read().decode())
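Because the credentials expire after an hour, it can help to check whether they are still valid before reusing them in a long-running workflow. The sketch below assumes the response includes an ISO-formatted "expiration" field; inspect your actual creds payload to confirm the key name and format:

```python
from datetime import datetime, timezone

def creds_expired(creds: dict, margin_s: int = 60) -> bool:
    """Return True if the temporary credentials expire within margin_s seconds."""
    # "expiration" is an assumed key/format -- verify against your creds dict
    expiry = datetime.fromisoformat(creds["expiration"].replace("Z", "+00:00"))
    remaining = (expiry - datetime.now(timezone.utc)).total_seconds()
    return remaining < margin_s

# synthetic example: credentials that expire far in the future
fake_creds = {"expiration": "2099-01-01T00:00:00Z"}
print(creds_expired(fake_creds))  # False
```

When this returns True, re-run the credential request in Step 4b to get a fresh set.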

4c. Open a GCOV file in a with block and compute the mean of an HHHH spatial subset

%%time
import xarray as xr
import s3fs
import rioxarray

s3_url = s3_urls[0]

fs = s3fs.S3FileSystem(
    key=creds["accessKeyId"],
    secret=creds["secretAccessKey"],
    token=creds["sessionToken"],
)

kwargs = {
    "cache_type": "background",
    "block_size": 16 * 512 * 512,  # 16 MB
}

with fs.open(s3_url, "rb", **kwargs) as f:
    dt = xr.open_datatree(
        f, 
        engine="h5netcdf", 
        decode_timedelta=False, 
        chunks="auto",
        phony_dims="access"
    )

    ### Perform any calculations and save any computed results for future access here, inside the `with` block ###
    frequencyA = dt["/science/LSAR/GCOV/grids/frequencyA"] # access the frequency A data
    projection = frequencyA.projection.attrs['epsg_code'].item() # access the GCOV product's projection
    hhhh = frequencyA.HHHH # access frequency A's HHHH band
    hhhh = hhhh.rio.write_crs(projection) # write the projection to the HHHH data for easy lat/lon subsetting

    # subset the data
    subset_hhhh = hhhh.rio.clip_box(
        minx=40.8463, miny=13.2553,
        maxx=40.8574, maxy=13.2684, 
        crs="EPSG:4326"
    )

    # save the mean of HHHH subset for use outside of the `with` block
    subset_hhhh_mean = subset_hhhh.mean().to_numpy().item()
    print(f'subset_hhhh_mean: {subset_hhhh_mean}\n')
subset_hhhh_mean: 0.07500407099723816

CPU times: user 1.05 s, sys: 174 ms, total: 1.23 s
Wall time: 1.92 s

4d. View the DataTree

You can inspect the DataTree outside the with block, but it only contains pointers to the data, and since the file is now closed you can no longer access any of the data values to which it points.

dt

4e. Try (and fail) to compute a value in the DataTree

The file was closed upon exiting the with block, so trying to compute a value from the DataTree will fail with a ValueError: I/O operation on closed file.

dt.compute()

However, subset_hhhh_mean was computed and saved to memory inside the with block, so we can still see its value.

subset_hhhh_mean
0.07500407099723816

5. Load multiple GCOV Products at once

5a. Iterate through a list of HDF5 S3 bucket URLs, and open the /science/LSAR/GCOV/grids/frequencyA group for each

This leaves the files open for later use, which means you should close them manually when finished to avoid leaking file handles.

%%time
import xarray as xr
import rioxarray

# Explore the DataTree rendering above in Step 4 for a complete list of available groups 
group_path = "/science/LSAR/GCOV/grids/frequencyA" # change this to any GCOV HDF5 group you wish

kwargs = {
    "cache_type": "background",
    "block_size": 16 * 1024 * 1024,  # 16 MB
}

files = [fs.open(url, "rb", **kwargs) for url in s3_urls]

datatrees = [
    xr.open_datatree(
        f,
        engine="h5netcdf",
        decode_timedelta=False,
        phony_dims="access",
        chunks="auto",
        group=group_path,
    )
    for f in files
]
CPU times: user 1.05 s, sys: 485 ms, total: 1.54 s
Wall time: 5.53 s
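An alternative to closing the files by hand (Step 7) is contextlib.ExitStack, which closes everything registered on it when the block exits, even if an exception interrupts the workflow. A minimal stdlib sketch, using local temporary files as stand-ins for the S3 objects opened above:

```python
import tempfile
from contextlib import ExitStack
from pathlib import Path

# stand-ins for the S3 objects opened above
tmpdir = tempfile.mkdtemp()
paths = [Path(tmpdir, f"product_{i}.h5") for i in range(3)]
for p in paths:
    p.write_bytes(b"stub")

with ExitStack() as stack:
    # every file registered on the stack is closed when the block exits,
    # even if an exception is raised mid-workflow
    files = [stack.enter_context(open(p, "rb")) for p in paths]
    data = [f.read() for f in files]

print(all(f.closed for f in files))  # True: ExitStack closed them all
```

The trade-off is that, as with the single-file with blocks above, all computation must happen inside the ExitStack block.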

5b. Since the files remain open, we can still access them and perform computations on their contents

Iterate through the DataTrees, calculating subset HHHH mean values

for tree in datatrees:
    projection = tree.projection.attrs['epsg_code'].item()
    hhhh = tree.HHHH
    hhhh = hhhh.rio.write_crs(projection)
    
    subset_hhhh = hhhh.rio.clip_box(
        minx=40.8463, miny=13.2553,
        maxx=40.8574, maxy=13.2684, 
        crs="EPSG:4326"
    )
    subset_hhhh_mean = subset_hhhh.mean().to_numpy().item()
    print(subset_hhhh_mean)
0.07500407099723816
0.07547999173402786
0.0776161402463913
0.07449550926685333
0.07419200986623764

6. Stream a time series

Use the multiple files we opened in Step 5 to create a GCOV time series

6a. Define a function to extract datetimes for a time dimension

import re
from urllib.parse import urlparse
from datetime import datetime
from pathlib import PurePosixPath

NISAR_TS_RE = re.compile(r"_(\d{8}T\d{6})_")

def nisar_start_time_from_url(s3_url: str) -> datetime:
    path = urlparse(s3_url).path
    name = PurePosixPath(path).name
    
    m = NISAR_TS_RE.search(name)
    if not m:
        raise ValueError(f"No NISAR timestamp found in: {s3_url}")
    
    return datetime.strptime(m.group(1), "%Y%m%dT%H%M%S")

6b. Create a list of datetimes for a time dimension

dts = [nisar_start_time_from_url(url) for url in s3_urls]
dts
[datetime.datetime(2025, 11, 22, 2, 46, 18), datetime.datetime(2025, 12, 4, 2, 46, 18), datetime.datetime(2025, 12, 16, 2, 46, 19), datetime.datetime(2025, 12, 28, 2, 46, 19), datetime.datetime(2026, 1, 9, 2, 46, 20)]

6c. Add a time dimension to the xarray.Dataset of each open GCOV group

dataarrays = [
    tree.ds.assign_coords(time=dt).expand_dims(time=1)
    for dt, tree in zip(dts, datatrees)
]

dataarrays[0].time
for da in dataarrays:
    print(da.dims)
FrozenMappingWarningOnValuesAccess({'time': 1, 'yCoordinates': 16704, 'xCoordinates': 17064, 'phony_dim_0': 2})
FrozenMappingWarningOnValuesAccess({'time': 1, 'yCoordinates': 16704, 'xCoordinates': 17064, 'phony_dim_0': 2})
FrozenMappingWarningOnValuesAccess({'time': 1, 'yCoordinates': 16704, 'xCoordinates': 17064, 'phony_dim_0': 2})
FrozenMappingWarningOnValuesAccess({'time': 1, 'yCoordinates': 16704, 'xCoordinates': 17064, 'phony_dim_0': 2})
FrozenMappingWarningOnValuesAccess({'time': 1, 'yCoordinates': 16704, 'xCoordinates': 17064, 'phony_dim_0': 2})

6d. Concatenate the xarray.Datasets into a single time-series xarray.Dataset

ts = xr.concat(dataarrays, dim="time")
ts
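With the time dimension in place, the series can be indexed by date; xarray's ts.sel(time=..., method="nearest") selects the acquisition closest to a target date. The same nearest-time logic can be sketched on the dts list itself with only the standard library:

```python
from datetime import datetime

# acquisition start times, as extracted in Step 6b
dts = [
    datetime(2025, 11, 22, 2, 46, 18),
    datetime(2025, 12, 4, 2, 46, 18),
    datetime(2025, 12, 16, 2, 46, 19),
    datetime(2025, 12, 28, 2, 46, 19),
    datetime(2026, 1, 9, 2, 46, 20),
]

target = datetime(2025, 12, 20)
# index of the acquisition closest in time to the target date
nearest_idx = min(range(len(dts)), key=lambda i: abs(dts[i] - target))
print(nearest_idx, dts[nearest_idx])  # 2 2025-12-16 02:46:19
```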

7. Close open files

Close all open files to release their file handles and cached data.

7a. Close the files

for f in files:
    f.close()

7b. With the files closed, we can no longer compute values from the xarray data structures we created

The cell below will return a ValueError: I/O operation on closed file.

ts.compute()

8. Summary

You now have the tools and knowledge that you need to search with asf_search, generate temporary S3 bucket credentials from an Earthdata Login Bearer Token, stream data from S3 with s3fs, and load them into xarray data structures.


9. Resources and references

Author: Alex Lewandowski