Opening NISAR GCOV HDF5 Files in Python

This notebook demonstrates how to open NISAR GCOV HDF5 files with the h5py Python library, navigate their group hierarchy, and extract the data of interest.

This notebook is intended for users with basic Python experience who want to work directly with NISAR GCOV products, including scientists, students, and practitioners transitioning from GeoTIFF-based SAR products to HDF5-based formats.

Overview¶

Prerequisites
Download a NISAR GCOV HDF5 file
Accessing the science group
Accessing the LSAR group
Accessing the GCOV group
Navigating into the grids group
Inspecting datasets within a frequency group
Accessing a specific GCOV dataset
Reading the dataset into memory
Build a raster transform and export to GeoTIFF
Summary
Resources and References

1. Prerequisites¶

Prerequisite	Importance	Notes
The software environment for this cookbook must be installed	Necessary

Rough Notebook Time Estimate: 10 minutes

2. Download a NISAR GCOV HDF5 file¶

Download a GCOV HDF5 file to a GCOV_data folder in your home directory and view the top-level HDF5 group, science.

2a. Download the HDF5 file¶

import os
import asf_search as asf
from datetime import datetime
from getpass import getpass
from pathlib import Path

#1: Authenticate so we can search for and download NISAR GCOV HDF5 products from ASF

#authenticate with Earthdata Login
#create an ASF session object (stores your login + settings for requests)
session = asf.ASFSession()
#prompt for username/password
session.auth_with_creds(input('EDL Username'), getpass('EDL Password'))

#2: Define the search window and area of interest 

#insert start and end dates. Note that dates and times are inclusive. 
start_date = datetime(2025, 12, 4)
end_date = datetime(2025, 12, 5)
# POINT or POLYGON as WKT (well-known text); each vertex is a "longitude latitude" pair
area_of_interest = "POLYGON((40.9131 12.3904,41.8891 12.3904,41.8891 13.2454,40.9131 13.2454,40.9131 12.3904))" 

#3: Build search options 

#these filters narrow results to NISAR GCOV products over the AOI and time window
opts = asf.ASFSearchOptions(
    **{
        "maxResults": 250,                    # cap the number of returned results
        "intersectsWith": area_of_interest,   # spatial filter (WKT geometry)
        "start": start_date,                  # temporal filter (start)
        "end": end_date,                      # temporal filter (end)
        "processingLevel": ["GCOV"],          # product type / processing level
        "dataset": ["NISAR"],                 # mission/dataset name
        "productionConfiguration": ["PR"],    # production configuration code (the PR in the product filename)
        "session": session,                   # use authenticated session from above
    }
)

#4: Run the search for HDF5 files
#execute search
response = asf.search(opts=opts)

#exclude QA_STATS files (the negative lookahead rejects any URL containing QA_STATS)
pattern = r'^(?!.*QA_STATS).*'

#extract files that end with .h5 
hdf5_files = response.find_urls(extension='.h5', pattern=pattern, directAccess=False)
hdf5_files

EDL Username jwhite124
EDL Password ········

['https://nisar.asf.earthdatacloud.nasa.gov/NISAR/NISAR_L2_GCOV_BETA_V1/NISAR_L2_PR_GCOV_006_172_A_008_2005_DHDH_A_20251204T024618_20251204T024653_X05007_N_F_J_001/NISAR_L2_PR_GCOV_006_172_A_008_2005_DHDH_A_20251204T024618_20251204T024653_X05007_N_F_J_001.h5',
 'https://nisar.asf.earthdatacloud.nasa.gov/NISAR/NISAR_L2_GCOV_BETA_V1/NISAR_L2_PR_GCOV_006_172_A_008_2005_DHDH_A_20251204T024618_20251204T024653_X05009_N_F_J_001/NISAR_L2_PR_GCOV_006_172_A_008_2005_DHDH_A_20251204T024618_20251204T024653_X05009_N_F_J_001.h5']
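The area of interest in the search above is a WKT polygon. WKT writes each vertex as "x y", which for geographic coordinates means longitude first and latitude second, and the ring has to end on the vertex it started from. The following is a minimal sketch of building such a string from a bounding box. `bbox_to_wkt` is a hypothetical helper written for this example; it is not part of asf_search.

```python
def bbox_to_wkt(west, south, east, north):
    """Return a closed WKT POLYGON for a lon/lat bounding box (hypothetical helper)."""
    if not (west < east and south < north):
        raise ValueError("expected west < east and south < north")
    # ring order: lower-left, lower-right, upper-right, upper-left, back to the start
    ring = [(west, south), (east, south), (east, north), (west, north), (west, south)]
    coords = ",".join(f"{lon} {lat}" for lon, lat in ring)
    return f"POLYGON(({coords}))"

# same bounding box as the area of interest used in the search
area_of_interest = bbox_to_wkt(40.9131, 12.3904, 41.8891, 13.2454)
print(area_of_interest)
```

The printed string can be passed to the `intersectsWith` search option in the same way as the hand-written polygon.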

2b. Retain only the URL for the most recent version of each product in the search results¶

Products are occasionally re-released with an updated version. The version is recorded in a product’s filename as a Composite Release Identifier (CRID), such as X05007 or X05009 in the search results above. We can use the CRID to retain only the most recent version of each product in the list of URLs.

import re

pattern = re.compile(r"(NISAR_L2_PR_GCOV(?:_[^_]+){9})_(X\d{5})")

latest_version_dict = {}

for url in hdf5_files:
    m = pattern.search(url)
    if not m:
        continue

    product, crid = m.groups()

    if product not in latest_version_dict or crid > latest_version_dict[product][0]:
        latest_version_dict[product] = (crid, url)

hdf5_files = [i[1] for i in latest_version_dict.values()]
print(f"Retained {len(hdf5_files)} GCOV products:")
hdf5_files

Retained 1 GCOV products:

['https://nisar.asf.earthdatacloud.nasa.gov/NISAR/NISAR_L2_GCOV_BETA_V1/NISAR_L2_PR_GCOV_006_172_A_008_2005_DHDH_A_20251204T024618_20251204T024653_X05009_N_F_J_001/NISAR_L2_PR_GCOV_006_172_A_008_2005_DHDH_A_20251204T024618_20251204T024653_X05009_N_F_J_001.h5']
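To see what the regular expression captures, it can help to run it on a single filename. The sketch below applies the same pattern to the filename retained above and prints the two groups: the product key (everything up to the CRID) and the CRID itself. The variable name `crid_pattern` is chosen for this example only.

```python
import re

# same pattern as above: nine underscore-separated fields after GCOV, then the CRID
crid_pattern = re.compile(r"(NISAR_L2_PR_GCOV(?:_[^_]+){9})_(X\d{5})")

name = ("NISAR_L2_PR_GCOV_006_172_A_008_2005_DHDH_A_"
        "20251204T024618_20251204T024653_X05009_N_F_J_001.h5")

product, crid = crid_pattern.search(name).groups()
print(product)   # product key shared by every version of this product
print(crid)      # version identifier

# both CRIDs start with X and have five digits, so string comparison orders them
print("X05009" > "X05007")
```

The string comparison is only a reasonable ordering as long as every CRID being compared has the same prefix letter and width, which is what the pattern enforces here.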

2c. Download the data¶

# Create a folder and download the files
data_dir = Path.home() / "GCOV_data"
data_dir.mkdir(exist_ok=True)
print(f'data_dir: {data_dir}')

#download files
asf.download_urls(hdf5_files, data_dir, session=session, processes=4)

data_dir: /home/jovyan/GCOV_data

2d. Open the file and list the top-level HDF5 group¶

import h5py

#path to the NISAR GCOV HDF5 file
h5_path = list(data_dir.glob("NISAR_L2_*_GCOV*.h5"))[0]

#open the GCOV HDF5 file in read mode ("r")
#HDF5 files behave like Python dictionaries, where keys are the names of groups and datasets
f = h5py.File(h5_path, "r")

#inspect the top-level group names in the file
list(f)

['science']

3. Accessing the science group¶

All NISAR science products organize their primary data and metadata under the /science group. After opening the file, the first step in navigating a GCOV product is to inspect this group to determine which SAR instruments (LSAR only, or both SSAR and LSAR) are included in the file. Each instrument present has its own subgroup, so this listing tells us which paths exist below /science.

In this notebook, we focus exclusively on GCOV data produced by the L-band SAR (LSAR) instrument.

list(f["science"])

['LSAR']

4. Accessing the LSAR group¶

Within the /science/LSAR/ group, different product types and supporting information are organized into separate subgroups. For GCOV products, all gridded covariance data are stored under the /science/LSAR/GCOV/ path.

The next step is to inspect the contents of the /science/LSAR/ group to confirm the presence of the GCOV subgroup.

#list the contents of the /science/LSAR group
#this shows the available product types and supporting subgroups
list(f["science/LSAR"])

['GCOV', 'identification']

5. Accessing the GCOV group¶

After confirming that the file contains GCOV data, the next step is to examine how the GCOV product is organized internally. The GCOV group serves as the top-level container for all gridded covariance data and related metadata within the LSAR science product.

Listing the contents of the GCOV group allows us to identify where the gridded measurement datasets are stored and which subgroups contain supporting information.

#list the contents of the GCOV group
#this reveals how GCOV data are organized internally
list(f["science/LSAR/GCOV"])

['grids', 'metadata']

6. Navigating into the grids group¶

The contents of the grids group show which frequency groups are present in the GCOV product. Depending on the acquisition mode, a GCOV file may contain data from one or two frequency groups.

#list the contents of the grids group
#this shows how GCOV raster data are organized by frequency
list(f["science/LSAR/GCOV/grids"])

['frequencyA', 'frequencyB']

7. Inspecting datasets within a frequency group¶

The next step is to list the datasets available for a specific frequency group. In this example, we inspect frequencyA.

#list the datasets available within the frequencyA group
#these correspond to specific polarization combinations and supporting layers
list(f["science/LSAR/GCOV/grids/frequencyA"])

['HHHH',
 'HVHV',
 'listOfCovarianceTerms',
 'listOfPolarizations',
 'mask',
 'numberOfLooks',
 'numberOfSubSwaths',
 'projection',
 'rtcGammaToSigmaFactor',
 'xCoordinateSpacing',
 'xCoordinates',
 'yCoordinateSpacing',
 'yCoordinates']
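Listing one group at a time works well for learning the layout. As an alternative, h5py can walk the whole hierarchy in a single call with `visititems`. The sketch below runs on a tiny synthetic file that only imitates part of the GCOV layout, so it can be executed without a NISAR product; the file name, array sizes, and the `describe` callback are assumptions made for this example.

```python
import os
import tempfile

import h5py
import numpy as np

# build a tiny stand-in file with a GCOV-like layout (synthetic values, not real data)
tmp_path = os.path.join(tempfile.mkdtemp(), "demo_gcov_layout.h5")
with h5py.File(tmp_path, "w") as demo:
    grp = demo.create_group("science/LSAR/GCOV/grids/frequencyA")
    grp.create_dataset("HVHV", data=np.zeros((4, 5), dtype="float32"))
    grp.create_dataset("xCoordinates", data=np.arange(5, dtype="float64"))
    demo.create_group("science/LSAR/identification")

lines = []

def describe(name, obj):
    """Record one line per object: datasets get shape and dtype, groups a trailing slash."""
    if isinstance(obj, h5py.Dataset):
        lines.append(f"{name}  shape={obj.shape} dtype={obj.dtype}")
    else:
        lines.append(f"{name}/")

with h5py.File(tmp_path, "r") as demo:
    demo.visititems(describe)   # visits every group and dataset below the root

print("\n".join(lines))
```

Calling `f.visititems(describe)` on the open GCOV file should produce the same kind of listing for the real product. Opening the file in a `with` block, as done here, also closes it automatically when the block ends.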

8. Accessing a specific GCOV dataset¶

Once the available datasets within a frequency group have been identified, the next step is to access an individual dataset directly. Before loading the full raster into memory, it is useful to inspect basic properties of the dataset, such as its shape and data type, to confirm that it contains the expected data.

In this example, we access the HVHV dataset from the frequencyA group.

#access a covariance (measurement) dataset by its full HDF5 path
#HVHV is a real-valued covariance term for the HV–HV polarization combination
hvhv = f["science/LSAR/GCOV/grids/frequencyA/HVHV"]

#inspect basic properties of the dataset without loading the full array
hvhv.shape, hvhv.dtype

((16704, 17064), dtype('<f4'))

8a. Inspecting dataset attributes¶

Each GCOV dataset includes a set of attributes that describe how the data values should be interpreted. These attributes provide essential context such as units, valid value ranges, and how missing or invalid data are represented.

Before reading the dataset into memory or using it in analysis, we inspect these attributes to understand what the values might represent.

#list all attributes attached to the HVHV dataset
list(hvhv.attrs)

['DIMENSION_LIST',
 '_FillValue',
 'description',
 'grid_mapping',
 'long_name',
 'max_value',
 'mean_value',
 'min_value',
 'sample_stddev',
 'units',
 'valid_min']

#select key dataset attributes
#before reading the dataset into memory, we inspect a small number of attributes that directly affect how the data should be handled in analysis and visualization

print("Fill value:", hvhv.attrs.get("_FillValue"))
print("Minimum valid value:", hvhv.attrs.get("valid_min"))

Fill value: nan
Minimum valid value: 0.0
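These two attributes can be turned into a mask before computing statistics. The sketch below uses a small synthetic array in place of the real data and assumes that `valid_min` has its usual meaning, namely that values below it are invalid. Because the fill value is NaN, it has to be detected with `np.isnan` rather than with an equality test.

```python
import numpy as np

# synthetic stand-ins for the HVHV values and attributes (numbers are made up)
values = np.array([[0.02, np.nan], [-0.5, 0.10]], dtype="float32")
fill_value = np.float32(np.nan)   # plays the role of hvhv.attrs["_FillValue"]
valid_min = np.float32(0.0)       # plays the role of hvhv.attrs["valid_min"]

# NaN never compares equal to itself, so a NaN fill value needs an explicit test
if np.isnan(fill_value):
    is_fill = np.isnan(values)
else:
    is_fill = values == fill_value

# invalid pixels: fill values, or values below valid_min
with np.errstate(invalid="ignore"):
    invalid = is_fill | (values < valid_min)
masked = np.ma.masked_array(values, mask=invalid)

print(masked.count(), "valid pixels")
print("mean of valid pixels:", float(masked.mean()))
```

A masked array keeps the original shape, so it can be used for statistics or plotting without the invalid pixels influencing the result.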

9. Reading the dataset into memory¶

Up to this point, we have been working with an HDF5 dataset object, which provides access to metadata without loading the data values themselves. To perform calculations, visualization, or raster export, the dataset values must be explicitly read into memory.

In this step, we load the HVHV dataset into a NumPy array.

#read the full HVHV dataset into memory as a NumPy array
hvhv_data = hvhv[()]

#confirm the array shape
hvhv_data.shape

(16704, 17064)
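Reading the full array is convenient, but at this shape and data type it occupies more than a gigabyte of memory. When only part of the scene is needed, slicing the h5py dataset reads just the requested window from disk. The sketch below first works out the size of the full read from the shape reported above, then demonstrates window slicing on a small synthetic dataset so it runs without a NISAR file; the file name and window indices are arbitrary choices for this example.

```python
import os
import tempfile

import h5py
import numpy as np

# memory needed for the full array reported above: rows * columns * bytes per float32
full_bytes = 16704 * 17064 * np.dtype("<f4").itemsize
print(f"full read: {full_bytes / 1e9:.2f} GB")

# synthetic stand-in dataset so the slicing below runs without a NISAR file
tmp_path = os.path.join(tempfile.mkdtemp(), "demo_subset.h5")
with h5py.File(tmp_path, "w") as demo:
    demo.create_dataset("HVHV", data=np.arange(100 * 80, dtype="float32").reshape(100, 80))

with h5py.File(tmp_path, "r") as demo:
    dset = demo["HVHV"]
    # slicing the dataset object reads only rows 10-19 and columns 30-49 from disk
    window = dset[10:20, 30:50]

print(window.shape, window.dtype)
```

On the real product the same idea applies to the `hvhv` dataset object, for example `hvhv[0:1000, 0:1000]`. If a window is exported later, the coordinate vectors need to be sliced with the same indices.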

9a. Reading the grid coordinate vectors¶

GCOV datasets are stored as gridded arrays, and the file provides coordinate datasets that define the spatial location of each pixel. The xCoordinates and yCoordinates datasets provide the coordinate vectors for the grid, which are used to geolocate the raster.

#read the x and y coordinate vectors for the frequencyA grid
x = f["science/LSAR/GCOV/grids/frequencyA/xCoordinates"][:]
y = f["science/LSAR/GCOV/grids/frequencyA/yCoordinates"][:]

#inspect coordinate ranges and lengths
print(f"x range: {x.min():.1f} to {x.max():.1f}")
print(f"y range: {y.min():.1f} to {y.max():.1f}")
print(f"number of x coordinates (columns): {len(x)}")
print(f"number of y coordinates (rows): {len(y)}")

x range: 604090.0 to 945350.0
y range: 1254250.0 to 1588310.0
number of x coordinates (columns): 17064
number of y coordinates (rows): 16704

9b. Reading grid spacing¶

The xCoordinates and yCoordinates vectors define the grid location, but we also need the spacing between pixels. GCOV products provide xCoordinateSpacing and yCoordinateSpacing, which describe the pixel size in the projected coordinate system. These values are useful for confirming resolution and for constructing a georeferencing transform during raster export.

#read the grid spacing (pixel size) for the frequencyA grid
dx = f["science/LSAR/GCOV/grids/frequencyA/xCoordinateSpacing"][()]
dy = f["science/LSAR/GCOV/grids/frequencyA/yCoordinateSpacing"][()]

#print the spacing values as plain Python floats
print(f"x spacing (dx): {float(dx)}")
print(f"y spacing (dy): {float(dy)}")

x spacing (dx): 20.0
y spacing (dy): -20.0
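The spacing values can be cross-checked against the coordinate vectors. On a regular grid, n pixel centers span n - 1 steps, so the coordinate range divided by the spacing, plus one, should equal the number of rows or columns. The sketch below repeats that arithmetic with the numbers printed above. The negative y spacing is read here as y decreasing with increasing row index, so that row 0 is the northernmost row.

```python
# values printed above for the frequencyA grid
x_min, x_max, n_cols = 604090.0, 945350.0, 17064
y_min, y_max, n_rows = 1254250.0, 1588310.0, 16704
dx, dy = 20.0, -20.0

# n pixel centers on a regular grid are separated by (n - 1) steps
cols_from_spacing = round((x_max - x_min) / abs(dx)) + 1
rows_from_spacing = round((y_max - y_min) / abs(dy)) + 1

print("columns consistent:", cols_from_spacing == n_cols)
print("rows consistent:", rows_from_spacing == n_rows)
```

Both checks hold for this product, which supports treating the grid as regular when building a georeferencing transform.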

9c. Reading projection information¶

The GCOV grid is defined in a projected coordinate system. To correctly interpret the coordinate values and export a georeferenced raster, we need to identify the projection used by the grid.

GCOV products store projection information within each frequency group.

#read the projection dataset for the frequencyA grid; its value is the EPSG code of the grid's CRS
projection = f["science/LSAR/GCOV/grids/frequencyA/projection"][()]

projection

np.uint32(32637)
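The value is an EPSG code. Codes 32601 to 32660 are the WGS 84 / UTM zones of the northern hemisphere and 32701 to 32760 are the southern zones, so the zone can be read directly from the number. The sketch below is a small, hypothetical helper for that lookup. It only recognizes these UTM ranges and returns None for any other code.

```python
def describe_utm_epsg(code):
    """Describe a WGS 84 / UTM EPSG code (hypothetical helper; other codes return None)."""
    code = int(code)
    if 32601 <= code <= 32660:
        return f"WGS 84 / UTM zone {code - 32600}N"
    if 32701 <= code <= 32760:
        return f"WGS 84 / UTM zone {code - 32700}S"
    return None

print(describe_utm_epsg(32637))
```

For this product the helper reports UTM zone 37N, which means the x and y coordinates read earlier are easting and northing values in meters.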

10. Build a raster transform and export to GeoTIFF¶

Now we combine:

coordinate vectors (x, y)
pixel spacing (dx, dy)
CRS (EPSG:32637)

to export a properly georeferenced raster.

import numpy as np
import rasterio
from rasterio.transform import from_origin

#convert the projection value into a CRS string for rasterio 
crs = f"EPSG:{int(projection)}"

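#xCoordinates and yCoordinates are treated here as pixel centers, so shift by
#half a pixel to reach the outer corner of the upper-left pixel.
#dy is negative (y decreases down the rows), so subtracting dy / 2 moves north
start_x = float(x[0]) - float(dx) / 2
start_y = float(y[0]) - float(dy) / 2

#from_origin expects (west, north, x pixel size, y pixel size) with positive sizes
transform = from_origin(
    start_x,
    start_y,
    float(dx),
    float(abs(dy))
)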

#name of the output GeoTIFF file
out_tif = data_dir / "GCOV_frequencyA_HVHV.tif"

#open a new GeoTIFF file 
with rasterio.open(
    out_tif,
    "w",                      #write mode
    driver="GTiff",           #output file format
    height=hvhv_data.shape[0],#number of rows
    width=hvhv_data.shape[1], #number of columns
    count=1,                  #single-band raster
    dtype=hvhv_data.dtype,    #data type matches the NumPy array
    crs=crs,                  #coordinate reference system
    transform=transform,      #georeferencing information
    nodata=np.nan,            #missing values stored as NaNs
) as dst:
    #write the HVHV data array to band 1 of the GeoTIFF
    dst.write(hvhv_data, 1)

#print the output filename for confirmation of file creation
out_tif

PosixPath('/home/jovyan/GCOV_data/GCOV_frequencyA_HVHV.tif')

#quality check
with rasterio.open(out_tif) as src:
    print("CRS:", src.crs)
    print("Resolution:", src.res)
    print("Bounds:", src.bounds)

CRS: EPSG:32637
Resolution: (20.0, 20.0)
Bounds: BoundingBox(left=604080.0, bottom=1254240.0, right=945360.0, top=1588320.0)
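The reported bounds can be reproduced by hand, which is a useful check on the half-pixel shift. The sketch below uses only the numbers printed earlier in this notebook and assumes the coordinate vectors hold pixel centers, with the first x value at the western edge and the first y value at the northern edge. `pixel_center` is a hypothetical helper that maps a row and column index to map coordinates.

```python
# values printed earlier in this notebook
x0, y0 = 604090.0, 1588310.0   # centers of the first column and the first row
dx, dy = 20.0, -20.0
n_rows, n_cols = 16704, 17064

# pixel centers sit half a pixel inside the raster's outer edge
left = x0 - dx / 2
top = y0 - dy / 2
right = left + n_cols * dx
bottom = top + n_rows * dy

def pixel_center(row, col):
    """Map coordinates of the center of pixel (row, col) on a north-up grid."""
    return (left + (col + 0.5) * dx, top + (row + 0.5) * dy)

print((left, bottom, right, top))
print(pixel_center(0, 0))
```

The computed edges match the bounds reported by rasterio, and the center of pixel (0, 0) falls back on the first values of the coordinate vectors.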

11. Summary¶

You have now opened a NISAR GCOV HDF5 file, navigated its group hierarchy, read a dataset into memory, and exported it as a georeferenced GeoTIFF.

12. Resources and references¶

Authors: Julia White, Alex Lewandowski