This notebook demonstrates how to open and access NISAR GCOV HDF5 files using h5py (Python), focusing on navigating GCOV files to locate and extract data of interest.
This notebook is intended for users with basic Python experience who want to work directly with NISAR GCOV products, including scientists, students, and practitioners transitioning from GeoTIFF-based SAR products to HDF5-based formats.
Overview
1. Prerequisites
| Prerequisite | Importance | Notes |
|---|---|---|
| The software environment for this cookbook must be installed | Necessary | |
Rough Notebook Time Estimate: 10 minutes
2. Download a NISAR GCOV HDF5 file
Download a GCOV HDF5 file to your home directory and view the top-level HDF5 group, science.
2a. Download the HDF5 file
import os
import asf_search as asf
from datetime import datetime
from getpass import getpass
from pathlib import Path
#1: Search for and download NISAR GCOV HDF5 products from ASF
#authenticate with Earthdata Login
#create an ASF session object (stores your login + settings for requests)
session = asf.ASFSession()
#prompt for username/password
session.auth_with_creds(input('EDL Username'), getpass('EDL Password'))
#2: Define the search window and area of interest
#insert start and end dates. Note that dates and times are inclusive.
start_date = datetime(2025, 12, 4)
end_date = datetime(2025, 12, 5)
# POINT or POLYGON as WKT (well-known text); each coordinate pair is longitude then latitude
area_of_interest = "POLYGON((40.9131 12.3904,41.8891 12.3904,41.8891 13.2454,40.9131 13.2454,40.9131 12.3904))"
#3: Build search options
#these filters narrow results to NISAR GCOV products over the AOI and time window
opts = asf.ASFSearchOptions(
    **{
        "maxResults": 250,                   # cap the number of returned results
        "intersectsWith": area_of_interest,  # spatial filter (WKT geometry)
        "start": start_date,                 # temporal filter (start)
        "end": end_date,                     # temporal filter (end)
        "processingLevel": ["GCOV"],         # product type / processing level
        "dataset": ["NISAR"],                # mission/dataset name
        "productionConfiguration": ["PR"],   # production config (e.g., provisional)
        "session": session,                  # use authenticated session from above
    }
)
#4: Run the search for HDF5 files
#execute search
response = asf.search(opts=opts)
#exclude QA_STATS files
pattern = r'^(?!.*QA_STATS).*'
#extract files that end with .h5
hdf5_files = response.find_urls(extension='.h5', pattern=pattern, directAccess=False)
hdf5_files
['https://nisar.asf.earthdatacloud.nasa.gov/NISAR/NISAR_L2_GCOV_BETA_V1/NISAR_L2_PR_GCOV_006_172_A_008_2005_DHDH_A_20251204T024618_20251204T024653_X05007_N_F_J_001/NISAR_L2_PR_GCOV_006_172_A_008_2005_DHDH_A_20251204T024618_20251204T024653_X05007_N_F_J_001.h5',
'https://nisar.asf.earthdatacloud.nasa.gov/NISAR/NISAR_L2_GCOV_BETA_V1/NISAR_L2_PR_GCOV_006_172_A_008_2005_DHDH_A_20251204T024618_20251204T024653_X05009_N_F_J_001/NISAR_L2_PR_GCOV_006_172_A_008_2005_DHDH_A_20251204T024618_20251204T024653_X05009_N_F_J_001.h5']
2b. Retain only the URL for the most recent version of each product in the search results
Data is occasionally re-released with an updated version. Versions are recorded as a Composite Release Identifier (CRID) in a product’s filename. We can use the CRID to retain only the most recent version of each product in the list of URLs.
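As a minimal illustration of the idea, the sketch below pulls the CRID out of a single filename and compares two versions of the same product. The helper name `extract_crid` is hypothetical, and the comparison assumes that CRIDs keep the fixed-width form of `X` followed by five digits seen in the search results, so that plain string ordering matches version ordering.

```python
import re

# hypothetical helper: return the CRID (e.g. "X05009") found in a GCOV filename or URL
def extract_crid(name):
    match = re.search(r"_(X\d{5})_", name)
    return match.group(1) if match else None

# the two filenames returned by the search, differing only in their CRID
older = "NISAR_L2_PR_GCOV_006_172_A_008_2005_DHDH_A_20251204T024618_20251204T024653_X05007_N_F_J_001.h5"
newer = "NISAR_L2_PR_GCOV_006_172_A_008_2005_DHDH_A_20251204T024618_20251204T024653_X05009_N_F_J_001.h5"

# fixed-width CRIDs can be compared as plain strings (assumption stated above)
latest = max([older, newer], key=extract_crid)

print(extract_crid(older), extract_crid(newer))
print(latest == newer)
```

If CRIDs ever change width or prefix, a numeric comparison of the digits would be the safer choice.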
import re
pattern = re.compile(r"(NISAR_L2_PR_GCOV(?:_[^_]+){9})_(X\d{5})")
latest_version_dict = {}
for url in hdf5_files:
    m = pattern.search(url)
    if not m:
        continue
    product, crid = m.groups()
    if product not in latest_version_dict or crid > latest_version_dict[product][0]:
        latest_version_dict[product] = (crid, url)
hdf5_files = [i[1] for i in latest_version_dict.values()]
print(f"Retained {len(hdf5_files)} GCOV products:")
hdf5_files
Retained 1 GCOV products:
['https://nisar.asf.earthdatacloud.nasa.gov/NISAR/NISAR_L2_GCOV_BETA_V1/NISAR_L2_PR_GCOV_006_172_A_008_2005_DHDH_A_20251204T024618_20251204T024653_X05009_N_F_J_001/NISAR_L2_PR_GCOV_006_172_A_008_2005_DHDH_A_20251204T024618_20251204T024653_X05009_N_F_J_001.h5']
2c. Download the data
# Create a folder and download the files
data_dir = Path.home() / "GCOV_data"
data_dir.mkdir(exist_ok=True)
print(f'data_dir: {data_dir}')
#download files
asf.download_urls(hdf5_files, data_dir, session=session, processes=4)
data_dir: /home/jovyan/GCOV_data
2d. Open the file and list the top-level HDF5 group
import h5py
#path to the NISAR GCOV HDF5 file
h5_path = list(data_dir.glob("NISAR_L2_*_GCOV*.h5"))[0]
#open the GCOV HDF5 file in read mode ("r")
#HDF5 files behave like Python dictionaries, where keys correspond to groups
f = h5py.File(h5_path, "r")
#inspect the top-level group names in the file
list(f)
['science']
3. Accessing the science and LSAR groups
All NISAR science products organize their primary data and metadata under the /science group. After opening the file, the first step in navigating a GCOV product is to inspect this group to determine which SAR instruments (LSAR only, or both SSAR and LSAR) are included in the file. This indicates available paths within the GCOV data.
In this notebook, we focus exclusively on GCOV data produced by the L-band SAR (LSAR) instrument.
list(f["science"])
['LSAR']
Output Explanation
The output of this command shows the available science subgroups within the file. Depending on the product and acquisition mode, a GCOV file may contain:
only LSAR: L-band SAR data, or
both LSAR and SSAR: L-band and S-band SAR data
L-SAR and S-SAR data are always stored in separate groups, even if they cover the same geographic area or acquisition time.
4. Accessing the LSAR group
Within the /science/LSAR/ group, different product types and supporting information are organized into separate subgroups. For GCOV products, all gridded covariance data are stored under the /science/LSAR/GCOV/ path.
The next step is to inspect the contents of the /science/LSAR/ group to confirm the presence of the GCOV subgroup.
#list the contents of the /science/LSAR group
#this shows the available product types and supporting subgroups
list(f["science/LSAR"])
['GCOV', 'identification']
Output Explanation
The /science/LSAR/ group contains two subgroups:
GCOV: the gridded covariance product data
identification: granule-level metadata describing the acquisition and processing of this product
The presence of the GCOV group confirms that this file contains GCOV data, and indicates where the gridded raster datasets are stored. The identification group contains supporting metadata and is useful for context, but note that it does not contain raster data.
5. Accessing the GCOV group
After confirming that the file contains GCOV data, the next step is to examine how the GCOV product is organized internally. The GCOV group serves as the top-level container for all gridded covariance data and related metadata within the LSAR science product.
Listing the contents of the GCOV group allows us to identify where the gridded measurement datasets are stored and which subgroups contain supporting information.
#list the contents of the GCOV group
#this reveals how GCOV data are organized internally
list(f["science/LSAR/GCOV"])
['grids', 'metadata']
Output Explanation
The GCOV group is organized into two primary subgroups:
grids: contains the gridded covariance datasets and related raster-style data
metadata: contains product-level metadata specific to the GCOV product
All GCOV raster data that are commonly visualized or analyzed are stored under the grids group. The metadata group provides supporting information but does not contain gridded measurement arrays.
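Listing one level at a time works well for learning the layout, but the whole hierarchy can also be walked in a single pass with h5py's `visititems` method. The sketch below runs on a tiny in-memory stand-in file (the file name `demo_gcov.h5` and its contents are made up for illustration) that mimics the group names seen so far; with the real product, the same callback could be passed to `f.visititems`.

```python
import h5py

# build a tiny in-memory stand-in that mimics the group layout described above
# (illustration only; nothing is written to disk)
demo = h5py.File("demo_gcov.h5", "w", driver="core", backing_store=False)
demo.create_dataset("science/LSAR/GCOV/grids/frequencyA/HVHV", data=[[0.1, 0.2]])
demo.create_group("science/LSAR/GCOV/metadata")
demo.create_group("science/LSAR/identification")

paths = []

def collect(name, obj):
    # visititems calls this once for every group and dataset below the start point
    kind = "dataset" if isinstance(obj, h5py.Dataset) else "group"
    paths.append((name, kind))

demo.visititems(collect)

for name, kind in paths:
    print(f"{kind:8s} {name}")

demo.close()
```

On a full GCOV file this prints many lines, so it can help to filter on `kind` or on a path prefix such as `science/LSAR/GCOV/grids`.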
6. Navigating into the grids group
The contents of the grids group inform which frequency groups are present in the GCOV product. Depending on the acquisition mode, a GCOV file may contain data from one or two frequency groups.
#list the contents of the grids group
#this shows how GCOV raster data are organized by frequency
list(f["science/LSAR/GCOV/grids"])
['frequencyA', 'frequencyB']
Output Explanation
GCOV raster datasets are grouped by frequency band within the grids group. The two frequency groups represent different portions of the radar bandwidth:
frequencyA: datasets derived from the lower portion of the LSAR bandwidth
frequencyB: datasets derived from the higher portion of the LSAR bandwidth
Depending on the acquisition mode, a GCOV product may contain data from only one frequency group or from both frequencyA and frequencyB. The presence of both groups indicates that this product includes measurements from two portions of the L-SAR spectrum.
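Because the set of frequency groups varies between products, code that processes many files may want to check which groups exist before building paths. A minimal sketch follows; the helper name `available_frequencies` is hypothetical, and plain dictionaries stand in for `f["science/LSAR/GCOV/grids"]`, relying on the fact that h5py groups support the same `in` membership test.

```python
# hypothetical helper: works on any mapping-like object, including an h5py group
def available_frequencies(grids):
    return [name for name in ("frequencyA", "frequencyB") if name in grids]

# stand-ins for f["science/LSAR/GCOV/grids"] in two different products
dual_band = {"frequencyA": {}, "frequencyB": {}}
single_band = {"frequencyA": {}}

print(available_frequencies(dual_band))
print(available_frequencies(single_band))
```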
7. Inspecting datasets within a frequency group
The next step is to list the datasets available for a specific frequency group. In this example, we inspect frequencyA.
#list the datasets available within the frequencyA group
#these correspond to specific polarization combinations and supporting layers
list(f["science/LSAR/GCOV/grids/frequencyA"])
['HHHH',
'HVHV',
'listOfCovarianceTerms',
'listOfPolarizations',
'mask',
'numberOfLooks',
'numberOfSubSwaths',
'projection',
'rtcGammaToSigmaFactor',
'xCoordinateSpacing',
'xCoordinates',
'yCoordinateSpacing',
'yCoordinates']
Output Explanation
1. Gridded covariance (measurement) datasets
These datasets contain the primary GCOV raster data and represent specific polarization combinations. Examples include:
HHHH, HVHV: real-valued covariance terms
Other products may also include cross-polarized or complex-valued terms. These datasets are commonly extracted, visualized, or converted to raster formats.
2. Supporting raster layers
These datasets provide information needed to interpret or process radar measurement data:
numberOfLooks: effective number of looks for each pixel
rtcGammaToSigmaFactor: factor used to convert radiometrically terrain-corrected gamma values to sigma-naught
mask: identifies valid and invalid pixels
These layers are spatially aligned with the covariance datasets.
3. Grid and coordinate information
These datasets define the spatial reference of the GCOV grids:
xCoordinates, yCoordinates: coordinate vectors defining the grid
xCoordinateSpacing, yCoordinateSpacing: grid spacing
projection: map projection information
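The three categories above can be separated programmatically. The sketch below splits the dataset names from this example into covariance terms and everything else, under the assumption that covariance terms are named with exactly four letters drawn from H and V. Products with other polarization naming would need a different pattern, and the file's own `listOfCovarianceTerms` dataset is likely the more reliable source, although its contents are not shown in this notebook.

```python
import re

# dataset names listed under frequencyA in this example
names = [
    "HHHH", "HVHV", "listOfCovarianceTerms", "listOfPolarizations", "mask",
    "numberOfLooks", "numberOfSubSwaths", "projection", "rtcGammaToSigmaFactor",
    "xCoordinateSpacing", "xCoordinates", "yCoordinateSpacing", "yCoordinates",
]

# assumption: covariance terms are four letters, each H or V
term_pattern = re.compile(r"^[HV]{4}$")

covariance_terms = [n for n in names if term_pattern.match(n)]
other_layers = [n for n in names if not term_pattern.match(n)]

print("covariance terms:", covariance_terms)
print("other layers:", len(other_layers))
```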
8. Accessing a specific GCOV dataset
Once the available datasets within a frequency group have been identified, the next step is to access an individual dataset directly. Before loading the full raster into memory, it is useful to inspect basic properties of the dataset, such as its shape and data type, to confirm that it contains the expected data.
In this example, we access the HVHV dataset from the frequencyA group.
#access a covariance (measurement) dataset by its full HDF5 path
#HVHV is a real-valued covariance term for the HV–HV polarization combination
hvhv = f["science/LSAR/GCOV/grids/frequencyA/HVHV"]
#inspect basic properties of the dataset without loading the full array
hvhv.shape, hvhv.dtype
((16704, 17064), dtype('<f4'))
Output Explanation
The HVHV object is an HDF5 dataset representing a gridded covariance measurement.
The shape (16704, 17064) indicates the number of rows (16704) and columns (17064) in the raster grid.
The data type ('<f4') describes how the values are stored (in this example, 32-bit floating point).
Accessing these properties does not load the full dataset into memory, making this a safe way to verify dataset contents before reading the data values.
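The shape and data type are also enough to estimate how much memory the full array will need before reading it. The arithmetic below uses the values reported for this example; with the open dataset object, the same number could be obtained from `hvhv.size * hvhv.dtype.itemsize`.

```python
import math

shape = (16704, 17064)   # hvhv.shape reported above
bytes_per_value = 4      # '<f4' is a 4-byte (32-bit) float

# total bytes = number of pixels x bytes per pixel
n_bytes = math.prod(shape) * bytes_per_value

print(f"{n_bytes} bytes")
print(f"{n_bytes / 1024**3:.2f} GiB")
```

At roughly 1 GiB for a single layer, reading several layers at once can exhaust the memory of a small notebook server.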
8a. Inspecting dataset attributes
Each GCOV dataset includes a set of attributes that describe how the data values should be interpreted. These attributes provide essential context such as units, valid value ranges, and how missing or invalid data are represented.
Before reading the dataset into memory or using it in analysis, we inspect these attributes to understand what the values might represent.
#list all attributes attached to the HVHV dataset
list(hvhv.attrs)
['DIMENSION_LIST',
'_FillValue',
'description',
'grid_mapping',
'long_name',
'max_value',
'mean_value',
'min_value',
'sample_stddev',
'units',
'valid_min']
#select key dataset attributes
#before reading the dataset into memory, we inspect a small number of attributes that directly affect how the data should be handled in analysis and visualization
print("Fill value:", hvhv.attrs.get("_FillValue"))
print("Minimum valid value:", hvhv.attrs.get("valid_min"))
Fill value: nan
Minimum valid value: 0.0
Output Explanation
The _FillValue attribute indicates how missing or invalid pixels are represented. In this dataset, missing values are stored as NaN. Most plotting libraries leave NaN pixels blank, and NumPy's NaN-aware functions (for example, np.nanmean) skip them, whereas ordinary reductions such as np.mean return NaN if any input value is NaN.
The valid_min attribute indicates that valid values are expected to be non-negative, which provides a simple sanity check when plotting or analyzing the data.
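A small stand-in array makes the effect of NaN fill values concrete. The values below are invented for illustration; only the NaN fill value and the `valid_min` of 0.0 come from the attributes shown above.

```python
import numpy as np

# small stand-in array; NaN marks a missing pixel, as _FillValue does in the real data
values = np.array([[0.02, np.nan], [0.05, 0.08]], dtype=np.float32)

plain_mean = np.mean(values)          # NaN propagates through ordinary reductions
nan_aware_mean = np.nanmean(values)   # NaN-aware version ignores missing pixels

# fraction of pixels holding a finite (valid) value
valid_fraction = np.isfinite(values).mean()

# count pixels below valid_min (0.0); comparisons against NaN evaluate to False
below_min = np.count_nonzero(values < 0.0)

print(plain_mean, nan_aware_mean, valid_fraction, below_min)
```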
9. Reading the dataset into memory
Up to this point, we have been working with an HDF5 dataset object, which provides access to metadata without loading the data values themselves. To perform calculations, visualization, or raster export, the dataset values must be explicitly read into memory.
In this step, we load the HVHV dataset into a NumPy array.
#read the full HVHV dataset into memory as a NumPy array
hvhv_data = hvhv[()]
#confirm the array shape
hvhv_data.shape
(16704, 17064)
Output Explanation
hvhv_data is now a NumPy array containing the gridded covariance values.
The array shape matches the dimensions reported earlier by hvhv.shape.
Missing values are represented as NaN, consistent with the _FillValue attribute.
At this point, the GCOV raster values are available for direct analysis, visualization, or conversion to a standard raster format.
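When only part of the scene is needed, slicing the dataset object reads just the requested rows and columns rather than the full array. The sketch below demonstrates this on a small in-memory stand-in (the file name `demo_window.h5` and its values are made up); with the real product, the equivalent would be a slice such as `hvhv[2000:3000, 4000:5000]`, where the index ranges are arbitrary examples.

```python
import h5py
import numpy as np

# in-memory stand-in for a large raster dataset (illustration only)
demo = h5py.File("demo_window.h5", "w", driver="core", backing_store=False)
dset = demo.create_dataset("HVHV", data=np.arange(100, dtype="f4").reshape(10, 10))

# slicing the dataset object reads only rows 2-4 and columns 3-6
window = dset[2:5, 3:7]

print(window.shape)
print(type(window).__name__)

demo.close()
```

The result is an ordinary NumPy array, so it remains usable after the file is closed.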
9a. Reading the grid coordinate vectors
GCOV datasets are stored as gridded arrays, and the file provides coordinate datasets that define the spatial location of each pixel. The xCoordinates and yCoordinates datasets provide the coordinate vectors for the grid, which are used to geolocate the raster.
#read the x and y coordinate vectors for the frequencyA grid
x = f["science/LSAR/GCOV/grids/frequencyA/xCoordinates"][:]
y = f["science/LSAR/GCOV/grids/frequencyA/yCoordinates"][:]
#inspect coordinate ranges and lengths
print(f"x range: {x.min():.1f} to {x.max():.1f}")
print(f"y range: {y.min():.1f} to {y.max():.1f}")
print(f"number of x coordinates (columns): {len(x)}")
print(f"number of y coordinates (rows): {len(y)}")
x range: 604090.0 to 945350.0
y range: 1254250.0 to 1588310.0
number of x coordinates (columns): 17064
number of y coordinates (rows): 16704
Output Explanation
x and y are one-dimensional coordinate vectors that define the grid in the product's projected coordinate system.
The length of x corresponds to the number of raster columns, and the length of y corresponds to the number of raster rows. These should match the dimensions of hvhv_data.
The minimum and maximum values of x and y define the spatial extent of the GCOV grid.
9b. Reading grid spacing
The xCoordinates and yCoordinates vectors define the grid location, but we also need the spacing between pixels. GCOV products provide xCoordinateSpacing and yCoordinateSpacing, which describe the pixel size in the projected coordinate system. These values are useful for confirming resolution and for constructing a georeferencing transform during raster export.
#read the grid spacing (pixel size) for the frequencyA grid
dx = f["science/LSAR/GCOV/grids/frequencyA/xCoordinateSpacing"][()]
dy = f["science/LSAR/GCOV/grids/frequencyA/yCoordinateSpacing"][()]
#print
print(f"x spacing (dx): {float(dx)}")
print(f"y spacing (dy): {float(dy)}")
x spacing (dx): 20.0
y spacing (dy): -20.0
Output Explanation
The grid spacing in the x-direction (dx) is 20 meters, indicating a pixel width of 20 m in the projected coordinate system.
The grid spacing in the y-direction (dy) is −20 meters. The negative sign indicates that y-coordinates decrease from north to south, which is a common convention for raster grids.
Together, these values indicate that the GCOV data are provided on a 20 m × 20 m grid.
This spacing information, combined with the coordinate vectors, is used to construct a georeferencing transform when converting the GCOV dataset to a standard raster format.
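The coordinate ranges, counts, and spacing reported so far can be cross-checked with simple arithmetic, and the same numbers allow a projected coordinate to be mapped to a row and column. The sketch below uses the printed values directly. The helper name `to_index` is hypothetical, and it assumes that the y vector starts at its maximum and decreases with row index, which the negative `dy` suggests but which should be confirmed against `y[0]` in the actual file.

```python
# values reported in the outputs above
x_min, x_max, n_cols = 604090.0, 945350.0, 17064
y_min, y_max, n_rows = 1254250.0, 1588310.0, 16704
dx, dy = 20.0, -20.0

# the first and last coordinates should be (n - 1) steps apart
x_consistent = x_min + (n_cols - 1) * dx == x_max
y_consistent = y_max + (n_rows - 1) * dy == y_min

# hypothetical helper: nearest (row, col) for a projected coordinate,
# assuming x increases and y decreases with index (y starts at y_max)
def to_index(x0, y0):
    col = round((x0 - x_min) / dx)
    row = round((y0 - y_max) / dy)
    return row, col

print(x_consistent, y_consistent)
print(to_index(604130.0, 1588290.0))
```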
9c. Reading projection information
The GCOV grid is defined in a projected coordinate system. To correctly interpret the coordinate values and export a georeferenced raster, we need to identify the projection used by the grid.
GCOV products store projection information within each frequency group.
#read the projection information for the frequencyA grid
projection = f["science/LSAR/GCOV/grids/frequencyA/projection"][()]
projection
np.uint32(32637)
Output Explanation
The projection value is 32637, stored as an unsigned integer (np.uint32).
This is an EPSG code, meaning the GCOV grid uses the coordinate reference system EPSG:32637.
The xCoordinates and yCoordinates values are therefore in meters in this projected CRS.
With the CRS identified, we can now create a georeferencing transform and export the HVHV dataset to a GeoTIFF with a valid CRS.
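EPSG codes in the 326xx and 327xx ranges denote WGS 84 / UTM zones in the northern and southern hemispheres respectively, so the zone can be read off the code itself. The helper below (`utm_zone_from_epsg`, a hypothetical name) is a small sketch of that rule and returns `None` for any other kind of code; other products may well use non-UTM projections.

```python
# hypothetical helper: interpret WGS 84 / UTM EPSG codes (326xx north, 327xx south)
def utm_zone_from_epsg(epsg):
    prefix, zone = divmod(int(epsg), 100)
    if prefix == 326 and 1 <= zone <= 60:
        return zone, "N"
    if prefix == 327 and 1 <= zone <= 60:
        return zone, "S"
    return None  # not a WGS 84 / UTM code

print(utm_zone_from_epsg(32637))
```

Zone 37 spans longitudes 36°E to 42°E, which is consistent with the area of interest used in the search.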
10. Build a raster transform and export to GeoTIFF
Now we combine:
coordinate vectors (x, y)
pixel spacing (dx, dy)
CRS (EPSG:32637)
to export a properly georeferenced raster.
import numpy as np
import rasterio
from rasterio.transform import from_origin
#convert the projection value into a CRS string for rasterio
crs = f"EPSG:{int(projection)}"
#build the affine transform
#from_origin expects the outer (west, north) edge of the upper-left pixel
#here x.min() and y.max() are passed as that corner; if the coordinate vectors
#mark pixel centers, shift by half a pixel: x.min() - dx/2 and y.max() + abs(dy)/2
#dx is the pixel width; abs(dy) is the pixel height (must be positive)
transform = from_origin(
    x.min(),        #x-coordinate of the upper-left corner
    y.max(),        #y-coordinate of the upper-left corner
    float(dx),      #pixel width (meters)
    float(abs(dy))  #pixel height (meters)
)
#name of the output GeoTIFF file
out_tif = data_dir / "GCOV_frequencyA_HVHV.tif"
#open a new GeoTIFF file
with rasterio.open(
    out_tif,
    "w",                        #write mode
    driver="GTiff",             #output file format
    height=hvhv_data.shape[0],  #number of rows
    width=hvhv_data.shape[1],   #number of columns
    count=1,                    #single-band raster
    dtype=hvhv_data.dtype,      #data type matches the NumPy array
    crs=crs,                    #coordinate reference system
    transform=transform,        #georeferencing information
    nodata=np.nan,              #missing values stored as NaNs
) as dst:
    #write the HVHV data array to band 1 of the GeoTIFF
    dst.write(hvhv_data, 1)
#print the output filename for confirmation of file creation
out_tif
PosixPath('/home/jovyan/GCOV_data/GCOV_frequencyA_HVHV.tif')
#quality check
with rasterio.open(out_tif) as src:
    print("CRS:", src.crs)
    print("Resolution:", src.res)
    print("Bounds:", src.bounds)
CRS: EPSG:32637
Resolution: (20.0, 20.0)
Bounds: BoundingBox(left=604090.0, bottom=1254230.0, right=945370.0, top=1588310.0)
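The reported bounds follow directly from the transform: the upper-left corner plus the number of columns and rows times the pixel size. The arithmetic below reproduces them from the values printed earlier, and also shows what the bounds would be if the coordinate vectors mark pixel centers rather than pixel edges. Which interpretation applies is an assumption to verify against the product specification; the second tuple is not output from this notebook.

```python
# values reported in the outputs above
x_min, y_max = 604090.0, 1588310.0
n_cols, n_rows = 17064, 16704
dx, dy = 20.0, 20.0  # pixel width and height as positive numbers

# bounds when x_min and y_max are used as the upper-left corner: (left, bottom, right, top)
as_written = (x_min, y_max - n_rows * dy, x_min + n_cols * dx, y_max)

# bounds if the coordinate vectors mark pixel centers (assumption):
# the outer edge sits half a pixel up and to the left of the first coordinate
if_centers = (
    x_min - dx / 2,
    y_max + dy / 2 - n_rows * dy,
    x_min - dx / 2 + n_cols * dx,
    y_max + dy / 2,
)

print("as written:", as_written)
print("if centers:", if_centers)
```

The two results differ by half a pixel (10 m) in each direction, which matters when overlaying the GeoTIFF on other data at full resolution.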
11. Summary
You have now opened and navigated a NISAR GCOV HDF5 file, read a dataset into memory, and exported it as a GeoTIFF.
12. Resources and references
Authors: Julia White, Alex Lewandowski