I am trying to read hdf files over an https connection from the Harmonized Landsat Sentinel repository (here: https://hls.gsfc.nasa.gov/data/v1.4/). Ideally, I would use xarray to do this. Here is an example over https:
xr.open_rasterio('https://hls.gsfc.nasa.gov/data/v1.4/S30/2017/13/T/E/F/HLS.S30.T13TEF.2017002.v1.4.hdf')
<xarray.DataArray (band: 1, y: 3660, x: 3660)>
[13395600 values with dtype=int16]
Coordinates:
* band (band) int64 1
* y (y) float64 4.6e+06 4.6e+06 4.6e+06 ... 4.49e+06 4.49e+06 4.49e+06
* x (x) float64 5e+05 5e+05 5.001e+05 ... 6.097e+05 6.097e+05 6.098e+05
Attributes:
transform: (30.0, -0.0, 499980.0, -0.0, -30.0, 4600020.0)
crs: +init=epsg:32613
res: (30.0, 30.0)
is_tiled: 0
nodatavals: (nan,)
scales: (1.0,)
offsets: (0.0,)
bands: 1
byte_order: 0
coordinate_system_string: PROJCS["UTM_Zone_13N",GEOGCS["GCS_WGS_1984",DA...
data_type: 2
description: HDF Imported into ENVI.
file_type: HDF Scientific Data
header_offset: 0
interleave: bsq
lines: 3660
samples: 3660
Note that these files contain multiple datasets/bands, so the result above is not correct.
xr.open_dataset('https://hls.gsfc.nasa.gov/data/v1.4/S30/2017/13/T/E/F/HLS.S30.T13TEF.2017002.v1.4.hdf')
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
/opt/conda/lib/python3.7/site-packages/xarray/backends/file_manager.py in _acquire_with_cache_info(self, needs_lock)
194 try:
--> 195 file = self._cache[self._key]
196 except KeyError:
/opt/conda/lib/python3.7/site-packages/xarray/backends/lru_cache.py in __getitem__(self, key)
42 with self._lock:
---> 43 value = self._cache[key]
44 self._cache.move_to_end(key)
KeyError: [<class 'netCDF4._netCDF4.Dataset'>, ('https://hls.gsfc.nasa.gov/data/v1.4/S30/2017/13/T/E/F/HLS.S30.T13TEF.2017002.v1.4.hdf',), 'r', (('clobber', True), ('diskless', False), ('format', 'NETCDF4'), ('persist', False))]
During handling of the above exception, another exception occurred:
OSError Traceback (most recent call last)
<ipython-input-85-7765ae565af3> in <module>
----> 1 xr.open_dataset('https://hls.gsfc.nasa.gov/data/v1.4/S30/2017/13/T/E/F/HLS.S30.T13TEF.2017002.v1.4.hdf')
/opt/conda/lib/python3.7/site-packages/xarray/backends/api.py in open_dataset(filename_or_obj, group, decode_cf, mask_and_scale, decode_times, autoclose, concat_characters, decode_coords, engine, chunks, lock, cache, drop_variables, backend_kwargs, use_cftime)
497 if engine == "netcdf4":
498 store = backends.NetCDF4DataStore.open(
--> 499 filename_or_obj, group=group, lock=lock, **backend_kwargs
500 )
501 elif engine == "scipy":
/opt/conda/lib/python3.7/site-packages/xarray/backends/netCDF4_.py in open(cls, filename, mode, format, group, clobber, diskless, persist, lock, lock_maker, autoclose)
387 netCDF4.Dataset, filename, mode=mode, kwargs=kwargs
388 )
--> 389 return cls(manager, group=group, mode=mode, lock=lock, autoclose=autoclose)
390
391 def _acquire(self, needs_lock=True):
/opt/conda/lib/python3.7/site-packages/xarray/backends/netCDF4_.py in __init__(self, manager, group, mode, lock, autoclose)
333 self._group = group
334 self._mode = mode
--> 335 self.format = self.ds.data_model
336 self._filename = self.ds.filepath()
337 self.is_remote = is_remote_uri(self._filename)
/opt/conda/lib/python3.7/site-packages/xarray/backends/netCDF4_.py in ds(self)
396 @property
397 def ds(self):
--> 398 return self._acquire()
399
400 def open_store_variable(self, name, var):
/opt/conda/lib/python3.7/site-packages/xarray/backends/netCDF4_.py in _acquire(self, needs_lock)
390
391 def _acquire(self, needs_lock=True):
--> 392 with self._manager.acquire_context(needs_lock) as root:
393 ds = _nc4_require_group(root, self._group, self._mode)
394 return ds
/opt/conda/lib/python3.7/contextlib.py in __enter__(self)
110 del self.args, self.kwds, self.func
111 try:
--> 112 return next(self.gen)
113 except StopIteration:
114 raise RuntimeError("generator didn't yield") from None
/opt/conda/lib/python3.7/site-packages/xarray/backends/file_manager.py in acquire_context(self, needs_lock)
181 def acquire_context(self, needs_lock=True):
182 """Context manager for acquiring a file."""
--> 183 file, cached = self._acquire_with_cache_info(needs_lock)
184 try:
185 yield file
/opt/conda/lib/python3.7/site-packages/xarray/backends/file_manager.py in _acquire_with_cache_info(self, needs_lock)
199 kwargs = kwargs.copy()
200 kwargs["mode"] = self._mode
--> 201 file = self._opener(*self._args, **kwargs)
202 if self._mode == "w":
203 # ensure file doesn't get overriden when opened again
netCDF4/_netCDF4.pyx in netCDF4._netCDF4.Dataset.__init__()
netCDF4/_netCDF4.pyx in netCDF4._netCDF4._ensure_nc_success()
OSError: [Errno -90] NetCDF: file not found: b'https://hls.gsfc.nasa.gov/data/v1.4/S30/2017/13/T/E/F/HLS.S30.T13TEF.2017002.v1.4.hdf'
When reading from disk:
xr.open_rasterio('HLS.S30.T13TEF.2017002.v1.4.hdf')
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-88-f4ae5075928a> in <module>
----> 1 xr.open_rasterio('HLS.S30.T13TEF.2017002.v1.4.hdf')
/opt/conda/lib/python3.7/site-packages/xarray/backends/rasterio_.py in open_rasterio(filename, parse_coordinates, chunks, cache, lock)
250 # Get bands
251 if riods.count < 1:
--> 252 raise ValueError("Unknown dims")
253 coords["band"] = np.asarray(riods.indexes)
254
ValueError: Unknown dims
And:
xr.open_dataset('/home/rowangaffney/Desktop/HLS.S30.T13TEF.2017002.v1.4.hdf')
<xarray.Dataset>
Dimensions: (XDim_Grid: 3660, YDim_Grid: 3660)
Dimensions without coordinates: XDim_Grid, YDim_Grid
Data variables:
B01 (YDim_Grid, XDim_Grid) float32 ...
B02 (YDim_Grid, XDim_Grid) float32 ...
B03 (YDim_Grid, XDim_Grid) float32 ...
B04 (YDim_Grid, XDim_Grid) float32 ...
B05 (YDim_Grid, XDim_Grid) float32 ...
B06 (YDim_Grid, XDim_Grid) float32 ...
B07 (YDim_Grid, XDim_Grid) float32 ...
B08 (YDim_Grid, XDim_Grid) float32 ...
B8A (YDim_Grid, XDim_Grid) float32 ...
B09 (YDim_Grid, XDim_Grid) float32 ...
B10 (YDim_Grid, XDim_Grid) float32 ...
B11 (YDim_Grid, XDim_Grid) float32 ...
B12 (YDim_Grid, XDim_Grid) float32 ...
QA (YDim_Grid, XDim_Grid) float32 ...
Attributes:
PRODUCT_URI: S2A_MSIL1C_20170102T17...
L1C_IMAGE_QUALITY: SENSOR:PASSED GEOMETRI...
SPACECRAFT_NAME: Sentinel-2A
TILE_ID: S2A_OPER_MSI_L1C_TL_SG...
DATASTRIP_ID: S2A_OPER_MSI_L1C_DS_SG...
PROCESSING_BASELINE: 02.04
SENSING_TIME: 2017-01-02T17:58:23.575Z
L1_PROCESSING_TIME: 2017-01-02T21:41:37.84...
HORIZONTAL_CS_NAME: WGS84 / UTM zone 13N
HORIZONTAL_CS_CODE: EPSG:32613
NROWS: 3660
NCOLS: 3660
SPATIAL_RESOLUTION: 30
ULX: 499980.0
ULY: 4600020.0
MEAN_SUN_ZENITH_ANGLE(B01): 65.3577462333765
MEAN_SUN_AZIMUTH_ANGLE(B01): 165.01162242158
MEAN_VIEW_ZENITH_ANGLE(B01): 8.10178275092502
MEAN_VIEW_AZIMUTH_ANGLE(B01): 285.224586475702
spatial_coverage: 89
cloud_coverage: 72
ACCODE: LaSRCS2AV3.5.5
arop_s2_refimg: NONE
arop_ncp: 0
arop_rmse(meters): 0.0
arop_ave_xshift(meters): 0.0
arop_ave_yshift(meters): 0.0
HLS_PROCESSING_TIME: 2018-02-24T18:17:49Z
NBAR_Solar_Zenith: 44.82820466504637
AngleBand: [ 0 1 2 3 4 5 6 ...
MSI band 01 bandpass adjustment slope and offset: 0.995900, -0.000200
MSI band 02 bandpass adjustment slope and offset: 0.977800, -0.004000
MSI band 03 bandpass adjustment slope and offset: 1.005300, -0.000900
MSI band 04 bandpass adjustment slope and offset: 0.976500, 0.000900
MSI band 8a bandpass adjustment slope and offset: 0.998300, -0.000100
MSI band 11 bandpass adjustment slope and offset: 0.998700, -0.001100
MSI band 12 bandpass adjustment slope and offset: 1.003000, -0.001200
StructMetadata.0: GROUP=SwathStructure\n.
Any thoughts on the best practice for reading these data over https?
Thanks!
Answer 0 (score: 0)
I suggest reading http://matthewrocklin.com/blog/work/2018/02/06/hdf-in-the-cloud to understand why this (accessing HDF5 files directly over https) is not as simple as it seems. So this is not a complete solution, but, at least in the short term, you will probably need to download the data and load it from there.
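If downloading first is acceptable, that workflow can be sketched with a small helper. Note that `fetch` is a hypothetical convenience function written for this answer, not part of xarray or any HLS tooling:

```python
import os
import urllib.request

def fetch(url, dest_dir="."):
    """Download `url` into `dest_dir` once and return the local path.

    Skips the download if the file already exists, so repeated calls
    are cheap. Illustrative helper only -- no retries, no auth.
    """
    path = os.path.join(dest_dir, url.rsplit("/", 1)[-1])
    if not os.path.exists(path):
        urllib.request.urlretrieve(url, path)
    return path

# Usage (requires network access):
# path = fetch("https://hls.gsfc.nasa.gov/data/v1.4/S30/2017/13/T/E/F/"
#              "HLS.S30.T13TEF.2017002.v1.4.hdf")
# ds = xr.open_dataset(path)
```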
Oh, and you may want to try the 'h5netcdf' engine for reading the file:
xr.open_dataset("HLS.S30.T13TEF.2017002.v1.4.hdf", engine="h5netcdf")
If you are only interested in a single band, do the following:
xr.open_dataset("HLS.S30.T13TEF.2017002.v1.4.hdf", engine="h5netcdf", group="B01")
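If that per-group read works for your file, the groups can be merged back into a single Dataset. The band list below is copied from the local `open_dataset` output in the question; treat this as a sketch, since I cannot verify that every HLS file exposes one group per band:

```python
import xarray as xr

# Band/group names as listed in the local open_dataset output above.
BANDS = ["B01", "B02", "B03", "B04", "B05", "B06", "B07",
         "B08", "B8A", "B09", "B10", "B11", "B12", "QA"]

def open_all_bands(path, bands=BANDS, engine="h5netcdf"):
    """Open each band's group and merge them into one Dataset."""
    return xr.merge(
        [xr.open_dataset(path, engine=engine, group=band) for band in bands]
    )

# ds = open_all_bands("HLS.S30.T13TEF.2017002.v1.4.hdf")
```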
Note, however, that the following code would work in some cases if you use xarray with the 'h5netcdf' engine, have the 'h5pyd' library installed, and point the URL at an HDF REST API interface:
xr.open_dataset(
"https://hls.gsfc.nasa.gov/data/v1.4/S30/2017/13/T/E/F/HLS.S30.T13TEF.2017002.v1.4.hdf",
engine="h5netcdf",
)
But unfortunately, that is not the case for these NASA datasets...
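As a local-file alternative to the failing open_rasterio call in the question: these v1.4 files are HDF4-EOS, so GDAL (and therefore rasterio/xarray) needs a subdataset identifier rather than the bare filename. A sketch, assuming the grid inside the file is named "Grid" (verify against your copy with `gdalinfo`, which prints the exact subdataset strings):

```python
def hdf4_subdataset(path, band, grid="Grid"):
    """Build a GDAL HDF4-EOS subdataset identifier for one band.

    The grid name "Grid" is an assumption for HLS v1.4 files; check
    `gdalinfo <file>` for the real subdataset names before relying on it.
    """
    return f'HDF4_EOS:EOS_GRID:"{path}":{grid}:{band}'

# A single band could then be opened via the rasterio backend, e.g.:
# b01 = xr.open_rasterio(
#     hdf4_subdataset("HLS.S30.T13TEF.2017002.v1.4.hdf", "B01"))
```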