Reading HDF files over HTTPS with Xarray

Asked: 2019-09-24 23:12:10

Tags: https python-xarray hdf

I am trying to read HDF files over an https connection from the Harmonized Landsat Sentinel repository (here: https://hls.gsfc.nasa.gov/data/v1.4/).

Ideally, I would do this with xarray. Here is an example:

Over https:

xr.open_rasterio('https://hls.gsfc.nasa.gov/data/v1.4/S30/2017/13/T/E/F/HLS.S30.T13TEF.2017002.v1.4.hdf')

<xarray.DataArray (band: 1, y: 3660, x: 3660)>
[13395600 values with dtype=int16]
Coordinates:
  * band     (band) int64 1
  * y        (y) float64 4.6e+06 4.6e+06 4.6e+06 ... 4.49e+06 4.49e+06 4.49e+06
  * x        (x) float64 5e+05 5e+05 5.001e+05 ... 6.097e+05 6.097e+05 6.098e+05
Attributes:
    transform:                 (30.0, -0.0, 499980.0, -0.0, -30.0, 4600020.0)
    crs:                       +init=epsg:32613
    res:                       (30.0, 30.0)
    is_tiled:                  0
    nodatavals:                (nan,)
    scales:                    (1.0,)
    offsets:                   (0.0,)
    bands:                     1
    byte_order:                0
    coordinate_system_string:  PROJCS["UTM_Zone_13N",GEOGCS["GCS_WGS_1984",DA...
    data_type:                 2
    description:               HDF Imported into ENVI.
    file_type:                 HDF Scientific Data
    header_offset:             0
    interleave:                bsq
    lines:                     3660
    samples:                   3660

Note that these files contain multiple datasets/bands, so the result above is not correct.
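One way to address an individual band is through GDAL's HDF4-EOS subdataset naming, which `xr.open_rasterio` can open one at a time. This is only a sketch: the grid name `"Grid"` is an assumption, and the band names are taken from the local `xr.open_dataset` listing further down.

```python
def subdataset_path(hdf_path, band, grid="Grid"):
    """Build a GDAL HDF4-EOS subdataset path for a single band.

    The grid name defaults to "Grid" (an assumption for these HLS files).
    """
    return f'HDF4_EOS:EOS_GRID:"{hdf_path}":{grid}:{band}'

# e.g. xr.open_rasterio(subdataset_path("HLS.S30.T13TEF.2017002.v1.4.hdf", "B01"))
```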

xr.open_dataset('https://hls.gsfc.nasa.gov/data/v1.4/S30/2017/13/T/E/F/HLS.S30.T13TEF.2017002.v1.4.hdf')

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
/opt/conda/lib/python3.7/site-packages/xarray/backends/file_manager.py in _acquire_with_cache_info(self, needs_lock)
    194             try:
--> 195                 file = self._cache[self._key]
    196             except KeyError:

/opt/conda/lib/python3.7/site-packages/xarray/backends/lru_cache.py in __getitem__(self, key)
     42         with self._lock:
---> 43             value = self._cache[key]
     44             self._cache.move_to_end(key)

KeyError: [<class 'netCDF4._netCDF4.Dataset'>, ('https://hls.gsfc.nasa.gov/data/v1.4/S30/2017/13/T/E/F/HLS.S30.T13TEF.2017002.v1.4.hdf',), 'r', (('clobber', True), ('diskless', False), ('format', 'NETCDF4'), ('persist', False))]

During handling of the above exception, another exception occurred:

OSError                                   Traceback (most recent call last)
<ipython-input-85-7765ae565af3> in <module>
----> 1 xr.open_dataset('https://hls.gsfc.nasa.gov/data/v1.4/S30/2017/13/T/E/F/HLS.S30.T13TEF.2017002.v1.4.hdf')

/opt/conda/lib/python3.7/site-packages/xarray/backends/api.py in open_dataset(filename_or_obj, group, decode_cf, mask_and_scale, decode_times, autoclose, concat_characters, decode_coords, engine, chunks, lock, cache, drop_variables, backend_kwargs, use_cftime)
    497         if engine == "netcdf4":
    498             store = backends.NetCDF4DataStore.open(
--> 499                 filename_or_obj, group=group, lock=lock, **backend_kwargs
    500             )
    501         elif engine == "scipy":

/opt/conda/lib/python3.7/site-packages/xarray/backends/netCDF4_.py in open(cls, filename, mode, format, group, clobber, diskless, persist, lock, lock_maker, autoclose)
    387             netCDF4.Dataset, filename, mode=mode, kwargs=kwargs
    388         )
--> 389         return cls(manager, group=group, mode=mode, lock=lock, autoclose=autoclose)
    390 
    391     def _acquire(self, needs_lock=True):

/opt/conda/lib/python3.7/site-packages/xarray/backends/netCDF4_.py in __init__(self, manager, group, mode, lock, autoclose)
    333         self._group = group
    334         self._mode = mode
--> 335         self.format = self.ds.data_model
    336         self._filename = self.ds.filepath()
    337         self.is_remote = is_remote_uri(self._filename)

/opt/conda/lib/python3.7/site-packages/xarray/backends/netCDF4_.py in ds(self)
    396     @property
    397     def ds(self):
--> 398         return self._acquire()
    399 
    400     def open_store_variable(self, name, var):

/opt/conda/lib/python3.7/site-packages/xarray/backends/netCDF4_.py in _acquire(self, needs_lock)
    390 
    391     def _acquire(self, needs_lock=True):
--> 392         with self._manager.acquire_context(needs_lock) as root:
    393             ds = _nc4_require_group(root, self._group, self._mode)
    394         return ds

/opt/conda/lib/python3.7/contextlib.py in __enter__(self)
    110         del self.args, self.kwds, self.func
    111         try:
--> 112             return next(self.gen)
    113         except StopIteration:
    114             raise RuntimeError("generator didn't yield") from None

/opt/conda/lib/python3.7/site-packages/xarray/backends/file_manager.py in acquire_context(self, needs_lock)
    181     def acquire_context(self, needs_lock=True):
    182         """Context manager for acquiring a file."""
--> 183         file, cached = self._acquire_with_cache_info(needs_lock)
    184         try:
    185             yield file

/opt/conda/lib/python3.7/site-packages/xarray/backends/file_manager.py in _acquire_with_cache_info(self, needs_lock)
    199                     kwargs = kwargs.copy()
    200                     kwargs["mode"] = self._mode
--> 201                 file = self._opener(*self._args, **kwargs)
    202                 if self._mode == "w":
    203                     # ensure file doesn't get overriden when opened again

netCDF4/_netCDF4.pyx in netCDF4._netCDF4.Dataset.__init__()

netCDF4/_netCDF4.pyx in netCDF4._netCDF4._ensure_nc_success()

OSError: [Errno -90] NetCDF: file not found: b'https://hls.gsfc.nasa.gov/data/v1.4/S30/2017/13/T/E/F/HLS.S30.T13TEF.2017002.v1.4.hdf'

When reading from disk:

xr.open_rasterio('HLS.S30.T13TEF.2017002.v1.4.hdf')

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-88-f4ae5075928a> in <module>
----> 1 xr.open_rasterio('HLS.S30.T13TEF.2017002.v1.4.hdf')

/opt/conda/lib/python3.7/site-packages/xarray/backends/rasterio_.py in open_rasterio(filename, parse_coordinates, chunks, cache, lock)
    250     # Get bands
    251     if riods.count < 1:
--> 252         raise ValueError("Unknown dims")
    253     coords["band"] = np.asarray(riods.indexes)
    254 

ValueError: Unknown dims

xr.open_dataset('/home/rowangaffney/Desktop/HLS.S30.T13TEF.2017002.v1.4.hdf')

<xarray.Dataset>
Dimensions:  (XDim_Grid: 3660, YDim_Grid: 3660)
Dimensions without coordinates: XDim_Grid, YDim_Grid
Data variables:
    B01      (YDim_Grid, XDim_Grid) float32 ...
    B02      (YDim_Grid, XDim_Grid) float32 ...
    B03      (YDim_Grid, XDim_Grid) float32 ...
    B04      (YDim_Grid, XDim_Grid) float32 ...
    B05      (YDim_Grid, XDim_Grid) float32 ...
    B06      (YDim_Grid, XDim_Grid) float32 ...
    B07      (YDim_Grid, XDim_Grid) float32 ...
    B08      (YDim_Grid, XDim_Grid) float32 ...
    B8A      (YDim_Grid, XDim_Grid) float32 ...
    B09      (YDim_Grid, XDim_Grid) float32 ...
    B10      (YDim_Grid, XDim_Grid) float32 ...
    B11      (YDim_Grid, XDim_Grid) float32 ...
    B12      (YDim_Grid, XDim_Grid) float32 ...
    QA       (YDim_Grid, XDim_Grid) float32 ...
Attributes:
    PRODUCT_URI:                                       S2A_MSIL1C_20170102T17...
    L1C_IMAGE_QUALITY:                                 SENSOR:PASSED GEOMETRI...
    SPACECRAFT_NAME:                                   Sentinel-2A
    TILE_ID:                                           S2A_OPER_MSI_L1C_TL_SG...
    DATASTRIP_ID:                                      S2A_OPER_MSI_L1C_DS_SG...
    PROCESSING_BASELINE:                               02.04
    SENSING_TIME:                                      2017-01-02T17:58:23.575Z
    L1_PROCESSING_TIME:                                2017-01-02T21:41:37.84...
    HORIZONTAL_CS_NAME:                                WGS84 / UTM zone 13N
    HORIZONTAL_CS_CODE:                                EPSG:32613
    NROWS:                                             3660
    NCOLS:                                             3660
    SPATIAL_RESOLUTION:                                30
    ULX:                                               499980.0
    ULY:                                               4600020.0
    MEAN_SUN_ZENITH_ANGLE(B01):                        65.3577462333765
    MEAN_SUN_AZIMUTH_ANGLE(B01):                       165.01162242158
    MEAN_VIEW_ZENITH_ANGLE(B01):                       8.10178275092502
    MEAN_VIEW_AZIMUTH_ANGLE(B01):                      285.224586475702
    spatial_coverage:                                  89
    cloud_coverage:                                    72
    ACCODE:                                            LaSRCS2AV3.5.5
    arop_s2_refimg:                                    NONE
    arop_ncp:                                          0
    arop_rmse(meters):                                 0.0
    arop_ave_xshift(meters):                           0.0
    arop_ave_yshift(meters):                           0.0
    HLS_PROCESSING_TIME:                               2018-02-24T18:17:49Z
    NBAR_Solar_Zenith:                                 44.82820466504637
    AngleBand:                                         [ 0  1  2  3  4  5  6 ...
    MSI band 01 bandpass adjustment slope and offset:  0.995900, -0.000200
    MSI band 02 bandpass adjustment slope and offset:  0.977800, -0.004000
    MSI band 03 bandpass adjustment slope and offset:  1.005300, -0.000900
    MSI band 04 bandpass adjustment slope and offset:  0.976500, 0.000900
    MSI band 8a bandpass adjustment slope and offset:  0.998300, -0.000100
    MSI band 11 bandpass adjustment slope and offset:  0.998700, -0.001100
    MSI band 12 bandpass adjustment slope and offset:  1.003000, -0.001200
    StructMetadata.0:                                  GROUP=SwathStructure\n.

Any thoughts on best practices for reading these data over https?

Thanks!

1 answer:

Answer 0 (score: 0)

I suggest reading http://matthewrocklin.com/blog/work/2018/02/06/hdf-in-the-cloud to understand why this (accessing HDF5 files directly over https) is not as simple as it looks. So, while this is not a complete solution, you may need to download the data and load it from local disk, at least in the short term.
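A minimal download-then-open sketch using only the standard library (the helper names and cache layout are illustrative, not part of any API):

```python
import os
import urllib.request

def local_name(url):
    """Derive a local filename from the last path segment of the URL."""
    return url.rsplit("/", 1)[-1]

def fetch(url, cache_dir="."):
    """Download `url` once; return the path of the local copy."""
    path = os.path.join(cache_dir, local_name(url))
    if not os.path.exists(path):
        urllib.request.urlretrieve(url, path)
    return path

# Then open from disk, e.g.:
# import xarray as xr
# ds = xr.open_dataset(fetch("https://hls.gsfc.nasa.gov/data/v1.4/S30/2017/13/T/E/F/HLS.S30.T13TEF.2017002.v1.4.hdf"))
```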

Oh, and you may want to try the 'h5netcdf' engine for reading the file:

xr.open_dataset("HLS.S30.T13TEF.2017002.v1.4.hdf", engine="h5netcdf")

If you are only interested in a single band, do the following:

xr.open_dataset("HLS.S30.T13TEF.2017002.v1.4.hdf", engine="h5netcdf", group="B01")
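If you do want every band, one hedged sketch is to open each group and merge the results into a single Dataset. The band names are taken from the question's local listing; whether each band is actually exposed as a separate group depends on the file, so treat this as a sketch, not a verified recipe.

```python
# Band names as they appear in the question's local xr.open_dataset output.
BANDS = ["B01", "B02", "B03", "B04", "B05", "B06", "B07",
         "B08", "B8A", "B09", "B10", "B11", "B12", "QA"]

def open_all_bands(path, engine="h5netcdf"):
    """Open each band group and merge into a single Dataset."""
    import xarray as xr  # deferred import so the sketch reads standalone
    return xr.merge(
        [xr.open_dataset(path, engine=engine, group=b) for b in BANDS]
    )
```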

Note, however, that the following code would work in some circumstances if you use xarray with the 'h5netcdf' engine, have the 'h5pyd' library installed, and the URL is served through an HDF REST API interface:

xr.open_dataset(
    "https://hls.gsfc.nasa.gov/data/v1.4/S30/2017/13/T/E/F/HLS.S30.T13TEF.2017002.v1.4.hdf",
    engine="h5netcdf",
)

Unfortunately, though, that is not the case for these NASA datasets...