Using xarray

Time: 2019-09-18 12:55:48

Tags: python python-3.x python-xarray netcdf4

I have created three DataArrays and one Dataset with Python xarray.

import os
import xarray as xr

# rt, rt1, rt2 (data) and tAxis, Y, X (coordinates) are defined earlier in the script.
da_test = xr.DataArray(rt, dims=['dtime', 'lat', 'lon'],
                       coords={'dtime': tAxis, 'lat': Y, 'lon': X})

da_test2 = xr.DataArray(rt1, dims=['dtime', 'lat', 'lon'],
                        coords={'dtime': tAxis, 'lat': Y, 'lon': X})

da_test3 = xr.DataArray(rt2, dims=['dtime', 'lat', 'lon'],
                        coords={'dtime': tAxis, 'lat': Y, 'lon': X})

ds = xr.Dataset({"test": da_test, "test2": da_test2, "test3": da_test3})

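Now I am trying to save the dataset to a netCDF file. If the file already exists, I open the dataset stored in it, concatenate it with the current dataset along the "dtime" axis, and write the result back to the netCDF file. I have declared "dtime" as an unlimited dimension so that the file can grow along it.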

dsList = [ds]

if os.path.isfile('./'+outFileName):
    diskDS = xr.open_dataset(outFileName, group="/satGrp")
    dsList.append(diskDS)
    finalDS = xr.concat(dsList, dim="dtime")
    diskDS.close()
else:
    finalDS = ds

# Setting up compression and writing to nc file
comp = dict(zlib=True, complevel=5)
encoding = {var: comp for var in finalDS.data_vars}
finalDS.to_netcdf(outFileName, group="/satGrp", mode='a', format="NETCDF4", engine='h5netcdf', unlimited_dims=["dtime"], encoding=encoding)

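I am trying to simulate a use case in which the script runs every half hour and updates the netCDF file with the new dataset. The first run completes successfully and stores the dataset. But when I run it a second time, I get the following error: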

Traceback (most recent call last):
  File "xr.py", line 87, in <module>
    finalDS.to_netcdf(outFileName, group="/satGrp", mode='a', format="NETCDF4", engine='h5netcdf', unlimited_dims=["dtime"], encoding=encoding)
  File "/usr/local/lib/python3.6/dist-packages/xarray/core/dataset.py", line 1384, in to_netcdf
    compute=compute)
  File "/usr/local/lib/python3.6/dist-packages/xarray/backends/api.py", line 886, in to_netcdf
    unlimited_dims=unlimited_dims)
  File "/usr/local/lib/python3.6/dist-packages/xarray/backends/api.py", line 929, in dump_to_store
    unlimited_dims=unlimited_dims)
  File "/usr/local/lib/python3.6/dist-packages/xarray/backends/common.py", line 271, in store
    self.set_dimensions(variables, unlimited_dims=unlimited_dims)
  File "/usr/local/lib/python3.6/dist-packages/xarray/backends/common.py", line 343, in set_dimensions
    "%r (%d != %d)" % (dim, length, existing_dims[dim]))
TypeError: %d format: a number is required, not NoneType

Dataset stored on disk by the first run:

<xarray.Dataset>
Dimensions:  (dtime: 1, lat: 30, lon: 20)
Coordinates:
  * dtime    (dtime) datetime64[ns] 2019-09-18T12:06:00.298381
  * lat      (lat) int64 0 1 2 3 4 5 6 7 8 9 ... 20 21 22 23 24 25 26 27 28 29
  * lon      (lon) int64 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19
Data variables:
    test     (dtime, lat, lon) float64 ...
    test2    (dtime, lat, lon) float64 ...
    test3    (dtime, lat, lon) float64 ...

Dataset of the current run, in memory:

<xarray.Dataset>
Dimensions:  (dtime: 1, lat: 30, lon: 20)
Coordinates:
  * dtime    (dtime) datetime64[ns] 2019-09-18T12:07:10.351870
  * lat      (lat) int64 0 1 2 3 4 5 6 7 8 9 ... 20 21 22 23 24 25 26 27 28 29
  * lon      (lon) int64 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19
Data variables:
    test     (dtime, lat, lon) float64 47.42 977.8 168.2 ... 685.2 777.5 412.6
    test2    (dtime, lat, lon) float64 105.4 2.173e+03 373.8 ... 1.728e+03 916.9
    test3    (dtime, lat, lon) float64 26.32 542.7 93.36 ... 380.3 431.5 229.0

Concatenated dataset, in memory:

<xarray.Dataset>
Dimensions:  (dtime: 2, lat: 30, lon: 20)
Coordinates:
  * lat      (lat) int64 0 1 2 3 4 5 6 7 8 9 ... 20 21 22 23 24 25 26 27 28 29
  * lon      (lon) int64 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19
  * dtime    (dtime) datetime64[ns] 2019-09-18T12:07:10.351870 2019-09-18T12:06:00.298381
Data variables:
    test     (dtime, lat, lon) float64 47.42 977.8 168.2 ... 318.1 977.0 655.4
    test2    (dtime, lat, lon) float64 105.4 2.173e+03 ... 2.171e+03 1.456e+03
    test3    (dtime, lat, lon) float64 26.32 542.7 93.36 ... 176.6 542.3 363.7

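While debugging, I printed the following inside xarray's common.py (the dimensions of the dataset being written, followed by the dimensions that already exist in the file):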

odict_items([('dtime', 2), ('lat', 30), ('lon', 20)])
<h5netcdf.Dimensions: lat=30, lon=20, dtime=None>

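So the check trips over existing_dims['dtime'], which is None: the sizes differ (2 != None), and formatting None with %d in the error message then raises the TypeError.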
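To illustrate what I think is happening, here is a minimal sketch that uses only the standard library. The two functions, the error message text, and the plain dict standing in for the h5netcdf dimensions object are hypothetical; they are reconstructed from the traceback line and the debug output above and are not xarray's or h5netcdf's actual code. The second function assumes that None marks an unlimited dimension and shows the kind of comparison that would tolerate it.

```python
from collections import OrderedDict

# Hypothetical stand-ins for the two objects printed in the debug output:
# the dimensions of the dataset being written, and the dimensions already
# in the file (the unlimited 'dtime' is reported as None).
new_dims = OrderedDict([('dtime', 2), ('lat', 30), ('lon', 20)])
existing_dims = {'lat': 30, 'lon': 20, 'dtime': None}


def strict_check(dims, existing):
    """Simplified reconstruction of the failing check (not xarray's source)."""
    for dim, length in dims.items():
        if dim in existing and length != existing[dim]:
            # When existing[dim] is None, building this message fails with
            # TypeError before the intended ValueError can be raised.
            raise ValueError(
                "size mismatch for existing dimension "
                "%r (%d != %d)" % (dim, length, existing[dim]))


def tolerant_check(dims, existing):
    """Variant that treats None (assumed: unlimited) as matching any length."""
    for dim, length in dims.items():
        current = existing.get(dim)
        if current is not None and length != current:
            raise ValueError(
                "size mismatch for existing dimension "
                "%r (%r != %r)" % (dim, length, current))


try:
    strict_check(new_dims, existing_dims)
    strict_error = None
except TypeError as exc:
    strict_error = exc

print(type(strict_error).__name__)  # prints: TypeError

# Passes silently: 'dtime' is skipped, 'lat' and 'lon' match.
tolerant_check(new_dims, existing_dims)
```

If this reading is right, the TypeError only hides a size-mismatch error that the check was about to raise anyway, so the underlying question is whether an unlimited dimension should be compared at all when writing in mode 'a'.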

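Interestingly, if I change the mode to 'w' in the to_netcdf call, the update goes through without problems. But since I want to keep multiple groups in the netCDF file, I really need to use mode 'a'.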

I would appreciate any ideas for working around this problem.

0 answers:

No answers yet.