I have created 3 DataArrays and 1 Dataset using Python xarray.
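The inputs rt, rt1, rt2, tAxis, Y and X come from elsewhere in my script; a rough stand-in with the same shapes and dtypes as in the dataset dumps further down (the random values and the output file name are just placeholders), plus the imports used below, would be:

import os
import numpy as np
import pandas as pd
import xarray as xr

outFileName = 'satGrp_test.nc'          # placeholder name
X = np.arange(20)                       # lon: 20 grid points
Y = np.arange(30)                       # lat: 30 grid points
tAxis = [pd.Timestamp.now()]            # one 'dtime' step per run
rt = np.random.rand(1, 30, 20) * 1000   # dummy (dtime, lat, lon) fields
rt1 = np.random.rand(1, 30, 20) * 2000
rt2 = np.random.rand(1, 30, 20) * 500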
da_test = xr.DataArray(rt, dims=['dtime', 'lat', 'lon'],
                       coords={'dtime': tAxis, 'lat': Y, 'lon': X})
da_test2 = xr.DataArray(rt1, dims=['dtime', 'lat', 'lon'],
                        coords={'dtime': tAxis, 'lat': Y, 'lon': X})
da_test3 = xr.DataArray(rt2, dims=['dtime', 'lat', 'lon'],
                        coords={'dtime': tAxis, 'lat': Y, 'lon': X})
ds = xr.Dataset({"test" : da_test, "test2" : da_test2, "test3" : da_test3})
Now I am trying to save the dataset to a netcdf file. If the file already exists, I open it, concatenate the current dataset with it along the 'dtime' axis, and store the result back to netcdf. I have declared 'dtime' as an unlimited dimension so the file can grow along that dimension.
dsList = [ds]
if os.path.isfile('./' + outFileName):
    diskDS = xr.open_dataset(outFileName, group="/satGrp")
    dsList.append(diskDS)
    finalDS = xr.concat(dsList, dim="dtime")
    diskDS.close()
else:
    finalDS = ds

# Setting up compression and writing to nc file
comp = dict(zlib=True, complevel=5)
encoding = {var: comp for var in finalDS.data_vars}
finalDS.to_netcdf(outFileName, group="/satGrp", mode='a', format="NETCDF4",
                  engine='h5netcdf', unlimited_dims=["dtime"], encoding=encoding)
I am trying to simulate a use case where this script runs every half hour and updates the netcdf with the newer dataset. The first pass completes successfully and the dataset is stored. But when I run it the next time, I get the following error:
Traceback (most recent call last):
  File "xr.py", line 87, in <module>
    finalDS.to_netcdf(outFileName, group="/satGrp", mode='a', format="NETCDF4", engine='h5netcdf', unlimited_dims=["dtime"], encoding=encoding)
  File "/usr/local/lib/python3.6/dist-packages/xarray/core/dataset.py", line 1384, in to_netcdf
    compute=compute)
  File "/usr/local/lib/python3.6/dist-packages/xarray/backends/api.py", line 886, in to_netcdf
    unlimited_dims=unlimited_dims)
  File "/usr/local/lib/python3.6/dist-packages/xarray/backends/api.py", line 929, in dump_to_store
    unlimited_dims=unlimited_dims)
  File "/usr/local/lib/python3.6/dist-packages/xarray/backends/common.py", line 271, in store
    self.set_dimensions(variables, unlimited_dims=unlimited_dims)
  File "/usr/local/lib/python3.6/dist-packages/xarray/backends/common.py", line 343, in set_dimensions
    "%r (%d != %d)" % (dim, length, existing_dims[dim]))
TypeError: %d format: a number is required, not NoneType
Dataset stored during the first pass:
<xarray.Dataset>
Dimensions: (dtime: 1, lat: 30, lon: 20)
Coordinates:
* dtime (dtime) datetime64[ns] 2019-09-18T12:06:00.298381
* lat (lat) int64 0 1 2 3 4 5 6 7 8 9 ... 20 21 22 23 24 25 26 27 28 29
* lon (lon) int64 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19
Data variables:
test (dtime, lat, lon) float64 ...
test2 (dtime, lat, lon) float64 ...
test3 (dtime, lat, lon) float64 ...
Current run's dataset in memory:
<xarray.Dataset>
Dimensions: (dtime: 1, lat: 30, lon: 20)
Coordinates:
* dtime (dtime) datetime64[ns] 2019-09-18T12:07:10.351870
* lat (lat) int64 0 1 2 3 4 5 6 7 8 9 ... 20 21 22 23 24 25 26 27 28 29
* lon (lon) int64 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19
Data variables:
test (dtime, lat, lon) float64 47.42 977.8 168.2 ... 685.2 777.5 412.6
test2 (dtime, lat, lon) float64 105.4 2.173e+03 373.8 ... 1.728e+03 916.9
test3 (dtime, lat, lon) float64 26.32 542.7 93.36 ... 380.3 431.5 229.0
Concatenated dataset in memory:
<xarray.Dataset>
Dimensions: (dtime: 2, lat: 30, lon: 20)
Coordinates:
* lat (lat) int64 0 1 2 3 4 5 6 7 8 9 ... 20 21 22 23 24 25 26 27 28 29
* lon (lon) int64 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19
* dtime (dtime) datetime64[ns] 2019-09-18T12:07:10.351870 2019-09-18T12:06:00.298381
Data variables:
test (dtime, lat, lon) float64 47.42 977.8 168.2 ... 318.1 977.0 655.4
test2 (dtime, lat, lon) float64 105.4 2.173e+03 ... 2.171e+03 1.456e+03
test3 (dtime, lat, lon) float64 26.32 542.7 93.36 ... 176.6 542.3 363.7
While debugging, I found the following inside xarray's common.py:
odict_items([('dtime', 2), ('lat', 30), ('lon', 20)])
<h5netcdf.Dimensions: lat=30, lon=20, dtime=None>
So the dimension-length check trips over existing_dims['dtime']: the on-disk unlimited dimension is reported with size None, and the %d in the error-message format string cannot handle None.
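In other words, set_dimensions compares the in-memory 'dtime' length of 2 against an on-disk 'dtime' of None, and the TypeError is raised while it tries to format the mismatch message. A trivial snippet reproduces just that last step:

# Reproduces only the final formatting failure from the traceback,
# using the values observed while debugging (2 in memory, None on disk)
"%r (%d != %d)" % ("dtime", 2, None)
# TypeError: %d format: a number is required, not NoneType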
Interestingly, if I change the mode to 'w' in the to_netcdf call, the update goes through fine. But since I want multiple groups in the netcdf file, I really need to use mode 'a'.
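As a stopgap I am considering keeping the concatenation in memory as above and then writing the result with mode='w' to a temporary file that replaces the original. This is only a sketch for the single-group case shown here; with more groups, each group would have to be read and re-written to the temporary file as well:

import os

# Write the already-concatenated dataset to a temporary file, then atomically
# replace the original so an interrupted run never leaves a half-written file.
tmpPath = outFileName + '.tmp'
finalDS.to_netcdf(tmpPath, group="/satGrp", mode='w', format="NETCDF4",
                  engine='h5netcdf', unlimited_dims=["dtime"], encoding=encoding)
os.replace(tmpPath, outFileName)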
Looking forward to any ideas on how to work around this.