Dynamically creating an xarray Dataset from multiple files

Time: 2018-10-10 10:15:42

Tags: python python-xarray

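I want to create an xarray Dataset from several input files. Each file contains 1 time stamp, 1 level, and 1 source. Here is an example with 2 time stamps, 2 levels, and 3 sources: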

import numpy as np

d1 = np.arange(0,9).reshape((3,3))   # 01:00, 3m, SG1 -> 00001-101.con
d2 = np.arange(10,19).reshape((3,3)) # 01:00, 3m, SG2 -> 00001-102.con
d3 = np.arange(20,29).reshape((3,3)) # 01:00, 3m, SG3 -> 00001-103.con
d4 = np.arange(60,69).reshape((3,3)) # 01:00, 10m, SG1 -> 00001-201.con
d5 = np.arange(70,79).reshape((3,3)) # 01:00, 10m, SG2 -> 00001-202.con
d6 = np.arange(80,89).reshape((3,3)) # 01:00, 10m, SG3 -> 00001-203.con

e1 = np.arange(100,109).reshape((3,3)) # 02:00, 3m, SG1 -> 00002-101.con
e2 = np.arange(110,119).reshape((3,3)) # 02:00, 3m, SG2 -> 00002-102.con
e3 = np.arange(120,129).reshape((3,3)) # 02:00, 3m, SG3 -> 00002-103.con
e4 = np.arange(160,169).reshape((3,3)) # 02:00, 10m, SG1 -> 00002-201.con
e5 = np.arange(170,179).reshape((3,3)) # 02:00, 10m, SG2 -> 00002-202.con
e6 = np.arange(180,189).reshape((3,3)) # 02:00, 10m, SG3 -> 00002-203.con


dstk = np.stack((d1,d2,d3,d4,d5,d6)) # 01:00, both levels, all SGs -> 00001-101.con to 00001-203.con
estk = np.stack((e1,e2,e3,e4,e5,e6)) # 02:00, both levels, all SGs -> 00002-101.con to 00002-203.con
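The comments above suggest that each file name encodes the time step, level, and source group as TTTTT-LSS.con. The following is a minimal sketch of how such a name could be split into its indices, assuming that reading of the names is correct; `parse_con_name` is a hypothetical helper, not part of the original code.

```python
import re

# Assumed naming scheme, inferred from the comments: TTTTT-LSS.con
# TTTTT = time step index, L = level index, SS = source group index.
_CON_NAME = re.compile(r"^(\d{5})-(\d)(\d{2})\.con$")


def parse_con_name(name):
    """Split a name such as '00001-203.con' into (time, level, source) indices."""
    match = _CON_NAME.match(name)
    if match is None:
        raise ValueError(f"unexpected file name: {name!r}")
    t_idx, lvl_idx, src_idx = (int(group) for group in match.groups())
    return t_idx, lvl_idx, src_idx


print(parse_con_name("00001-203.con"))  # (1, 2, 3)
```

The returned indices are 1-based, as in the file names, so they would need to be mapped to the actual time stamps, level labels, and source names before use as coordinates.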

I managed to create the dataset manually in the form I need:

import pandas as pd
import xarray as xr

xx = [100,200,300]
yy = [600,700,800]

dds1 = xr.Dataset(data_vars={"SG1":(("x","y"),dstk[0]),"SG2":(("x","y"),dstk[1]),"SG3":(("x","y"),dstk[2])},coords={"x":xx,"y":yy,"lvl":"3m","t":pd.Timestamp("2010-01-01 01:00:00")})
dds2 = xr.Dataset(data_vars={"SG1":(("x","y"),dstk[3]),"SG2":(("x","y"),dstk[4]),"SG3":(("x","y"),dstk[5])},coords={"x":xx,"y":yy,"lvl":"10m","t":pd.Timestamp("2010-01-01 01:00:00")})

eds1 = xr.Dataset(data_vars={"SG1":(("x","y"),estk[0]),"SG2":(("x","y"),estk[1]),"SG3":(("x","y"),estk[2])},coords={"x":xx,"y":yy,"lvl":"3m","t":pd.Timestamp("2010-01-01 02:00:00")})
eds2 = xr.Dataset(data_vars={"SG1":(("x","y"),estk[3]),"SG2":(("x","y"),estk[4]),"SG3":(("x","y"),estk[5])},coords={"x":xx,"y":yy,"lvl":"10m","t":pd.Timestamp("2010-01-01 02:00:00")})

td1 = xr.concat([dds1,dds2],dim="lvl")
td2 = xr.concat([eds1,eds2],dim="lvl")

td_final = xr.concat([td1,td2],dim="t")

This gives me:

<xarray.Dataset>
Dimensions:  (lvl: 2, t: 2, x: 3, y: 3)
Coordinates:
  * x        (x) int32 100 200 300
  * y        (y) int32 600 700 800
  * lvl      (lvl) <U3 '3m' '10m'
  * t        (t) datetime64[ns] 2010-01-01T01:00:00 2010-01-01T02:00:00
Data variables:
    SG1      (t, lvl, x, y) int32 0 1 2 3 4 5 6 7 8 60 61 62 63 64 65 66 67 ...
    SG2      (t, lvl, x, y) int32 10 11 12 13 14 15 16 17 18 70 71 72 73 74 ...
    SG3      (t, lvl, x, y) int32 20 21 22 23 24 25 26 27 28 80 81 82 83 84 ...

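However, this seems overly complicated. I would like to build the dataset dynamically, for example by looping over lists of time stamps, source groups, and levels.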
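One possible way to do this is to loop over the three lists, stack the grids for each source into a single (t, lvl, x, y) array, and pass everything to `xr.Dataset` in one call. The sketch below is only an illustration under stated assumptions: `load_grid` is a hypothetical stand-in for reading one .con file, and it simply regenerates the sample arrays from the question using zero-based indices.

```python
import numpy as np
import pandas as pd
import xarray as xr

times = pd.to_datetime(["2010-01-01 01:00:00", "2010-01-01 02:00:00"])
levels = ["3m", "10m"]
sources = ["SG1", "SG2", "SG3"]
xx = [100, 200, 300]
yy = [600, 700, 800]


def load_grid(t_idx, lvl_idx, src_idx):
    """Hypothetical reader for one file; here it rebuilds the sample arrays."""
    start = 100 * t_idx + 60 * lvl_idx + 10 * src_idx
    return np.arange(start, start + 9).reshape((3, 3))


data_vars = {}
for s, src in enumerate(sources):
    # Build one array of shape (t, lvl, x, y) for this source group.
    cube = np.stack([
        np.stack([load_grid(t, k, s) for k in range(len(levels))])
        for t in range(len(times))
    ])
    data_vars[src] = (("t", "lvl", "x", "y"), cube)

ds = xr.Dataset(data_vars, coords={"t": times, "lvl": levels, "x": xx, "y": yy})
print(ds)
```

Building the full arrays first avoids the nested `xr.concat` calls. With real files, `load_grid` would be replaced by whatever routine reads a single .con file into a 2D array.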

0 answers:

No answers yet