Question

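I am trying to write some Python code that reads the Australian Bureau of Meteorology rain gauge NetCDF files and extracts the rainfall for a set of gauges in a catchment. The format is a bit odd: they have chosen to create one file per time step, containing every value recorded Australia-wide for that time step. However, when a gauge did not record a value, its station is simply missing from the file. I want to find the missing stations and just create a zero rainfall value for them. I have identified the station IDs, but how do I add the zero records to my list?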

Here is part of the code:

import numpy as np

# netcdf, stationdata, gauge_ids, ngauges and timeseries_per_station are defined elsewhere in the full script.
# First LOOP through all files for the day and accumulate data.
for timestep, datafile in enumerate(stationdata):
    print(datafile[-16:-3])
    data = netcdf.NetCDFFile(datafile, 'r')
    # The variable names differ between files, so try both spellings.
    try:
        precip = data.variables['precipitation'].data
    except KeyError:
        precip = data.variables['precip'].data
    try:
        stid = data.variables['station_id'].data
    except KeyError:
        stid = data.variables['stid'].data
    # create np array of indices of the gauge id present in the current file (Note not ALL required ones may be present!!)
    idx = np.where(np.isin(stid, gauge_ids))[0]
    print('index len = ' + str(len(idx)) + ' Gauges: ' + str(ngauges))
    # This process DOES NOT SEEM to Capture Missing Gauge Data
    # If a Gauge ID is not present how do we set its value to Zero for this time step?
    for i in idx:
        print(i, stid[i], precip[i], timestep)
        timeseries_per_station[str(stid[i])][timestep] = precip[i]  # This adds the rainfall to the time series for the Station ID in the found set from its index
    data.close()
    # Now go through the list of Gauges ngauges with IDs gauge_ids, and fill missing ones with zero
    # For stid not in gauge_ids set to Zero ... How ???
    # create a Zero list and remove IDs that already have values ??
    # Try    [i for i in a if i not in b]
    print([k for k in gauge_ids if k not in stid])
    for l in [k for k in gauge_ids if k not in stid]:
        print(l, timestep)
        timeseries_per_station[l][timestep] = 0.0
    input('check..')  # pause after each file while debugging

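The line for l in [k for k in gauge_ids if k not in stid]: identifies the missing stations as expected, but timeseries_per_station[l][timestep] = -1.0 raises IndexError: index out of bounds. This is where I want to set the missing data to a recognizable value.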

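The error occurs when the code reaches a data file containing fewer than the original number of stations (26), i.e. when only 25 or 24 of them are read.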
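The definition of timeseries_per_station is not shown, so the following is only a guess: an "index out of bounds" error on [timestep] usually means the per-station array is shorter than the number of files. A minimal sketch of one way around it, assuming every gauge's series is preallocated to the full number of time steps and pre-filled with the sentinel (-1.0 here). The (stid, precip) array pairs stand in for the NetCDF reads, and build_timeseries is a hypothetical helper, not part of the original script:

```python
import numpy as np

def build_timeseries(timesteps, gauge_ids, missing_value=-1.0):
    """Return {gauge_id: array of length len(timesteps)}.

    timesteps is a hypothetical in-memory stand-in for the NetCDF files:
    a list of (stid, precip) array pairs, one pair per time step.
    """
    nsteps = len(timesteps)
    # Size every series to the FULL number of time steps up front and
    # pre-fill it with the sentinel, so series[gid][t] is always in bounds.
    series = {gid: np.full(nsteps, missing_value) for gid in gauge_ids}
    for t, (stid, precip) in enumerate(timesteps):
        # Positions in this file whose station ID is one of the wanted gauges.
        idx = np.where(np.isin(stid, gauge_ids))[0]
        for i in idx:
            series[stid[i]][t] = precip[i]
    # A gauge missing from a file keeps the sentinel for that time step,
    # so no separate "fill the missing ones" pass is needed.
    return series

gauges = [101, 102, 103]
steps = [
    (np.array([101, 102, 103, 999]), np.array([1.2, 0.0, 3.4, 9.9])),
    (np.array([101, 103]), np.array([0.5, 2.0])),  # gauge 102 did not report
]
ts = build_timeseries(steps, gauges)
```

Because the sentinel is written at allocation time, a file with 24 or 25 of the 26 stations needs no special handling.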

Any clues would be most helpful...

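An alternative would be to read the data into a different structure. For each time-slice data file there is data for each rain gauge station: ID, latitude, longitude and precipitation. I want to plot the spatial variation of precipitation for each time slice. The time of each slice is contained in the file name of the time-slice file.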
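One possible shape for that alternative structure, as a sketch only: a dict keyed by the time parsed from the file name, each value a NumPy structured array with one record (ID, lat, lon, precip) per gauge. The file-name pattern (a 12-digit YYYYMMDDHHMM stamp before the extension) is an assumption; the real BoM names are not shown, and the original code slices them with datafile[-16:-3]. timestamp_from_name and make_slice are hypothetical helpers:

```python
import numpy as np
from datetime import datetime

# One record per gauge in a time slice: ID, latitude, longitude, precipitation.
GAUGE_DTYPE = np.dtype([("stid", "i8"), ("lat", "f8"), ("lon", "f8"), ("precip", "f8")])

def timestamp_from_name(filename, fmt="%Y%m%d%H%M"):
    # Hypothetical naming: assumes the name ends in a 12-digit timestamp
    # before the extension, e.g. "rain_201301010900.nc".
    stem = filename.rsplit("/", 1)[-1].rsplit(".", 1)[0]
    return datetime.strptime(stem[-12:], fmt)

def make_slice(stid, lat, lon, precip):
    # Pack the per-gauge columns of one file into a structured array.
    rec = np.empty(len(stid), dtype=GAUGE_DTYPE)
    rec["stid"] = stid
    rec["lat"] = lat
    rec["lon"] = lon
    rec["precip"] = precip
    return rec

# Map each time slice (parsed from the file name) to its gauge records.
slices = {}
name = "rain_201301010900.nc"  # hypothetical file name
slices[timestamp_from_name(name)] = make_slice(
    [101, 102], [-33.9, -34.1], [151.2, 150.8], [1.2, 0.0])

t = datetime(2013, 1, 1, 9, 0)
```

For the spatial plot of one slice, the fields can be passed straight to matplotlib, e.g. plt.scatter(rec["lon"], rec["lat"], c=rec["precip"]).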

Thanks

Answer 1

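Since we don't have the data or the complete Python script, it is hard to see everything that is going on here, but as a starting point you could: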

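Find the maximum number of observations available for any station (e.g. maxN) and the maximum number of stations (maxS).
Create a NumPy array to hold the data you read from the files: mydata=numpy.zeros((maxS,maxN))
Start reading data from the files and fill the array in as you do now, but use an index that counts time steps from the start time to the end time. If the current time step is not found in a file, set the value to NaN.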
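A rough sketch of those steps, assuming one row per gauge (maxS) and one column per time step from start to end (maxN). It swaps numpy.zeros for numpy.full(..., numpy.nan) so every cell that is never written is already NaN. The (stid, precip) pairs again stand in for the NetCDF reads, and fill_array is a hypothetical name:

```python
import numpy as np

def fill_array(timesteps, gauge_ids):
    """timesteps: hypothetical list of (stid, precip) pairs, one per time step."""
    maxS = len(gauge_ids)   # one row per gauge
    maxN = len(timesteps)   # one column per time step, start to end
    # NaN marks "no observation"; anything not written below stays NaN.
    mydata = np.full((maxS, maxN), np.nan)
    # Fixed row for each gauge, independent of the order inside each file.
    row_of = {gid: r for r, gid in enumerate(gauge_ids)}
    for t, (stid, precip) in enumerate(timesteps):
        for sid, value in zip(stid, precip):
            r = row_of.get(sid)
            if r is not None:  # skip stations outside the catchment
                mydata[r, t] = value
    return mydata

gauges = [101, 102]
steps = [([101, 102], [1.0, 2.0]), ([102], [3.0])]  # gauge 101 missing at t=1
mydata = fill_array(steps, gauges)
```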

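These steps should leave you with an array that holds your data, with missing values wherever you have no information. At the moment, the data array you use is the size of the data minus the missing values; you need to initialize it with the length of the data plus the missing time steps. I also recommend using NaN (numpy.nan) where you have no information, because zeros will distort any statistics you may want to compute on the data. Once the data is stored correctly in the array, you can use the excellent Pandas library to analyze the time series.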
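A small illustration of why zeros distort the statistics, using made-up values for a single gauge over three time steps:

```python
import numpy as np

# One gauge over three time steps; it did not report at the second step.
observed = np.array([4.0, np.nan, 2.0])

# Filling the gap with zero silently counts it as "no rain".
as_zero = np.where(np.isnan(observed), 0.0, observed)
mean_with_zero = as_zero.mean()            # (4 + 0 + 2) / 3

# Keeping NaN and using a NaN-aware reduction ignores the gap instead.
mean_ignoring_nan = np.nanmean(observed)   # (4 + 2) / 2
```

The zero-filled mean comes out at 2.0 and the NaN-aware mean at 3.0. Pandas behaves like the second case by default, since Series.mean() skips NaN (skipna=True).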

Python: how to fill in missing data in a list created from NetCDF file data
