Question

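I am working with data from multiple netCDF files (in a folder on my computer). Each file holds data for the entire USA for a 5-year period. Locations are referenced by the index of an x and y coordinate. I am trying to create a time series for multiple locations (grid cells), compiling the 5-year periods into a 20-year period (this would combine 4 files). Right now I am able to extract the data from all files for one location and compile it into an array using numpy append. However, I would like to extract the data for multiple locations and put it into a matrix where the rows are the locations and the columns contain the precipitation time series. I think I have to create a list or dictionary, but I'm not sure how to assign the data to the list/dictionary inside the loop.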

I am new to Python and netCDF, so forgive me if this is an easy fix. I've been using this code as a guide, but haven't figured out how to adapt it to what I want to do: Python Reading Multiple NetCDF Rainfall files of variable size

Here is my code:

import glob
from netCDF4 import Dataset
import numpy as np

# Define x & y index for grid cell of interest
# Pittsburgh is 37,89
yindex = 37  # first number
xindex = 89  # second number

# Path
path = '/Users/LMC/Research Data/NARCCAP/'  
folder = 'MM5I_ccsm/'

## load data file names    
all_files = glob.glob(path + folder+'*.nc')
all_files.sort()

## initialize np arrays of timeperiods and locations
yindexlist = [yindex, 38, 39] # y indices for all grid cells of interest (integers, not strings)
xindexlist = [xindex,xindex,xindex] # x indices for all grid cells of interest
ngridcell = len(yindexlist)
ntimestep = 58400  # This is for 4 files of 14600 timesteps

## Initialize np array
timeseries_per_gridcell = np.empty(0)

## START LOOP FOR FILE IMPORT
for timestep, datafile in enumerate(all_files):    
    fh = Dataset(datafile,mode='r')  
    days = fh.variables['time'][:]
    lons = fh.variables['lon'][:]
    lats = fh.variables['lat'][:]
    precip = fh.variables['pr'][:]

    for i in range(1):
        timeseries_per_gridcell = np.append(timeseries_per_gridcell,precip[:,yindexlist[i],xindexlist[i]]*10800)

    fh.close()

print(timeseries_per_gridcell)

I put 3 of the files on Dropbox so you can access them, but I'm only allowed to post 2 links. They are:

https://www.dropbox.com/s/rso0hce8bq7yi2h/pr_MM5I_ccsm_2041010103.nc?dl=0 https://www.dropbox.com/s/j56undjvv7iph0f/pr_MM5I_ccsm_2046010103.nc?dl=0

Answer 1

To start, I'd suggest the following to help with your problem.

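First, look at ncrcat to quickly concatenate the individual netCDF files into a single file. I highly recommend downloading NCO for netCDF manipulation; in this case especially, it will simplify your Python code later on.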

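Let's assume the files are named precip_1.nc, precip_2.nc, precip_3.nc, and precip_4.nc. You can concatenate them along the record dimension to form a new precip_all.nc whose record dimension has length 58400, using: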

ncrcat precip_1.nc precip_2.nc precip_3.nc precip_4.nc -O precip_all.nc

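In Python, we now only need to read in the new single file, then extract and store the time series for the grid cells you want. Something like this: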

import netCDF4
import numpy as np

yindexlist = [1,2,3]
xindexlist = [4,5,6]
ngridcell = len(xindexlist)
ntimestep = 58400

# Define an empty 2D array to store time series of precip for a set of grid cells
timeseries_per_grid_cell = np.zeros([ngridcell, ntimestep])

ncfile = netCDF4.Dataset('path/to/file/precip_all.nc', 'r')

# Note that precip is 3D (time, y, x), so read in all dimensions.
# The variable is named 'pr' in the question's files.
precip = ncfile.variables['pr'][:,:,:]

for i in range(ngridcell):
    timeseries_per_grid_cell[i,:] = precip[:, yindexlist[i], xindexlist[i]]

ncfile.close()
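Because precip has already been read into a NumPy (masked) array, the per-cell loop above can also be written as one advanced-indexing expression. A minimal sketch, assuming precip has shape (time, y, x); cells_to_matrix is a hypothetical helper name. Do this on the in-memory array, not on the netCDF4 Variable object: netCDF4 applies integer-list indices independently along each dimension, so indexing the Variable with two lists would return every y/x combination instead of one value per pair.

```python
import numpy as np


def cells_to_matrix(precip, yindexlist, xindexlist):
    """Return a (ngridcell, ntime) array; row i is the series at (yindexlist[i], xindexlist[i])."""
    yidx = np.asarray(yindexlist, dtype=int)
    xidx = np.asarray(xindexlist, dtype=int)
    # NumPy pairs yidx[i] with xidx[i] (pointwise indexing), giving shape (ntime, ngridcell);
    # transpose so that rows are grid cells and columns are time steps.
    return precip[:, yidx, xidx].T
```

With the variables above, timeseries_per_grid_cell = cells_to_matrix(precip, yindexlist, xindexlist) should give the same matrix as the loop.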

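If you need to use only Python, you will have to keep track of the block of time indices that each file contributes to the full time series. 58400/4 = 14600 time steps per file. So you would add another loop that reads in each individual file and stores the corresponding time slice, i.e. the first file fills indices 0-14599, the second fills 14600-29199, and so on.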
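The Python-only approach described above could look roughly like this. It is a sketch under assumptions: fill_by_time_blocks and read_blocks are hypothetical helper names, each file's data is a (time, y, x) array, and the files sort into chronological order (as with the question's pr_MM5I_ccsm_2041010103.nc naming). Instead of hard-coding 14600, it advances the offset by each file's actual number of time steps.

```python
import numpy as np


def fill_by_time_blocks(blocks, yindexlist, xindexlist, ntimestep):
    """Fill a (ngridcell, ntimestep) matrix from per-file (time, y, x) arrays."""
    yidx = np.asarray(yindexlist, dtype=int)
    xidx = np.asarray(xindexlist, dtype=int)
    out = np.empty((len(yidx), ntimestep))
    start = 0
    for block in blocks:
        stop = start + block.shape[0]  # e.g. 0-14599 for the first file, 14600-29199 next
        if stop > ntimestep:
            raise ValueError("the files contain more time steps than ntimestep")
        # Pointwise NumPy indexing: one column per grid cell, then transpose
        out[:, start:stop] = block[:, yidx, xidx].T
        start = stop
    if start != ntimestep:
        raise ValueError(f"expected {ntimestep} time steps, got {start}")
    return out


def read_blocks(filenames, varname="pr"):
    """Yield one in-memory (time, y, x) array per file, in sorted order."""
    from netCDF4 import Dataset  # only needed when actually reading files
    for fn in sorted(filenames):
        with Dataset(fn, mode="r") as fh:
            yield fh.variables[varname][:]
```

For example, with the question's folder: fill_by_time_blocks(read_blocks(glob.glob(path + folder + '*.nc')), [37, 38, 39], [89, 89, 89], 58400) * 10800, where the factor 10800 is carried over from the question's code. Only one file's data is held in memory at a time.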

Answer 2

You can easily combine multiple netCDF files into a single dataset with the netCDF4 package in Python. See the example below:

Say I have four netCDF files: 1.nc, 2.nc, 3.nc and 4.nc. The code below combines all four files into one dataset.

import netCDF4

# Open the four files as one aggregated, read-only dataset
dataset = netCDF4.MFDataset(['1.nc','2.nc','3.nc','4.nc'])
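Once the files are opened with MFDataset, the grid cells can be read straight from the aggregated variable. A minimal sketch, assuming the variable is named 'pr' as in the question's files; extract_cells is a hypothetical helper. Note that MFDataset aggregates along the first unlimited dimension by default and does not accept files in the NETCDF4 (HDF5-based) format, only the classic formats.

```python
import numpy as np


def extract_cells(ds, yindexlist, xindexlist, varname="pr"):
    """Stack the series at each (y, x) cell of an opened dataset into a (ngridcell, ntime) array."""
    var = ds.variables[varname]
    # Scalar y/x indices read one time series per cell. This avoids netCDF4's
    # list indexing, which is applied independently per dimension.
    rows = [np.ma.getdata(var[:, y, x]) for y, x in zip(yindexlist, xindexlist)]
    return np.stack(rows)
```

Usage with the question's indices might be series = extract_cells(dataset, [37, 38, 39], [89, 89, 89]) followed by dataset.close(). np.ma.getdata drops the mask, so missing values appear as their raw stored values.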

Answer 3

Along the same lines as N1B4's answer, you can also use CDO on the command line to concatenate the 4 files along the time dimension:

cdo mergetime precip1.nc precip2.nc precip3.nc precip4.nc merged_file.nc

or with a wildcard:

cdo mergetime precip?.nc merged_file.nc

and then read the merged file in Python as described in that answer.

You can add another command-line step to extract a location of your choice:

cdo remapnn,lon=X/lat=Y merged_file.nc my_location.nc

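This selects the grid cell nearest to the longitude/latitude (X, Y) coordinates you specify. Alternatively, you can use bilinear interpolation if you prefer: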

cdo remapbil,lon=X/lat=Y merged_file.nc my_location.nc
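The nearest-neighbour idea behind remapnn can also be reproduced in Python to find the (y, x) indices that the question hard-codes (e.g. 37, 89 for Pittsburgh) from a latitude/longitude. This is a simplified sketch, not CDO's implementation: it assumes lat/lon are either 2-D (y, x) arrays, as read with fh.variables['lat'][:] in the question, or 1-D axes, and picks the point with the smallest great-circle (haversine) distance. nearest_cell is a hypothetical helper name.

```python
import numpy as np


def nearest_cell(lats, lons, lat0, lon0):
    """Return the (y, x) index of the grid point closest to (lat0, lon0), in degrees."""
    lats = np.asarray(lats, dtype=float)
    lons = np.asarray(lons, dtype=float)
    if lats.ndim == 1 and lons.ndim == 1:
        # Rectilinear grid: expand the 1-D axes to 2-D (y, x) arrays
        lats, lons = np.meshgrid(lats, lons, indexing="ij")
    phi, lam = np.radians(lats), np.radians(lons)
    phi0, lam0 = np.radians(lat0), np.radians(lon0)
    # Haversine term: increases monotonically with great-circle distance, and
    # sin^2(dlon / 2) treats e.g. -80 and 280 degrees as the same meridian.
    h = (np.sin((phi - phi0) / 2) ** 2
         + np.cos(phi) * np.cos(phi0) * np.sin((lam - lam0) / 2) ** 2)
    y, x = np.unravel_index(np.argmin(h), h.shape)
    return int(y), int(x)
```

For Pittsburgh (roughly 40.44 N, 79.99 W) this would be called as nearest_cell(lats, lons, 40.44, -79.99); the indices it returns depend on the model grid.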

Combining multiple NetCDF files into a multidimensional time-series array in Python