Question

My goal is to access data from a netCDF file and write it to a CSV file in the following format:

Latitude  Longitude Date1  Date2  Date3
100       200       <-- MIN_SFC values -->

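So far, I have accessed the variables, written the header to the file, and populated the lat/lons.

How do I access the MIN_SFC values for a specified lon, lat coordinate and date, and then write them to the CSV file?

I'm a Python newbie, so if there is a better way to approach this, please let me know.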

NetCDF file info:

Dimensions:
  time = 7
  latitude = 292
  longitude = 341

Variables:
  float MIN_SFC (time = 7, latitude = 292, longitude = 341)

Here is what I have tried:

from netCDF4 import Dataset, num2date

filename = "C:/filename.nc"

nc = Dataset(filename, 'r', format='NETCDF4')
print(nc.variables)

print('Variable List')

# iterating over nc.variables yields names; .items() gives the Variable objects
for name, var in nc.variables.items():
    print(name, getattr(var, 'units', ''), var.shape)

# get coordinates variables
lats = nc.variables['latitude'][:]
lons = nc.variables['longitude'][:]

sfc = nc.variables['MIN_SFC'][:]
times = nc.variables['time'][:]

# convert date, how to store date only strip away time?
print("Converting Dates")
units = nc.variables['time'].units
dates = num2date(times[:], units=units, calendar='365_day')

#print [dates.strftime('%Y%m%d%H') for date in dates]

header = ['Latitude', 'Longitude']

# append dates to header string
for d in dates:
    print(d)
    header.append(d)

# write to file
import csv

# Python 3: open in text mode with newline='' for the csv module
with open('Output.csv', 'w', newline='') as csvFile:
    outputwriter = csv.writer(csvFile, delimiter=',')
    outputwriter.writerow(header)
    for lat, lon in zip(lats, lons):
        outputwriter.writerow([lat, lon])
# the output file is closed automatically when the with block ends

# close netcdf
nc.close()

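Update

I have updated the code that writes the CSV file. There is an attribute error because lat/lon are doubles:

AttributeError: 'numpy.float32' object has no attribute 'append'

Is there any way to convert them to strings in Python? Do you think that would work?

When I print the values to the console, I notice that many of them come back as "--". I wonder whether this represents the fillValue or missingValue, which is defined as -32767.0.

I was also wondering whether I should access the variables of a 3D dataset with lats = nc.variables['latitude'][:][:] or with lats = nc.variables['latitude'][:][:,:]?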

# the csv file is closed when you leave the block
with open('output.csv', 'w', newline='') as csvFile:
    outputwriter = csv.writer(csvFile, delimiter=',')
    for time_index, time in enumerate(times): # pull the dates out for the header
        t = num2date(time, units=units, calendar='365_day')
        header.append(t)
    outputwriter.writerow(header)
    for lat_index, lat in enumerate(lats):
        content = lat
        print(lat_index)
        for lon_index, lon in enumerate(lons):
            content.append(lon)
            print(lon_index)
            for time_index, time in enumerate(times): # for a date
                # pull out the data
                data = sfc[time_index, lat_index, lon_index]
                content.append(data)
                outputwriter.writerow(content)

Answer 1

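I would load the data into Pandas, which makes it easy to analyze and plot time series data, as well as to write it to CSV.

So here is a real working example that extracts a time series of wave heights at a specified lon, lat location from a global forecast model dataset.

Note: here we access an OPeNDAP dataset, so we can extract just the data we need from a remote server without downloading the file. But netCDF4 works exactly the same way for a remote OPeNDAP dataset or a local NetCDF file, which is a very useful feature!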

import datetime as dt
import netCDF4
import pandas as pd
import matplotlib.pyplot as plt

# NetCDF4-Python can read a remote OPeNDAP dataset or a local NetCDF file:
url='http://thredds.ucar.edu/thredds/dodsC/grib/NCEP/WW3/Global/Best'
nc = netCDF4.Dataset(url)
nc.variables.keys()

lat = nc.variables['lat'][:]
lon = nc.variables['lon'][:]
time_var = nc.variables['time']
dtime = netCDF4.num2date(time_var[:],time_var.units)

# determine what longitude convention is being used [-180,180], [0,360]
print(lon.min(), lon.max())

# specify some location to extract time series
lati = 41.4; loni = -67.8 +360.0  # Georges Bank

# find closest index to specified value
def near(array,value):
    idx=(abs(array-value)).argmin()
    return idx

# Find nearest point to desired location (could also interpolate, but more work)
ix = near(lon, loni)
iy = near(lat, lati)

# Extract desired times.      
# 1. Select -+some days around the current time:
start = dt.datetime.utcnow()- dt.timedelta(days=3)
stop = dt.datetime.utcnow()+ dt.timedelta(days=3)
#       OR
# 2. Specify the exact time period you want:
#start = dt.datetime(2013,6,2,0,0,0)
#stop = dt.datetime(2013,6,3,0,0,0)

istart = netCDF4.date2index(start,time_var,select='nearest')
istop = netCDF4.date2index(stop,time_var,select='nearest')
print(istart, istop)

# Get all time records of variable [vname] at indices [iy,ix]
vname = 'Significant_height_of_wind_waves_surface'
#vname = 'surf_el'
var = nc.variables[vname]
hs = var[istart:istop,iy,ix]
tim = dtime[istart:istop]

# Create Pandas time series object
ts = pd.Series(hs,index=tim,name=vname)

# Use Pandas time series plot method
ts.plot(figsize=(12, 4),
        title='Location: Lon=%.2f, Lat=%.2f' % (lon[ix], lat[iy]), legend=True)
plt.ylabel(var.units)
plt.show()

#write to a CSV file
ts.to_csv('time_series_from_netcdf.csv')

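This produces the following plot, so you can verify that you have the data you wanted:

[Figure: time-series plot of the extracted significant wave height]

It also writes the desired CSV file, time_series_from_netcdf.csv, to disk.

You can also view, download and/or run this example on Wakari.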
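Building on the same Pandas approach, here is a minimal sketch that produces the wide layout asked for in the question (one row per latitude/longitude pair, then one column per date) for every grid point. It assumes `sfc` is shaped (time, latitude, longitude) and that `lats`, `lons` and `dates` were read as in the question; the helper name `grid_to_wide_frame` and the tiny demo arrays are hypothetical stand-ins for the real file.

```python
import datetime

import numpy as np
import pandas as pd


def grid_to_wide_frame(sfc, lats, lons, dates):
    """Hypothetical helper: reshape a (time, lat, lon) array into one row per grid point.

    Columns are Latitude, Longitude, then one column per date (YYYYMMDD).
    """
    ntime, nlat, nlon = sfc.shape
    # Every (lat, lon) combination, in the same row-major order used by reshape
    lat_grid, lon_grid = np.meshgrid(lats, lons, indexing="ij")
    # Masked (fill) values become NaN, which to_csv writes as empty cells
    values = np.ma.filled(np.ma.asarray(sfc, dtype=float), np.nan)
    # (time, lat*lon) -> transpose so each row is one grid point
    table = values.reshape(ntime, nlat * nlon).T
    frame = pd.DataFrame(table, columns=[d.strftime("%Y%m%d") for d in dates])
    frame.insert(0, "Longitude", lon_grid.ravel())
    frame.insert(0, "Latitude", lat_grid.ravel())
    return frame


# Demo with stand-in data: 2 dates, 2 latitudes, 3 longitudes, one masked value
sfc = np.ma.masked_values(np.arange(12, dtype="float32").reshape(2, 2, 3), 5.0)
lats = np.array([10.0, 20.0])
lons = np.array([100.0, 110.0, 120.0])
dates = [datetime.date(2013, 6, 1), datetime.date(2013, 6, 2)]
frame = grid_to_wide_frame(sfc, lats, lons, dates)
```

With real data, `frame.to_csv('Output.csv', index=False)` writes the file. Reshaping the whole array at once avoids a Python loop over all 292 x 341 grid points.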

Answer 2

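Rich Signell's answer was incredibly helpful! Just as a note, it's also important to import datetime, and when extracting the time period, the following code is needed: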

import datetime
import netCDF4
import pandas as pd
import matplotlib.pyplot as plt

...

# 2. Specify the exact time period you want:
start = datetime.datetime(2005,1,1,0,0,0)
stop = datetime.datetime(2010,12,31,0,0,0)

Then I looped over all the regions I needed from my dataset.
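A minimal sketch of such a loop, treating each region as a single point and reusing the `near` function from the previous answer. It assumes the grid stores longitudes in the [0, 360) convention (as in that answer, where -67.8 is shifted by +360) and that the data array is ordered (time, lat, lon); the helper names `to_0_360` and `extract_points`, the second location, and the demo grid are hypothetical.

```python
import numpy as np


def near(array, value):
    """Index of the element of `array` closest to `value`."""
    return int(np.abs(np.asarray(array) - value).argmin())


def to_0_360(lon):
    """Map a longitude given in [-180, 180] to the [0, 360) convention."""
    return lon % 360.0


def extract_points(data, lat, lon, points):
    """Return {name: time series} for each (name, lat, lon) tuple in `points`.

    `data` is assumed to be shaped (time, lat, lon); each series comes from
    the nearest grid point (no interpolation).
    """
    series = {}
    for name, plat, plon in points:
        iy = near(lat, plat)
        ix = near(lon, to_0_360(plon))
        series[name] = data[:, iy, ix]
    return series


# Demo on a tiny stand-in grid
lat = np.array([40.0, 41.0, 42.0])
lon = np.array([290.0, 292.0, 294.0])
data = np.arange(2 * 3 * 3).reshape(2, 3, 3)
points = [("Georges Bank", 41.4, -67.8), ("Hypothetical B", 40.1, -70.0)]
result = extract_points(data, lat, lon, points)
```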

Answer 3

Not sure what you're still having trouble with; it looks good. I do see:

# convert date, how to store date only strip away time?
print("Converting Dates")
units = nc.variables['time'].units
dates = num2date(times[:], units=units, calendar='365_day')

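You now have the dates as datetime objects. (With calendar='365_day', num2date returns cftime datetime objects rather than Python datetime.datetime, but they also provide strftime and the year/month/day attributes used below.)

#print [dates.strftime('%Y%m%d%H') for date in dates]

If you want them as strings, that is all you need, except that strftime has to be called on each element (date.strftime, not dates.strftime). If you only want the day, drop the %H:

date_strings = [date.strftime('%Y%m%d') for date in dates]

If you want the year, month, and day as numbers, datetime objects have these attributes:

dt.year, dt.month, dt.day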
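A small sketch of both options, using plain `datetime.datetime` values as hypothetical stand-ins for the converted dates:

```python
import datetime

# Stand-ins for the output of num2date (hypothetical 6-hourly times)
dates = [
    datetime.datetime(2013, 6, 1, 0),
    datetime.datetime(2013, 6, 1, 6),
    datetime.datetime(2013, 6, 2, 0),
]

# Date-only strings for the CSV header: strftime is called on each element
date_strings = [d.strftime('%Y%m%d') for d in dates]

# Numeric year, month and day via the datetime attributes
ymd = [(d.year, d.month, d.day) for d in dates]
```

Note that two different times on the same day produce the same date string, so with sub-daily data the header columns are no longer unique.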

As for your sfc variable: it is a 3-D array, so to get a particular value you can do:

sfc[time_index, lat_index, lon_index]
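To make the indexing concrete: with `sfc` shaped (time, latitude, longitude), an index triple gives one value, a full slice on the time axis gives every date at one grid point (one row of the CSV layout in the question), and a full slice on both spatial axes gives the map for one date. A sketch with a small array standing in for the file data (the shape and indices are arbitrary):

```python
import numpy as np

# Hypothetical stand-in with the same axis order as MIN_SFC: (time, latitude, longitude)
ntime, nlat, nlon = 7, 4, 5
sfc = np.arange(ntime * nlat * nlon, dtype=np.float32).reshape(ntime, nlat, nlon)

time_index, lat_index, lon_index = 2, 1, 3
value = sfc[time_index, lat_index, lon_index]  # a single number
point_series = sfc[:, lat_index, lon_index]    # all 7 dates at one grid point
time_slice = sfc[time_index, :, :]             # the whole lat/lon map for one date
```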

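Being 3-D, there is more than one way to write it to a CSV file, but I'd guess you want something like this:

for time_index, time in enumerate(times):  # pull out the data for that time
    data = sfc[time_index, :, :]
    # write the date to the file (maybe)
    # .... now loop through the "rows"
    for row in data:
        outputwriter.writerow([str(val) for val in row])

or something like that...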
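A runnable version of that outline for Python 3. It writes the date on its own row before each time slice, then one CSV row per latitude. Writing to an in-memory buffer, the helper name `write_slices`, and the demo arrays are illustrative assumptions, not part of the answer:

```python
import csv
import datetime
import io

import numpy as np


def write_slices(csv_file, sfc, dates):
    """Write each time slice of a (time, lat, lon) array as a block of rows."""
    outputwriter = csv.writer(csv_file, delimiter=',')
    for time_index, date in enumerate(dates):
        data = sfc[time_index, :, :]                       # the data for that time
        outputwriter.writerow([date.strftime('%Y%m%d')])  # the date on its own row
        for row in data:                                   # one row per latitude
            outputwriter.writerow([str(val) for val in row])


# Stand-in data: 2 dates, 2 latitudes, 3 longitudes
sfc = np.arange(12, dtype=float).reshape(2, 2, 3)
dates = [datetime.date(2013, 6, 1), datetime.date(2013, 6, 2)]
buffer = io.StringIO()
write_slices(buffer, sfc, dates)
lines = buffer.getvalue().splitlines()
```

To write a real file, pass `open('output.csv', 'w', newline='')` instead of the buffer.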

Answer 4

The attribute error happens because content needs to be a list, but you initialized it with lat, which is just a number. You need to put it into a list.
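A minimal illustration of the difference, using a NumPy float32 scalar like the one produced when iterating over the latitude array (the values are hypothetical):

```python
import numpy as np

lats = np.array([10.0, 20.0], dtype=np.float32)
lat = lats[0]              # a numpy.float32 scalar, not a list

try:
    content = lat
    content.append(100.0)  # fails: a scalar has no append method
except AttributeError as err:
    error_message = str(err)

content = [lat]            # wrap the scalar in a list first
content.append(100.0)      # now append works
```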

As for the 3D variable, lats = nc.variables['latitude'][:] is enough to read all of the data.
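For a netCDF4 variable, `[:]` reads every dimension at once; a further `[:]` or `[:, :]` only slices the NumPy array that was already read, so it returns the same values. When a variable has fill or missing values, netCDF4 typically returns a masked array, and masked elements print as `--`, which is probably what the question is seeing. A sketch using a NumPy masked array in place of the file variable, with -32767.0 as the fill value mentioned in the question:

```python
import numpy as np

# Stand-in for nc.variables['MIN_SFC'][:] where -32767.0 marks missing data
raw = np.array([[1.5, -32767.0], [2.5, 3.0]], dtype=np.float32)
sfc = np.ma.masked_values(raw, -32767.0)

# Slicing the already-read array again returns the same values
same = np.array_equal(sfc[:].filled(np.nan), sfc[:][:, :].filled(np.nan), equal_nan=True)

printed = str(sfc)              # masked entries are shown as --
as_floats = sfc.filled(np.nan)  # replace masked entries with NaN for CSV output
```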

Update: iterating over lon/lat together

Here is your code with the list change and the combined iteration:

# the csv file is closed when you leave the block
with open('output.csv', 'w', newline='') as csvFile:
    outputwriter = csv.writer(csvFile, delimiter=',')
    for time_index, time in enumerate(times): # pull the dates out for the header
        t = num2date(time, units=units, calendar='365_day')
        header.append(t)
    outputwriter.writerow(header)

    for latlon_index, (lat, lon) in enumerate(zip(lats, lons)):
        content = [lat, lon] # Put lat and lon into list
        print(latlon_index)
        for time_index, time in enumerate(times): # for a date
            # pull out the data; zip() pairs lat and lon by position,
            # so the same index applies to both axes
            data = sfc[time_index, latlon_index, latlon_index]
            content.append(data)
        outputwriter.writerow(content) # one row per lat/lon pair, after all dates

I haven't actually tried this, so there may be other problems lurking.
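Note that `zip(lats, lons)` pairs the i-th latitude with the i-th longitude, so it visits only the diagonal of the 292 x 341 grid and stops after 292 pairs. To get one row for every grid point in the layout from the question, a nested loop over both axes works. A sketch, assuming `sfc` is (time, latitude, longitude) and `dates` is the converted time axis; `write_wide_csv` and the demo data are hypothetical:

```python
import csv
import datetime
import io

import numpy as np


def write_wide_csv(csv_file, sfc, lats, lons, dates):
    """Write Latitude, Longitude, then one value per date for every grid point."""
    outputwriter = csv.writer(csv_file, delimiter=',')
    header = ['Latitude', 'Longitude'] + [d.strftime('%Y%m%d') for d in dates]
    outputwriter.writerow(header)
    for lat_index, lat in enumerate(lats):
        for lon_index, lon in enumerate(lons):
            content = [lat, lon]                                   # start a fresh row
            content.extend(sfc[:, lat_index, lon_index].tolist())  # all dates at this point
            outputwriter.writerow(content)                         # one row per grid point


# Stand-in data: 2 dates, 2 latitudes, 3 longitudes
sfc = np.arange(12, dtype=float).reshape(2, 2, 3)
lats = [10.0, 20.0]
lons = [100.0, 110.0, 120.0]
dates = [datetime.date(2013, 6, 1), datetime.date(2013, 6, 2)]
buffer = io.StringIO()
write_wide_csv(buffer, sfc, lats, lons, dates)
rows = list(csv.reader(io.StringIO(buffer.getvalue())))
```

With a real file, pass `open('output.csv', 'w', newline='')`. If `sfc` is a masked array, `.tolist()` turns masked (fill) values into `None`, which `csv.writer` writes as empty cells.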

How to read a NetCDF file and write it to CSV with Python
