How to read a NetCDF file and write it to CSV using Python

Date: 2015-02-09 22:50:26

Tags: python netcdf

My goal is to access data from a netCDF file and write it to a CSV file in the following format.

Latitude  Longitude Date1  Date2  Date3
100       200       <-- MIN_SFC values -->

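So far I have accessed the variables, written the header to the file, and populated the lats/lons.

How can I access the MIN_SFC values for a specified lon/lat coordinate and date, and then write them to the CSV file?

I'm a Python newbie, so if there is a better way to go about this, please let me know.

NetCDF file info: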

Dimensions:
  time = 7 
  latitude = 292
  longitude =341

Variables:
  float MIN_SFC (time=7, latitude = 292, longitude = 341)

Here is what I have tried:

 from netCDF4 import Dataset, num2date

 filename = "C:/filename.nc"

 nc = Dataset(filename, 'r', Format='NETCDF4')
 print nc.variables

 print 'Variable List'

 for var in nc.variables:
    print var, var.units, var.shape

 # get coordinates variables
 lats = nc.variables['latitude'][:]
 lons = nc.variables['longitude'][:]

 sfc= nc.variables['Min_SFC'][:]
 times = nc.variables['time'][:]

 # convert date, how to store date only strip away time?
 print "Converting Dates"
 units = nc.variables['time'].units
 dates = num2date (times[:], units=units, calendar='365_day')

 #print [dates.strftime('%Y%m%d%H') for date in dates]

 header = ['Latitude', 'Longitude']

 # append dates to header string

 for d in dates:
    print d
    header.append(d)

 # write to file
 import csv

 with open('Output.csv', 'wb') as csvFile:
    outputwriter = csv.writer(csvFile, delimiter=',')
    outputwriter.writerow(header)
    for lat, lon in zip(lats, lons):
      outputwriter.writerow( [lat, lon] )
 
 # close the output file
 csvFile.close()

 # close netcdf
 nc.close()

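Update

I have updated the code that writes to the CSV file. There is an attribute error because lat/lon are doubles.

AttributeError: 'numpy.float32' object has no attribute 'append'

Is there any way to convert them to strings in Python? Do you think that would work?

When I print the values to the console, I notice that many of them come back as "--". I wonder whether this represents the fillValue or missingValue, which is defined as -32767.0.

I'm also wondering whether the variables of a 3-D dataset should be accessed with lats = nc.variables['latitude'][:][:] or with lats = nc.variables['latitude'][:][:,:]?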

# the csv file is closed when you leave the block
with open('output.csv', 'wb') as csvFile:
    outputwriter = csv.writer(csvFile, delimiter=',')
    for time_index, time in enumerate(times): # pull the dates out for the header
         t = num2date(time, units = units, calendar='365_day')
         header.append(t)
    outputwriter.writerow(header)  
    for lat_index, lat in enumerate(lats):
        content = lat
        print lat_index
        for lon_index, lon in enumerate(lons):
            content.append(lon)
            print lon_index    
            for time_index, time in enumerate(times): # for a date
                # pull out the data 
                data = sfc[time_index,lat_index,lon_index]
                content.append(data)
                outputwriter.writerow(content)

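4 Answers:

Answer 0 (score: 6)

I would load the data into Pandas, which makes it easy to analyze and plot time series data, as well as to write it to CSV.

So here is a real working example that extracts a time series of wave heights at a specified lon/lat location from a global forecast model dataset.

Note: here we access an OPeNDAP dataset, so we can extract just the data we need from a remote server without downloading the file. But netCDF4 works exactly the same way for a remote OPeNDAP dataset or a local NetCDF file, which is a very useful feature!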

import datetime as dt
import netCDF4
import pandas as pd
import matplotlib.pyplot as plt

# NetCDF4-Python can read a remote OPeNDAP dataset or a local NetCDF file:
url='http://thredds.ucar.edu/thredds/dodsC/grib/NCEP/WW3/Global/Best'
nc = netCDF4.Dataset(url)
nc.variables.keys()

lat = nc.variables['lat'][:]
lon = nc.variables['lon'][:]
time_var = nc.variables['time']
dtime = netCDF4.num2date(time_var[:],time_var.units)

# determine what longitude convention is being used [-180,180], [0,360]
print(lon.min(), lon.max())

# specify some location to extract time series
lati = 41.4; loni = -67.8 +360.0  # Georges Bank

# find closest index to specified value
def near(array,value):
    idx=(abs(array-value)).argmin()
    return idx

# Find nearest point to desired location (could also interpolate, but more work)
ix = near(lon, loni)
iy = near(lat, lati)

# Extract desired times.      
# 1. Select -+some days around the current time:
start = dt.datetime.utcnow()- dt.timedelta(days=3)
stop = dt.datetime.utcnow()+ dt.timedelta(days=3)
#       OR
# 2. Specify the exact time period you want:
#start = dt.datetime(2013,6,2,0,0,0)
#stop = dt.datetime(2013,6,3,0,0,0)

istart = netCDF4.date2index(start,time_var,select='nearest')
istop = netCDF4.date2index(stop,time_var,select='nearest')
print(istart, istop)

# Get all time records of variable [vname] at indices [iy,ix]
vname = 'Significant_height_of_wind_waves_surface'
#vname = 'surf_el'
var = nc.variables[vname]
hs = var[istart:istop,iy,ix]
tim = dtime[istart:istop]

# Create Pandas time series object
ts = pd.Series(hs,index=tim,name=vname)

# Use Pandas time series plot method
ts.plot(figsize=(12,4),
   title='Location: Lon=%.2f, Lat=%.2f' % ( lon[ix], lat[iy]),legend=True)
plt.ylabel(var.units)

#write to a CSV file
ts.to_csv('time_series_from_netcdf.csv')

This both creates a plot, which lets you verify that you have the data you wanted, and writes the desired CSV file time_series_from_netcdf.csv to disk.

You can also view, download and/or run this example on Wakari.
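The two small steps in the example above (shifting a [-180, 180] longitude onto a [0, 360] grid, and picking the nearest grid index) can be sketched without any dependencies. This is only an illustration: the helper names to_0_360 and nearest_index are hypothetical, and the grid values are made up rather than taken from the WW3 dataset.

```python
def to_0_360(lon):
    """Map a longitude in degrees onto the [0, 360) convention."""
    return lon % 360.0


def nearest_index(values, target):
    """Return the index of the element of `values` closest to `target`."""
    return min(range(len(values)), key=lambda i: abs(values[i] - target))


# Made-up 0.5-degree grid, purely for illustration.
grid_lons = [290.0 + 0.5 * i for i in range(10)]  # 290.0 .. 294.5
grid_lats = [40.0 + 0.5 * i for i in range(10)]   # 40.0 .. 44.5

# Georges Bank, given in the [-180, 180] convention.
ix = nearest_index(grid_lons, to_0_360(-67.8))
iy = nearest_index(grid_lats, 41.4)
print(ix, grid_lons[ix], iy, grid_lats[iy])
```

Using the modulo operator instead of adding 360.0 by hand means the same call also works for longitudes that are already positive.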

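Answer 1 (score: 1)

Rich Signell's answer was very helpful! Just as a note, it is also important to import datetime, and when extracting the times you need to use the following code: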

import datetime
import netCDF4
import pandas as pd
import matplotlib.pyplot as plt

...

# 2. Specify the exact time period you want:
start = datetime.datetime(2005,1,1,0,0,0)
stop = datetime.datetime(2010,12,31,0,0,0)

Then I looped over all the regions I needed for my dataset.

Answer 2 (score: 0)

I'm not sure what you are still having trouble with; it looks fine. I do see:

# convert date, how to store date only strip away time?
 print "Converting Dates"
 units = nc.variables['time'].units
 dates = num2date (times[:], units=units, calendar='365_day')

You now have the dates as Python datetime objects.

 #print [dates.strftime('%Y%m%d%H') for date in dates]

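That is what you need if you want them as strings, but if you only want the day, drop the %H (and call strftime on each date, not on the list):

date_strings = [date.strftime('%Y%m%d') for date in dates]

If you want the year, month, and day as numbers, datetime objects have attributes for that:

dt.year, dt.month, dt.day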
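As a small self-contained sketch of the date handling described above, the snippet below builds header strings from a list of datetimes. The dates are made up and simply stand in for whatever num2date returns for your file.

```python
import datetime

# Made-up datetimes standing in for the values returned by num2date.
dates = [
    datetime.datetime(2015, 2, 1, 6, 0),
    datetime.datetime(2015, 2, 2, 6, 0),
    datetime.datetime(2015, 2, 3, 6, 0),
]

# Call strftime on each element, not on the list itself.
date_strings = [d.strftime('%Y%m%d') for d in dates]

# The date strings can go straight into the CSV header.
header = ['Latitude', 'Longitude'] + date_strings
print(header)

# Numeric components are plain attributes of each datetime.
first = dates[0]
print(first.year, first.month, first.day)
```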

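As for your sfc variable: it is a 3-D array, so to get a specific value you can do:

sfc[time_index, lat_index, lon_index]

Because it is 3-D, there is more than one way to write it to a CSV file, but I'm guessing you might want something like this:

for time_index, time in enumerate(times):
    # pull out the data for that time
    data = sfc[time_index, :, :]
    # write the date to the file (maybe)
    # ....
    # now loop through the "rows"
    for row in data:
        outputwriter.writerow([str(val) for val in row])

Or something similar...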
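For the layout requested in the question (one row per grid cell, one column per date), a dependency-free Python 3 sketch might look like the following. Nested lists stand in for the 3-D netCDF variable, all coordinate and data values are made up, and treating -32767.0 as the missing-data marker is an assumption based on the value mentioned in the question.

```python
import csv
import io

FILL_VALUE = -32767.0  # assumed missing-data marker, taken from the question

lats = [10.0, 10.5]           # made-up latitude axis
lons = [100.0, 100.5, 101.0]  # made-up longitude axis
date_strings = ['20150201', '20150202']

# sfc[time][lat][lon]: nested lists standing in for the 3-D variable.
sfc = [
    [[1.0, 2.0, 3.0], [4.0, FILL_VALUE, 6.0]],
    [[7.0, 8.0, 9.0], [10.0, 11.0, 12.0]],
]

# In-memory buffer; swap for open('output.csv', 'w', newline='') to write a file.
out = io.StringIO()
writer = csv.writer(out)
writer.writerow(['Latitude', 'Longitude'] + date_strings)

# One row per grid cell: every latitude paired with every longitude.
for i, lat in enumerate(lats):
    for j, lon in enumerate(lons):
        values = [sfc[t][i][j] for t in range(len(date_strings))]
        # Leave the cell empty where the value equals the fill marker.
        cells = ['' if v == FILL_VALUE else v for v in values]
        writer.writerow([lat, lon] + cells)

rows = out.getvalue().splitlines()
print(len(rows))
```

The nested loops visit every lat/lon combination, which differs from zip(lats, lons): zip pairs the two axes element by element and would only walk along a diagonal of the grid.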

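Answer 3 (score: 0)

The attribute error occurs because content needs to be a list, but you initialize it with lat, which is just a number. You need to put it in a list.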
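A minimal sketch of the difference, using plain Python floats with made-up values in place of the numpy.float32 coordinates from the question (the error has the same cause either way):

```python
lat, lon = 45.25, -120.5  # made-up coordinate values

# Starting from the bare number reproduces the kind of error in the question.
content = lat
try:
    content.append(lon)
    error_message = None
except AttributeError as exc:
    error_message = str(exc)
print(error_message)

# Starting from a list works, and further values can be appended.
content = [lat, lon]
content.append(3.5)
print(content)
```

Converting the numbers to strings is not needed for this: csv.writer formats numeric values itself when it writes the row.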

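Regarding the 3-D variable, lats = nc.variables['latitude'][:] is sufficient to read all of the data.

Update: iterating over lon/lat together

Here is your code, with modifications for the list and the iteration: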

# the csv file is closed when you leave the block
with open('output.csv', 'wb') as csvFile:
    outputwriter = csv.writer(csvFile, delimiter=',')
    for time_index, time in enumerate(times): # pull the dates out for the header
        t = num2date(time, units = units, calendar='365_day')
        header.append(t)
    outputwriter.writerow(header)

    for latlon_index, (lat,lon) in enumerate(zip(lats, lons)):
        content = [lat, lon] # Put lat and lon into list
        print latlon_index
        for time_index, time in enumerate(times): # for a date
            # pull out the data 
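            # lat and lon are paired here, so one index serves both axes
            data = sfc[time_index,latlon_index,latlon_index]
            content.append(data)
        # write the row once all dates for this point have been collected
        outputwriter.writerow(content)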

I haven't actually tried running this, so other problems may be lurking.