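I am trying to extract pixel values for specific locations and dates from multi-temporal netCDF files.
The files are named T2011, T2012, and so on up to T2017.
Each file contains 365 layers, one for each day of the year, and each layer holds the temperature for that day.
My goal is to extract values based on my input dataset.
The input is a csv file (locd.csv) that looks like this: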
id  lat          lon          DateFin   DateCount
1   46.63174271  7.405986324  02-02-18  43,131
2   46.64972969  7.484352537  25-01-18  43,123
3   47.27028727  7.603811832  20-01-18  43,118
4   46.99994455  7.063905466  05-02-18  43,134
5   47.08125481  7.19501811   20-01-18  43,118
6   47.37833814  7.432005368  11-12-18  43,443
7   47.43230354  7.445253182  30-12-18  43,462
8   46.73777711  6.777871255  09-04-18  43,197
69  47.42285191  7.113934735  09-04-18  43,197
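id identifies the location I am interested in, lat and lon are its latitude and longitude, DateFin is the date for which I want to know the temperature at that location, and DateCount is the number of days from 01-01-1900 to that date (this is how the layers are indexed in the files).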
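As a sanity check on that convention, here is a minimal sketch that turns a DateCount entry back into a calendar date. It assumes the comma in values such as 43,131 is a thousands separator and that day 0 is 01-01-1900; the helper names parse_day_count and day_count_to_date are made up for illustration.

```python
from datetime import date, timedelta

EPOCH = date(1900, 1, 1)  # reference date given in the column description


def parse_day_count(text):
    """Turn a DateCount string such as '43,131' into an integer day count.

    Assumption: the comma is a thousands separator, not a decimal mark.
    """
    return int(str(text).replace(",", ""))


def day_count_to_date(day_count):
    """Convert a count of days since 01-01-1900 into a calendar date."""
    return EPOCH + timedelta(days=day_count)


count = parse_day_count("43,131")
print(count, day_count_to_date(count))
```

For the first row of the table this gives 2 February 2018, which agrees with its DateFin value of 02-02-18.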
For this, I have the following:
import glob
from netCDF4 import Dataset
import pandas as pd
import numpy as np
from datetime import date
import os

# Record all the years of the netCDF files into a Python list
all_years = []
for file in glob.glob('*.nc'):
    print(file)
    data = Dataset(file, 'r')
    time = data.variables['time'] # that's how the days are stored
    year = file[0:4]
    all_years.append(year)

# define my input data
cities = pd.read_csv('locd.csv')

# extracting the data
for index, row in cities.iterrows():
    id_row = row['id'] # id from the database
    location_latitude = row['lat']
    location_longitude = row['lon']
    location_date = row['DateCount'] #number of day counting since 1900-01-01

    # Sorting the all_years python list
    all_years.sort()

    for yr in all_years:
        # Reading-in the data
        data = Dataset(str(yr)+'.nc', 'r')
        # Storing the lat and lon data of the netCDF file into variables
        lat = data.variables['lat'][:]
        lon = data.variables['lon'][:]
        # Squared difference between the specified lat,lon and the lat,lon of the netCDF
        sq_diff_lat = (lat - location_latitude)**2
        sq_diff_lon = (lon - location_longitude)**2
        # Identify the index of the min value for lat and lon
        min_index_lat = sq_diff_lat.argmin()
        min_index_lon = sq_diff_lon.argmin()
        # Accessing the precipitation data
        prec= data.variables['precipi'] # that's how the variable is called
        for p_index in np.arange(0, len(location_date)):
            print('Recording the value for '+ id_row+': ' + str(location_date[p_index]))
            df.loc[id_row[location_date]]['Precipitation'] = prec[location_date, min_index_lat, min_index_lon]

# to record it in a new archive
df.to_csv('locationNew.csv')
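Taken out of the loops, the nearest-grid-cell lookup that the code above attempts can be sketched on synthetic data. This is only a minimal sketch, not a fix for the full script: the grid spacing, the array contents, the zero-based day index and the helper name nearest_index are all made up for illustration and do not come from the real files.

```python
import numpy as np


def nearest_index(axis_values, target):
    """Return the index of the entry in a 1-D coordinate array closest to target."""
    return int(np.abs(np.asarray(axis_values) - target).argmin())


# Synthetic stand-in for one file: 365 daily layers on a small regular grid.
lat = np.linspace(46.0, 48.0, 21)  # assumed 0.1 degree spacing
lon = np.linspace(6.0, 8.0, 21)
temperature = np.arange(365 * 21 * 21, dtype=float).reshape(365, 21, 21)

# Location of id 1 from the table.
i_lat = nearest_index(lat, 46.63174271)
i_lon = nearest_index(lon, 7.405986324)

day = 32  # hypothetical zero-based layer index within the file
value = temperature[day, i_lat, i_lon]
print(i_lat, i_lon, value)
```

Indexing with plain integers in the order (time, lat, lon) returns a single value per location and day; whether the real variable uses that dimension order is an assumption that would need to be checked against the file.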
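My problem:
I cannot get it to run. A new error comes up every time; at the moment it says "id_row" must be a string.
Does anyone have tips or experience with this kind of file?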