Question

I am trying to extract pixel values for specific locations and dates from multi-temporal files (netCDF).

The files are named T2011, T2012, and so on, up to T2017. Each file contains 365 layers, one for each day of the year, and each layer holds the temperature for that day.
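Because the files are named by year and hold one layer per day, a calendar date can be mapped to a file and a position within it. Here is a minimal sketch, assuming the files on disk are called `T<year>.nc` and that the layers are stored in calendar order starting on 1 January. The helper name `locate_layer` is hypothetical and not part of any library.

```python
from datetime import date

def locate_layer(d):
    """Return (file name, zero-based layer index) for a calendar date.

    Assumes one file per year named 'T<year>.nc' with one layer per day,
    ordered from 1 January. How a leap day fits into a 365-layer file is
    not known here, so the index is only a first guess for dates after
    28 February in a leap year.
    """
    file_name = f"T{d.year}.nc"
    layer_index = d.timetuple().tm_yday - 1  # day of year, shifted to 0-based
    return file_name, layer_index

print(locate_layer(date(2011, 2, 2)))  # ('T2011.nc', 32)
```

If the `time` variable in the files stores days since 1900-01-01, looking up the requested day count in that variable may be more robust than relying on the position alone.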

My goal is to extract information based on my input dataset. The targets are listed in a csv (locd.csv) that looks like this:

id      lat         lon     DateFin    DateCount
1   46.63174271 7.405986324 02-02-18    43,131
2   46.64972969 7.484352537 25-01-18    43,123
3   47.27028727 7.603811832 20-01-18    43,118
4   46.99994455 7.063905466 05-02-18    43,134
5   47.08125481 7.19501811  20-01-18    43,118
6   47.37833814 7.432005368 11-12-18    43,443
7   47.43230354 7.445253182 30-12-18    43,462
8   46.73777711 6.777871255 09-04-18    43,197
69  47.42285191 7.113934735 09-04-18    43,197

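id is the location I am interested in, lat and lon are its latitude and longitude, DateFin is the date for which I want the temperature at that location, and DateCount is the number of days from 01-01-1900 to that date (this is how the layers are indexed in the files).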
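The DateCount values can be reproduced from DateFin with the standard library, which is a quick way to check the column. Here is a minimal sketch, assuming DateFin is in day-month-year order with a two-digit year and that the count is 0 on 1900-01-01. The helper name `days_since_1900` is hypothetical.

```python
from datetime import datetime, date

EPOCH = date(1900, 1, 1)  # reference day stated in the question

def days_since_1900(date_fin):
    """Convert a 'dd-mm-yy' string to the number of days since 1900-01-01."""
    d = datetime.strptime(date_fin, "%d-%m-%y").date()
    return (d - EPOCH).days

print(days_since_1900("02-02-18"))  # 43131, the DateCount of row 1
print(days_since_1900("11-12-18"))  # 43443, the DateCount of row 6
```

The table shows DateCount with a thousands separator (43,131). If the file really stores it that way, the column would need to be parsed as a number before use, for example with the `thousands=','` option of `pandas.read_csv`.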

This is what I have so far:

import glob
from netCDF4 import Dataset 
import pandas as pd
import numpy as np
from datetime import date
import os 

# Record all the years of the netCDF files into a Python list
all_years = []

for file in glob.glob('*.nc'):
    print(file)
    data = Dataset(file, 'r')
    time = data.variables['time'] # that's how the days are stored
    year = file[0:4]
    all_years.append(year)


# define my input data
cities = pd.read_csv('locd.csv')

# extracting the data 
for index, row in cities.iterrows():
    id_row = row['id'] # id from the database
    location_latitude = row['lat']
    location_longitude = row['lon']
    location_date = row['DateCount'] #number of day counting since 1900-01-01

    # Sorting the all_years python list
    all_years.sort()

    for yr in all_years:
        # Reading-in the data 
        data = Dataset(str(yr)+'.nc', 'r')

        # Storing the lat and lon data of the netCDF file into variables 
        lat = data.variables['lat'][:]
        lon = data.variables['lon'][:]

        # Squared difference between the specified lat,lon and the lat,lon of the netCDF 
        sq_diff_lat = (lat - location_latitude)**2 
        sq_diff_lon = (lon - location_longitude)**2

        # Identify the index of the min value for lat and lon
        min_index_lat = sq_diff_lat.argmin()
        min_index_lon = sq_diff_lon.argmin()

        # Accessing the precipitation data
        prec= data.variables['precipi'] # that's how the variable is called 


        for p_index in np.arange(0, len(location_date)):
            print('Recording the value for '+ id_row+': ' + str(location_date[p_index]))
            df.loc[id_row[location_date]]['Precipitation'] = prec[location_date, min_index_lat, min_index_lon]

    # to record it in a new archive
    df.to_csv('locationNew.csv')

My problem:

I can't get it to run properly. Every time I fix something a new error comes up; right now it says "id_row" must be a string.
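One possible source of that message is the `print` call, which joins a string and `id_row` with `+`. `id_row` comes from a numeric column, so it is not a string. This is a guess based on the code shown, not a confirmed diagnosis. Here is a minimal sketch of the failure and two ways around it, with made-up values standing in for `row['id']` and `row['DateCount']`:

```python
id_row = 1             # stand-in for row['id'], a numeric column in locd.csv
location_date = 43131  # stand-in for row['DateCount']

try:
    # Same pattern as the print call in the loop: str + number
    message = 'Recording the value for ' + id_row + ': ' + str(location_date)
except TypeError as err:
    print('TypeError:', err)

# Either convert explicitly, or let an f-string do the formatting
message = 'Recording the value for ' + str(id_row) + ': ' + str(location_date)
message_f = f'Recording the value for {id_row}: {location_date}'
print(message)
```

Once that line runs, the inner loop is likely to fail next. `location_date` is a single value per row, so `len(location_date)` and `location_date[p_index]` do not behave like a list of dates, and `df` is never defined before it is written to.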

Does anyone have tips or experience working with this kind of file?

Extracting data from multiple NetCDF files at multiple locations on specific dates

0 Answers: