I have two CSV files. The first one, when read in as a list, looks like this:
('Rubus idaeus', '10.0', '56.0')
('Neckera crispa', '9.8785', '56.803')
('Dicranum polysetum', '9.1919', '56.0456')
('Sphagnum subnitens', '9.1826', '56.6367')
('Taxus baccata', '9.61778', '55.68833')
('Sphagnum papillosum', '9.1879', '56.0442')
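The columns are 'Species', 'Longitude' and 'Latitude'.
These are observations made in the field.
The other file is also a CSV file. It is a test file that resembles the real data, and it looks like this: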
{'y': '58.1', 'x': '22.1', 'temp': '14'}
{'y': '58.2', 'x': '22.2', 'temp': '10'}
{'y': '58.3', 'x': '22.3', 'temp': '1'}
{'y': '58.4', 'x': '22.4', 'temp': '12'}
{'y': '58.5', 'x': '22.5', 'temp': '1'}
{'y': '58.6', 'x': '22.6', 'temp': '6'}
{'y': '58.7', 'x': '22.7', 'temp': '0'}
{'y': '58.8', 'x': '22.8', 'temp': '13'}
{'y': '58.9', 'x': '22.9', 'temp': '7'}
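Both files are really long.
I have the observations, and now I want to find the closest lower number in the file containing the climate data and then append that row to the observation, so the output becomes: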
('Dicranum polysetum', '9.1919', '56.0456', 'y': '9.1', 'x': '56.0', 'temp': '7')
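I tried creating nested loops by iterating over the CSV files with DictReader, but the nesting got deep very quickly, and it takes a huge number of iterations to get through the whole thing.
Does anyone know of a way to do this?
My current code is poor; I have tried looping in several ways, and I expect there is something fundamentally wrong with my whole approach.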
import csv

fil = csv.DictReader(open("TestData.csv"), delimiter=';')
navn = "nyDK_OVER_50M.csv"
occu = csv.DictReader(open(navn), delimiter='\t')

for row in fil:
    print 'x=',row['x']
    for line in occu:
        print round(float(line['decimalLongitude']),1)
        if round(float(line['decimalLongitude']),1) == row['x']:
            print 'You did it, found one dam match'
Here are links to my two files, so you don't have to make up any data if you know something that could push me forward.
https://www.dropbox.com/s/lmstnkq8jl71vcc/nyDK_OVER_50M.csv?dl=0 https://www.dropbox.com/s/v22j61vi9b43j78/TestData.csv?dl=0
Best regards, Mathias
Answer 0 (score: 1)
Since you say there are no missing temperature data points, the problem is much easier to solve:
import csv

# temperatures
fil = csv.DictReader(open("TestData.csv"), delimiter=';')
# species
navn = "nyDK_OVER_50M.csv"
occu = csv.DictReader(open(navn), delimiter='\t')

# build a nested lookup table d[x][y] -> temp, keyed by one-decimal strings
d = {}
for row in fil:
    x = '{:.1f}'.format(float(row['x']))
    y = '{:.1f}'.format(float(row['y']))
    try:
        d[x][y] = row['temp']
    except KeyError:
        d[x] = {y: row['temp']}

# round each observation to one decimal and look up its grid cell
for line in occu:
    x = '{:.1f}'.format(round(float(line['decimalLongitude']), 1))
    y = '{:.1f}'.format(round(float(line['decimalLatitude']), 1))
    temp = d[x][y]  # raises KeyError if the grid has no such cell
    line['temp'] = temp
    line['x'] = x
    line['y'] = y
    print(line)
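Note that the question asks for the closest lower value, while rounding can pick the cell above (9.1919 rounds to 9.2, not 9.1). The following is a minimal sketch of a floor-based key, assuming that "closest lower" means flooring each coordinate to the 0.1 grid step. The helper names `floor_key`, `build_grid` and `attach_temperature` are hypothetical, and the sample rows are made up for illustration rather than taken from the linked files. It uses `decimal.Decimal` so that the flooring is done on the exact text from the CSV rather than on binary floats.

```python
from decimal import Decimal, ROUND_FLOOR


def floor_key(value, step=Decimal("0.1")):
    """Floor a numeric string to the grid step and return it as a string."""
    return str(Decimal(value).quantize(step, rounding=ROUND_FLOOR))


def build_grid(climate_rows):
    """Map (x, y) keys to temperatures for constant-time lookup."""
    grid = {}
    for row in climate_rows:
        grid[(floor_key(row["x"]), floor_key(row["y"]))] = row["temp"]
    return grid


def attach_temperature(observations, grid):
    """Append the floored cell and its temperature (or None) to each row."""
    merged = []
    for species, lon, lat in observations:
        key = (floor_key(lon), floor_key(lat))
        # grid.get returns None when the climate file has no such cell
        merged.append((species, lon, lat, key[0], key[1], grid.get(key)))
    return merged


# illustrative sample data (assumption: x is longitude, y is latitude)
climate_rows = [
    {"x": "9.1", "y": "56.0", "temp": "7"},
    {"x": "9.2", "y": "56.0", "temp": "9"},
]
observations = [("Dicranum polysetum", "9.1919", "56.0456")]

merged = attach_temperature(observations, build_grid(climate_rows))
print(merged)
```

With these sample rows the observation at 9.1919, 56.0456 is matched to the cell 9.1, 56.0. Because the grid is a single dictionary, each file is still read only once, as in the answer above.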
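Answer 1 (score: 1)
Here is a solution that uses numpy to compute the Euclidean distance from each data item to every x,y point, and joins each item with the data from the x,y tuple that is the smallest distance from it.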
import numpy
import operator

# read the data into numpy arrays
testdata = numpy.genfromtxt('TestData.csv', delimiter=';', names=True)
nyDK = numpy.genfromtxt('nyDK_OVER_50M.csv', names=True, delimiter='\t',
                        dtype=[('species', '|S64'),
                               ('decimalLongitude', 'float32'),
                               ('decimalLatitude', 'float32')])

# extract the x,y tuples into a numpy array of [(lon, lat), ...]
# (a list comprehension instead of map() so this also works on Python 3)
xy = numpy.array([(row['x'], row['y']) for row in testdata])

# this is a function which returns a function which computes the distance
# from an arbitrary point to an origin
distance = lambda origin: lambda point: numpy.linalg.norm(point - origin)

# methods to extract the (lon, lat) from a nyDK entry, in the same order as
# the (x, y) columns above (x is longitude, y is latitude)
lonlat = operator.itemgetter('decimalLongitude', 'decimalLatitude')
getlonlat = lambda item: numpy.array(lonlat(item))

# this will transform a single element of the nyDK array into
# a union of it with its closest climate data
def transform(item):
    # compute distance from each x,y point to this item's location
    # and find the position of the minimum
    idx = numpy.argmin([distance(getlonlat(item))(point) for point in xy])
    # return the union of the item and the closest climate data
    return tuple(list(item) + list(testdata[idx]))

# transform all the entries in the input data set
result = [transform(item) for item in nyDK]
print(result[0:3])
Output:
[('Rubus idaeus', 10.0, 56.0, 15.0, 51.0, 14.0),
('Neckera crispa', 9.8785, 56.803001, 15.300000000000001, 51.299999999999997, 2.0),
('Dicranum polysetum', 9.1919003, 56.045601, 14.6, 50.600000000000001, 10.0)]
Note: the distances are not very close, but that is probably because the .csv file does not contain a full grid of x,y points.
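Since both files are long, calling a Python function for every pair of points can be slow. The following is a rough sketch of the same nearest-point search done with NumPy broadcasting, so that all distances are computed in one array operation. The function name `nearest_rows` and the sample arrays are hypothetical and made up for illustration; the sketch assumes both arrays hold (longitude, latitude) pairs in the same column order and, like the code above, treats degrees as plain planar coordinates.

```python
import numpy as np


def nearest_rows(points, grid_xy):
    """Return, for each point, the index of the closest grid row (Euclidean)."""
    # points has shape (n, 2) and grid_xy has shape (m, 2);
    # broadcasting yields every pairwise difference with shape (n, m, 2)
    diff = points[:, np.newaxis, :] - grid_xy[np.newaxis, :, :]
    # squared distances are enough to find the minimum
    sq_dist = (diff ** 2).sum(axis=2)
    return sq_dist.argmin(axis=1)


# illustrative sample data, not taken from the linked files
grid_xy = np.array([[9.1, 56.0], [9.9, 56.8], [10.0, 56.0]])
temps = np.array([7.0, 2.0, 14.0])
points = np.array([[10.0, 56.0], [9.8785, 56.803], [9.1919, 56.0456]])

idx = nearest_rows(points, grid_xy)
print(idx, temps[idx])
```

The intermediate array has n × m × 2 entries, so for very long files it is probably necessary to process the observations in chunks.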