Question

I have two CSV files. The first one, when viewed as a list, looks like this:

('Rubus idaeus', '10.0', '56.0')
('Neckera crispa', '9.8785', '56.803')
('Dicranum polysetum', '9.1919', '56.0456')
('Sphagnum subnitens', '9.1826', '56.6367')
('Taxus baccata', '9.61778', '55.68833')
('Sphagnum papillosum', '9.1879', '56.0442')

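The columns are 'Species', 'Longitude' and 'Latitude'. They are observations made in the field.
The other file is also a CSV file. It is test data, similar to the real thing. It looks like this: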

{'y': '58.1', 'x': '22.1', 'temp': '14'}
{'y': '58.2', 'x': '22.2', 'temp': '10'}
{'y': '58.3', 'x': '22.3', 'temp': '1'}
{'y': '58.4', 'x': '22.4', 'temp': '12'}
{'y': '58.5', 'x': '22.5', 'temp': '1'}
{'y': '58.6', 'x': '22.6', 'temp': '6'}
{'y': '58.7', 'x': '22.7', 'temp': '0'}
{'y': '58.8', 'x': '22.8', 'temp': '13'}
{'y': '58.9', 'x': '22.9', 'temp': '7'}

Both files are really long.

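I have the observations, and now I want to find the closest lower number in the file containing the climate data and append that row to the observation, so the output becomes: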

('Dicranum polysetum', '9.1919', '56.0456', 'y': '9.1', 'x': '56.0', 'temp': '7')
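The "closest lower number" amounts to flooring each coordinate to the 0.1 grid of the climate file. A minimal sketch of that step, assuming the grid spacing is 0.1 and that x holds the longitude and y the latitude (as the sample climate rows suggest; the example output above lists them the other way round). `floor_to_tenth` is a hypothetical helper name. It works on the CSV strings with `decimal.Decimal`, which avoids binary floating-point surprises when flooring values that already sit on the grid.

```python
from decimal import Decimal, ROUND_FLOOR


def floor_to_tenth(value: str) -> str:
    """Return the closest value at or below `value` on a 0.1 grid.

    Operates on the CSV string via Decimal, so '9.2' stays '9.2'
    instead of being nudged down by floating-point error.
    """
    return str(Decimal(value).quantize(Decimal('0.1'), rounding=ROUND_FLOOR))


observation = ('Dicranum polysetum', '9.1919', '56.0456')
species, lon, lat = observation
# Assumption: x is the longitude and y the latitude, as in the climate file
key = (floor_to_tenth(lon), floor_to_tenth(lat))
print(key)  # ('9.1', '56.0')
```

The resulting string pair can be used directly as a dictionary key for the climate rows, as long as those rows are normalized to one decimal place the same way.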

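I tried to build nested loops by iterating over the CSV files with DictReader, but the nesting gets deep very quickly, and it takes a huge number of iterations to get through everything. Does anyone know a way to do this?

My current code is poor, but I have tried looping in several ways, and I suspect my whole approach is fundamentally wrong.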

import csv
fil = csv.DictReader(open("TestData.csv"), delimiter=';')
navn = "nyDK_OVER_50M.csv"
# read the observations once: a DictReader can only be iterated a single time
occu = list(csv.DictReader(open(navn), delimiter='\t'))

for row in fil:
    print('x=', row['x'])
    for line in occu:
        print(round(float(line['decimalLongitude']), 1))
        # DictReader values are strings, so convert row['x'] before comparing
        if round(float(line['decimalLongitude']), 1) == float(row['x']):
            print('You did it, found one damn match')
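Apart from the speed problem, nesting DictReader loops has two pitfalls that a small in-memory sketch can show (the sample rows are made up): a `csv.DictReader` is a one-pass iterator, so an inner loop over it is used up during the first outer iteration, and DictReader returns every value as a string, so comparing a rounded float with `row['x']` directly is never true.

```python
import csv
import io

# Hypothetical in-memory stand-ins for the two files
climate_csv = io.StringIO("x;y;temp\n22.1;58.1;14\n22.2;58.2;10\n")
species_csv = io.StringIO("species\tdecimalLongitude\n"
                          "Rubus idaeus\t10.0\nTaxus baccata\t9.61778\n")

fil = csv.DictReader(climate_csv, delimiter=';')
occu = csv.DictReader(species_csv, delimiter='\t')

inner_counts = []
for row in fil:
    # occu is an iterator: the first outer row consumes it completely
    inner_counts.append(sum(1 for _ in occu))
print(inner_counts)  # [2, 0]

# Materializing the rows once lets every outer iteration see all of them
species_csv.seek(0)
occu_rows = list(csv.DictReader(species_csv, delimiter='\t'))
climate_csv.seek(0)
counts = [len(occu_rows) for _ in csv.DictReader(climate_csv, delimiter=';')]
print(counts)  # [2, 2]

# A float never compares equal to a string
print(round(10.0, 1) == '10.0')  # False
```

Even with the rows materialized, the nested loops still perform one comparison per (climate row, observation) pair, which is what makes this approach slow for long files; a dictionary keyed on the grid coordinates turns each match into a single lookup.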

Here are links to my two files, so if you know something that could move me forward, you don't have to make up any data:

https://www.dropbox.com/s/lmstnkq8jl71vcc/nyDK_OVER_50M.csv?dl=0
https://www.dropbox.com/s/v22j61vi9b43j78/TestData.csv?dl=0

Best regards, Mathias

Answer 1

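Since you said there are no missing temperature data points, the problem is much easier to solve: build a dictionary from the climate file, then round each observation's coordinates to one decimal place and look that cell up directly.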

import csv

# temperatures: nested lookup d[x][y] -> temp, with coordinates
# normalized to strings with one decimal place
d = {}
with open("TestData.csv", newline='') as fil:
    for row in csv.DictReader(fil, delimiter=';'):
        x = '{:.1f}'.format(float(row['x']))
        y = '{:.1f}'.format(float(row['y']))
        d.setdefault(x, {})[y] = row['temp']

# species: round each observation to the nearest 0.1 and look up that
# grid cell (raises KeyError if the climate file has no such cell)
navn = "nyDK_OVER_50M.csv"
with open(navn, newline='') as occu:
    for line in csv.DictReader(occu, delimiter='\t'):
        x = '{:.1f}'.format(round(float(line['decimalLongitude']), 1))
        y = '{:.1f}'.format(round(float(line['decimalLatitude']), 1))
        line['temp'] = d[x][y]
        line['x'] = x
        line['y'] = y
        print(line)
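Answer 1 rounds to the nearest 0.1, while the question asks for the closest lower value, and a missing cell raises KeyError. A sketch of a variant that floors instead and leaves `temp` as None for missing cells, assuming the same column names and delimiters as above; the helper names (`floor_to_tenth`, `build_lookup`, `join`) and the sample rows are hypothetical. It uses a flat dictionary keyed on (x, y) tuples, which is equivalent to the nested one.

```python
import csv
import io
from decimal import Decimal, ROUND_FLOOR


def floor_to_tenth(value):
    # Closest 0.1 grid value at or below the CSV string `value`
    return str(Decimal(value).quantize(Decimal('0.1'), rounding=ROUND_FLOOR))


def build_lookup(climate_file):
    # (x, y) -> temp, with the grid coordinates normalized to one decimal
    # place exactly as in Answer 1
    return {('{:.1f}'.format(float(r['x'])), '{:.1f}'.format(float(r['y']))): r['temp']
            for r in csv.DictReader(climate_file, delimiter=';')}


def join(species_file, lookup):
    for line in csv.DictReader(species_file, delimiter='\t'):
        x = floor_to_tenth(line['decimalLongitude'])
        y = floor_to_tenth(line['decimalLatitude'])
        line['x'], line['y'] = x, y
        # None instead of a KeyError when the grid has no such cell
        line['temp'] = lookup.get((x, y))
        yield line


# Hypothetical sample data in the two files' layouts
climate = io.StringIO("x;y;temp\n9.1;56.0;7\n10.0;56.0;14\n")
species = io.StringIO("species\tdecimalLongitude\tdecimalLatitude\n"
                      "Dicranum polysetum\t9.1919\t56.0456\n"
                      "Neckera crispa\t9.8785\t56.803\n")

rows = list(join(species, build_lookup(climate)))
for r in rows:
    print(r['species'], r['x'], r['y'], r['temp'])
# Dicranum polysetum 9.1 56.0 7
# Neckera crispa 9.8 56.8 None
```

Because `join` is a generator, the observation file is streamed row by row; only the climate lookup is held in memory.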

Answer 2

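Here is a solution that uses numpy to compute the Euclidean distance from each data item to every x,y point, and joins the item with the data from the x,y tuple closest to it.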

import numpy
import operator

# read the data into numpy arrays
testdata = numpy.genfromtxt('TestData.csv', delimiter=';', names=True)
nyDK     = numpy.genfromtxt('nyDK_OVER_50M.csv', names=True, delimiter='\t',
                            dtype=[('species', 'U64'),
                                   ('decimalLongitude', 'float32'),
                                   ('decimalLatitude', 'float32')])

# extract the climate points into an (N, 2) array of [(x, y), ...] = [(lon, lat), ...]
xy = numpy.column_stack((testdata['x'], testdata['y']))

# methods to extract the (lon, lat) from a nyDK entry, in the same order as xy
lonlat    = operator.itemgetter('decimalLongitude', 'decimalLatitude')
getlonlat = lambda item: numpy.array(lonlat(item))

# this will transform a single element of the nyDK array into
# a union of it with its closest climate data
def transform(item):
    # compute the Euclidean distance from every x,y point to this item's
    # location in one vectorized call, and find the position of the minimum
    idx = numpy.argmin(numpy.linalg.norm(xy - getlonlat(item), axis=1))
    # return the union of the item and the closest climate data
    return tuple(list(item) + list(testdata[idx]))

# transform all the entries in the input data set
result = [transform(item) for item in nyDK]

print(result[0:3])

Output:

[('Rubus idaeus', 10.0, 56.0, 15.0, 51.0, 14.0),
 ('Neckera crispa', 9.8785, 56.803001, 15.300000000000001, 51.299999999999997, 2.0),
 ('Dicranum polysetum', 9.1919003, 56.045601, 14.6, 50.600000000000001, 10.0)]

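Note: the distances are not very close, but that is probably because the .csv file does not contain a full grid of x,y points. All three matched climate points satisfy y = x + 36, as do the sample rows in the question, which suggests the test points lie along a single diagonal line rather than covering the whole area.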
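Answer 2 processes the observations one at a time in Python. A sketch of the same nearest-point search for a whole batch of observations in a single NumPy broadcast; the function name and sample coordinates are hypothetical, and both arrays must use the same (x, y) = (lon, lat) order. Like Answer 2, it measures Euclidean distance in degrees, which only approximates ground distance because a degree of longitude shrinks with latitude.

```python
import numpy as np


def nearest_indices(points, grid):
    """For each row of `points` (M, 2), return the index of the closest
    row of `grid` (N, 2) by Euclidean distance.

    Builds the full (M, N) distance matrix by broadcasting, so memory grows
    with M * N; process `points` in chunks if both files are large.
    """
    diff = points[:, np.newaxis, :] - grid[np.newaxis, :, :]  # (M, N, 2)
    dist = np.sqrt((diff ** 2).sum(axis=2))                     # (M, N)
    return dist.argmin(axis=1)


# Hypothetical coordinates, both arrays in (x, y) = (lon, lat) order
grid = np.array([[9.1, 56.0], [9.8, 56.8], [10.0, 56.0]])
points = np.array([[9.1919, 56.0456], [9.8785, 56.803], [10.0, 56.0]])
print(nearest_indices(points, grid))  # [0 1 2]
```

The returned indices can be used to pull the matching climate rows (for example `testdata[idx]`) for all observations at once. For very large inputs, a spatial index such as SciPy's `cKDTree` is the usual alternative to a full distance matrix.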
