Calculating the distance between coordinates in an imported csv using Pandas

Time: 2015-10-09 03:25:14

Tags: python csv numpy pandas

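I am trying to import a .csv that contains two columns of location data (lat/long), compute the distance between points, write the distance to a new column, loop the function over the next set of coordinates, and write the output data frame to a new .csv. I have written the following code: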

import pandas as pd
import numpy as np
pd.read_csv("input.csv")

def dist_from_coordinates(lat1, lon1, lat2, lon2):
    R = 6371  # Earth radius in km

    #conversion to radians
    d_lat = np.radians(lat2-lat1)
    d_lon = np.radians(lon2-lon1)

    r_lat1 = np.radians(lat1)
    r_lat2 = np.radians(lat2)

    #haversine formula
    a = np.sin(d_lat/2.) **2 + np.cos(r_lat1) * np.cos(r_lat2) * np.sin(d_lon/2.)**2

    haversine = 2 * R * np.arcsin(np.sqrt(a))

    return haversine

lat1 = row['lat1'] #first row of location.lat column here
lon1 = row['lon1'] #first row of location.long column here
lat2 = row['lat2'] #second row of location.lat column here
lon2 = row['lon2'] #second row of location.long column here

print(dist_from_coordinates(lat1, lon1, lat2, lon2), 'km')

df.to_csv('output.csv')

I get the following error:

Traceback (most recent call last):
  File "Test.py", line 22, in <module>
    lat1 = row['lat1'] #first row of location.lat column here
NameError: name 'row' is not defined

Could anyone offer further feedback on how to loop this formula over the data successfully?

1 Answer:

Answer 0: (score: 2)

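I assume your input.csv has 4 columns containing the values of lat1, lon1, lat2 and lon2. After the operation, output.csv is a separate file containing those 4 columns plus a 5th column, the distance. You can do this with a for loop. The approach shown here reads each row, computes the distance, and appends it to an initially empty list, which becomes the new "Distance" column; finally it creates output.csv. Make changes where necessary. Keep in mind that this works for a 4-column csv file with multiple rows of coordinate values. Hope this helps. Have a nice day.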

import pandas as pd
import numpy as np
input_file = "input.csv"
output_file = "output.csv"
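df = pd.read_csv(input_file)                       # load the csv into a DataFrame
coord_cols = ['lat1', 'lon1', 'lat2', 'lon2']
df[coord_cols] = df[coord_cols].apply(pd.to_numeric, errors='coerce')  # convert_objects is gone from current pandas; non-numeric values become NaN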

def dist_from_coordinates(lat1, lon1, lat2, lon2):
  R = 6371  # Earth radius in km

  #conversion to radians
  d_lat = np.radians(lat2-lat1)
  d_lon = np.radians(lon2-lon1)

  r_lat1 = np.radians(lat1)
  r_lat2 = np.radians(lat2)

  #haversine formula
  a = np.sin(d_lat/2.) **2 + np.cos(r_lat1) * np.cos(r_lat2) * np.sin(d_lon/2.)**2

  haversine = 2 * R * np.arcsin(np.sqrt(a))

  return haversine

new_column = []                    # collects one distance per row
for index, row in df.iterrows():
  lat1 = row['lat1'] # latitude of the first point in this row
  lon1 = row['lon1'] # longitude of the first point in this row
  lat2 = row['lat2'] # latitude of the second point in this row
  lon2 = row['lon2'] # longitude of the second point in this row
  value = dist_from_coordinates(lat1, lon1, lat2, lon2)  # distance in km
  new_column.append(value)   # add this row's distance to the list

df.insert(4,"Distance",new_column)  #4 is the index where you want to place your column. Column index starts with 0. "Distance" is the header and new_column are the values in the column.

df.to_csv(output_file, index=False)  # writes output.csv (overwrites an existing file)
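
As an alternative to the row-by-row loop above, the same function can probably be applied to whole columns at once, since it only uses NumPy operations that work element-wise on pandas Series. The following is a minimal sketch under that assumption, not a replacement for the approach in the answer. The in-memory sample data is hypothetical and simply stands in for input.csv, and the column names lat1, lon1, lat2, lon2 are the ones assumed above.

```python
import io

import numpy as np
import pandas as pd


def dist_from_coordinates(lat1, lon1, lat2, lon2):
    """Haversine distance in km; accepts scalars or whole columns."""
    R = 6371  # Earth radius in km

    # differences and latitudes converted to radians
    d_lat = np.radians(lat2 - lat1)
    d_lon = np.radians(lon2 - lon1)
    r_lat1 = np.radians(lat1)
    r_lat2 = np.radians(lat2)

    # haversine formula
    a = np.sin(d_lat / 2.0) ** 2 + np.cos(r_lat1) * np.cos(r_lat2) * np.sin(d_lon / 2.0) ** 2
    return 2 * R * np.arcsin(np.sqrt(a))


# Hypothetical sample data standing in for input.csv
sample_csv = io.StringIO(
    "lat1,lon1,lat2,lon2\n"
    "0.0,0.0,0.0,0.0\n"
    "0.0,0.0,0.0,90.0\n"
)
df = pd.read_csv(sample_csv)

# Pass the columns directly; the result is one distance per row
df["Distance"] = dist_from_coordinates(df["lat1"], df["lon1"], df["lat2"], df["lon2"])
print(df)
```

With a real file, you would read it with pd.read_csv("input.csv") instead of the sample string and finish with df.to_csv("output.csv", index=False). On large files this usually runs much faster than iterrows, because the arithmetic happens inside NumPy rather than in a Python loop.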