Distance matrix in Python pandas

Time: 2019-06-20 06:04:45

Tags: python pandas csv matrix

I'm new to Python, but I enjoy working with data in pandas.

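I have about a hundred pairs of CSV files, for example passenger data and bus-stop data. The passenger data consists of a person ID ("Person") and XY coordinates (UTM, meters). The bus-stop data consists of the bus-stop name (BS) and XY coordinates (UTM, meters). My data looks like this: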

df(person)
Person  X           Y
A_first 260357.3199 4064458.685
A_last  261618.5243 4064789.928
B_first 258270.5926 4063717.789
B_last  258270.5926 4063717.789
C_first 259051.758  4064462.021
C_last  258270.5926 4063717.789
D_first 260764.3916 4064624.977
D_last  260753.9053 4064569.745
E_first 258270.5926 4063717.789
E_last  258270.5926 4063717.789
F_first 258262.2825 4063740.234
F_last  258270.5926 4063717.789
G_first 258326.2393 4064104.899
G_last  258270.5926 4063717.789
H_first 259491.5911 4064838.328
H_last  260747.1408 4064647.669

df(bus stop)
BS  X           Y 
BS1 258280.8228 4063715.835
BS2 259961.3734 4064840.298
BS3 260360.0219 4064468.593
BS4 260530.5126 4064683.101
BS5 260806.7009 4064399.184
BS6 261178.4165 4063605.91
BS7 261222.1745 4064495.158
BS8 261627.6173 4064281.38
BS9 261662.8833 4064793.444

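I want to compute a distance matrix between the person data and the bus-stop data. My desired output has the columns InputID (person), TargetID (bus stop), and Distance, where Distance is the distance between a person point and a bus-stop point. My expected result: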

result
InputID (Person)    TargetID (BS)   Distance (meter)
A_first                 BS1                 xx
A_first                 BS2                 xx
A_first                 BS3                 xx
A_first                 BS4                 xx
A_first                 BS5                 xx
A_first                 BS6                 xx
A_first                 BS7                 xx
A_first                 BS8                 xx
A_first                 BS9                 xx
A_last                  BS1                 xx
A_last                  BS2                 xx
A_last                  BS3                 xx
A_last                  BS4                 xx
A_last                  BS5                 xx
A_last                  BS6                 xx
A_last                  BS7                 xx
A_last                  BS8                 xx
A_last                  BS9                 xx
... and so on

Any suggestions? Thanks.

2 answers:

Answer 0 (score: 1)

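  1. First, create a DataFrame that is the Cartesian product of the two DataFrames. That means that for each person there is one row per bus stop, just as you wrote. For example, you can see how to do that in Python here

  2. Then, on that new DataFrame, compute for each row the distance between the two coordinate pairs (the person's and the bus stop's). For that you can use the GeoPandas distance method.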
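The two steps above can be sketched with pandas and NumPy alone. This is a minimal sketch, not the answerer's code: it assumes pandas 1.2 or later (for `merge(how='cross')`) and swaps GeoPandas' `distance` for `np.hypot`. Because the coordinates are projected UTM meters, the planar (Euclidean) distance is also what GeoPandas computes between point geometries. The helper name `distance_table` is hypothetical; the sample rows are taken from the question.

```python
import numpy as np
import pandas as pd


def distance_table(persons: pd.DataFrame, stops: pd.DataFrame) -> pd.DataFrame:
    """Return one row per (person, bus stop) pair with the Euclidean distance.

    Hypothetical helper: assumes `persons` has columns Person, X, Y and
    `stops` has columns BS, X, Y, all in the same projected CRS (UTM meters).
    """
    # Step 1: Cartesian product, every person paired with every bus stop.
    pairs = persons.merge(stops, how="cross", suffixes=("_p", "_bs"))
    # Step 2: row-wise planar distance between the two coordinate pairs.
    pairs["Distance"] = np.hypot(pairs["X_p"] - pairs["X_bs"],
                                 pairs["Y_p"] - pairs["Y_bs"])
    # Rename to the column layout requested in the question.
    return pairs.rename(columns={"Person": "InputID", "BS": "TargetID"})[
        ["InputID", "TargetID", "Distance"]
    ]


persons = pd.DataFrame({
    "Person": ["A_first", "A_last"],
    "X": [260357.3199, 261618.5243],
    "Y": [4064458.685, 4064789.928],
})
stops = pd.DataFrame({
    "BS": ["BS1", "BS2"],
    "X": [258280.8228, 259961.3734],
    "Y": [4063715.835, 4064840.298],
})

result = distance_table(persons, stops)
print(result)
```

Calling such a helper once per pair of CSV files (after reading each with `pd.read_csv`) would give the InputID/TargetID/Distance layout the question asks for.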

Answer 1 (score: 1)

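You can first create a cross join (every person paired with every bus stop) by merging both frames on a constant dummy key:
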
import pandas as pd
import numpy as np

df1 = pd.DataFrame({
    'Person': ['A_first', 'A_last', 'B_first', 'B_last'],
    'X': [260357.3199, 261618.5243, 258270.5926, 258270.5926],
    'Y': [4064458.685, 4064789.928, 4063717.789, 4063717.789],
})

df2 = pd.DataFrame({
    'BS': ['BS1', 'BS2'],
    'X': [258280.8228, 259961.3734],
    'Y': [4063715.835, 4064840.298],
})


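# Add the same constant key to both frames so that merging on it
# pairs every person with every bus stop (a Cartesian product).
df1['key'] = 0
df2['key'] = 0

df_cartesian = df1.merge(df2, on='key').drop(columns=['key'])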
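On pandas 1.2 and later, the dummy-key trick can be replaced by `merge(how='cross')`, which should produce the same pairs in the same order without a temporary `key` column. A minimal check, reusing the small sample frames from this answer (the `assign` calls are a choice made here so the original frames are not mutated):

```python
import pandas as pd

df1 = pd.DataFrame({
    'Person': ['A_first', 'A_last', 'B_first', 'B_last'],
    'X': [260357.3199, 261618.5243, 258270.5926, 258270.5926],
    'Y': [4064458.685, 4064789.928, 4063717.789, 4063717.789],
})
df2 = pd.DataFrame({
    'BS': ['BS1', 'BS2'],
    'X': [258280.8228, 259961.3734],
    'Y': [4063715.835, 4064840.298],
})

# Dummy-key version (works on any pandas version); assign() leaves df1/df2 unchanged.
via_key = (df1.assign(key=0)
              .merge(df2.assign(key=0), on='key')
              .drop(columns=['key']))

# Built-in cross join (pandas >= 1.2); overlapping X/Y get the default _x/_y suffixes.
via_cross = df1.merge(df2, how='cross')

print(via_cross.equals(via_key))  # True
```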

Then compute the row-wise X and Y differences and take the L2 norm:

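# X_x/Y_x are the person coordinates, X_y/Y_y the bus-stop coordinates
# (merge adds the _x/_y suffixes to the overlapping column names).
df_cartesian['X_diff'] = df_cartesian['X_x'] - df_cartesian['X_y']
df_cartesian['Y_diff'] = df_cartesian['Y_x'] - df_cartesian['Y_y']

# Euclidean (L2) distance of each (X_diff, Y_diff) row.
df_cartesian['dist'] = np.linalg.norm(df_cartesian[['X_diff', 'Y_diff']], axis=1)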

df_cartesian
#     Person          X_x          Y_x   BS          X_y          Y_y  \
# 0  A_first  260357.3199  4064458.685  BS1  258280.8228  4063715.835   
# 1  A_first  260357.3199  4064458.685  BS2  259961.3734  4064840.298   
# 2   A_last  261618.5243  4064789.928  BS1  258280.8228  4063715.835   
# 3   A_last  261618.5243  4064789.928  BS2  259961.3734  4064840.298   
# 4  B_first  258270.5926  4063717.789  BS1  258280.8228  4063715.835   
# 5  B_first  258270.5926  4063717.789  BS2  259961.3734  4064840.298   
# 6   B_last  258270.5926  4063717.789  BS1  258280.8228  4063715.835   
# 7   B_last  258270.5926  4063717.789  BS2  259961.3734  4064840.298   
# 
#       X_diff    Y_diff         dist  
# 0  2076.4971   742.850  2205.372152  
# 1   395.9465  -381.613   549.911004  
# 2  3337.7015  1074.093  3506.269681  
# 3  1657.1509   -50.370  1657.916235  
# 4   -10.2302     1.954    10.415138  
# 5 -1690.7808 -1122.509  2029.474358  
# 6   -10.2302     1.954    10.415138  
# 7 -1690.7808 -1122.509  2029.474358
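The same distances can also be obtained without building the merged frame: with NumPy broadcasting, the full person-by-bus-stop matrix of sqrt((X_person - X_bs)^2 + (Y_person - Y_bs)^2) is computed in one step and then flattened into the InputID/TargetID/Distance layout from the question. This is a minimal sketch reusing the sample frames from this answer; `scipy.spatial.distance.cdist(A, B)` would be an equivalent way to get the matrix if SciPy is installed.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({
    'Person': ['A_first', 'A_last', 'B_first', 'B_last'],
    'X': [260357.3199, 261618.5243, 258270.5926, 258270.5926],
    'Y': [4064458.685, 4064789.928, 4063717.789, 4063717.789],
})
df2 = pd.DataFrame({
    'BS': ['BS1', 'BS2'],
    'X': [258280.8228, 259961.3734],
    'Y': [4063715.835, 4064840.298],
})

# Shape (n_persons, 1, 2) minus shape (1, n_stops, 2) broadcasts to (n_persons, n_stops, 2).
p = df1[['X', 'Y']].to_numpy()[:, None, :]
b = df2[['X', 'Y']].to_numpy()[None, :, :]
dist = np.linalg.norm(p - b, axis=2)  # (n_persons, n_stops) distance matrix

# Wide form: one row per person, one column per bus stop.
matrix = pd.DataFrame(dist, index=df1['Person'], columns=df2['BS'])

# Long form: ravel() is row-major, so repeat persons and tile stops to match it.
long = pd.DataFrame({
    'InputID': np.repeat(df1['Person'].to_numpy(), len(df2)),
    'TargetID': np.tile(df2['BS'].to_numpy(), len(df1)),
    'Distance': dist.ravel(),
})
print(long)
```

The wide `matrix` frame is the distance matrix itself, while `long` has one row per pair in the same order as the merge-based result above.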