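I'm new to Python, but I enjoy working with data in pandas.
I have a hundred pairs of CSV files, for example passenger and bus stop data. The passenger data consists of a person ID and XY coordinates (UTM, meters). The bus stop data consists of the bus stop name (BS) and XY coordinates (UTM, meters). My data looks like this: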
df(person)
Person X Y
A_first 260357.3199 4064458.685
A_last 261618.5243 4064789.928
B_first 258270.5926 4063717.789
B_last 258270.5926 4063717.789
C_first 259051.758 4064462.021
C_last 258270.5926 4063717.789
D_first 260764.3916 4064624.977
D_last 260753.9053 4064569.745
E_first 258270.5926 4063717.789
E_last 258270.5926 4063717.789
F_first 258262.2825 4063740.234
F_last 258270.5926 4063717.789
G_first 258326.2393 4064104.899
G_last 258270.5926 4063717.789
H_first 259491.5911 4064838.328
H_last 260747.1408 4064647.669
and
df(bus stop)
BS X Y
BS1 258280.8228 4063715.835
BS2 259961.3734 4064840.298
BS3 260360.0219 4064468.593
BS4 260530.5126 4064683.101
BS5 260806.7009 4064399.184
BS6 261178.4165 4063605.91
BS7 261222.1745 4064495.158
BS8 261627.6173 4064281.38
BS9 261662.8833 4064793.444
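I want to compute the distance matrix between the person data and the bus stop data. My desired output is InputID (person), TargetID (bus stop), and distance, where distance is the distance between each person point and each bus stop point. My expected result: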
result
InputID (Person) TargetID (BS) Distance (meter)
A_first BS1 xx
A_first BS2 xx
A_first BS3 xx
A_first BS4 xx
A_first BS5 xx
A_first BS6 xx
A_first BS7 xx
A_first BS8 xx
A_first BS9 xx
A_last BS1 xx
A_last BS2 xx
A_last BS3 xx
A_last BS4 xx
A_last BS5 xx
A_last BS6 xx
A_last BS7 xx
A_last BS8 xx
A_last BS9 xx
... and so on
Any suggestions? Thank you.
Answer 0 (score: 1)
Answer 1 (score: 1)
You can first create the cross join (Cartesian product) of the two frames by merging on a dummy key:
import pandas as pd
import numpy as np
df1 = pd.DataFrame({
'Person': ['A_first', 'A_last', 'B_first', 'B_last'],
'X': [260357.3199, 261618.5243, 258270.5926, 258270.5926],
'Y': [4064458.685, 4064789.928, 4063717.789, 4063717.789],
})
df2 = pd.DataFrame({
'BS': ['BS1', 'BS2'],
'X': [258280.8228, 259961.3734],
'Y': [4063715.835, 4064840.298],
})
df1['key'] = 0
df2['key'] = 0
df_cartesian = df1.merge(df2, on='key').drop(columns=['key'])
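If your pandas is 1.2 or newer, the dummy key can probably be skipped, because `merge` accepts `how='cross'`. A minimal sketch, rebuilding the same two small frames so it runs on its own (the name `df_cross` is just for illustration):

```python
import pandas as pd

df1 = pd.DataFrame({
    'Person': ['A_first', 'A_last', 'B_first', 'B_last'],
    'X': [260357.3199, 261618.5243, 258270.5926, 258270.5926],
    'Y': [4064458.685, 4064789.928, 4063717.789, 4063717.789],
})
df2 = pd.DataFrame({
    'BS': ['BS1', 'BS2'],
    'X': [258280.8228, 259961.3734],
    'Y': [4063715.835, 4064840.298],
})

# how='cross' pairs every row of df1 with every row of df2 (pandas >= 1.2).
# The shared column names X and Y get the default suffixes _x and _y.
df_cross = df1.merge(df2, how='cross')
print(df_cross)
```

The result has the same rows and columns as the dummy-key version, so the distance step that follows works unchanged on it.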
Then compute the row-wise X and Y differences and take the L2 norm:
df_cartesian['X_diff'] = df_cartesian['X_x'] - df_cartesian['X_y']
df_cartesian['Y_diff'] = df_cartesian['Y_x'] - df_cartesian['Y_y']
df_cartesian['dist'] = np.linalg.norm(df_cartesian[['X_diff', 'Y_diff']], axis=1)
df_cartesian
# Person X_x Y_x BS X_y Y_y \
# 0 A_first 260357.3199 4064458.685 BS1 258280.8228 4063715.835
# 1 A_first 260357.3199 4064458.685 BS2 259961.3734 4064840.298
# 2 A_last 261618.5243 4064789.928 BS1 258280.8228 4063715.835
# 3 A_last 261618.5243 4064789.928 BS2 259961.3734 4064840.298
# 4 B_first 258270.5926 4063717.789 BS1 258280.8228 4063715.835
# 5 B_first 258270.5926 4063717.789 BS2 259961.3734 4064840.298
# 6 B_last 258270.5926 4063717.789 BS1 258280.8228 4063715.835
# 7 B_last 258270.5926 4063717.789 BS2 259961.3734 4064840.298
#
# X_diff Y_diff dist
# 0 2076.4971 742.850 2205.372152
# 1 395.9465 -381.613 549.911004
# 2 3337.7015 1074.093 3506.269681
# 3 1657.1509 -50.370 1657.916235
# 4 -10.2302 1.954 10.415138
# 5 -1690.7808 -1122.509 2029.474358
# 6 -10.2302 1.954 10.415138
# 7 -1690.7808 -1122.509 2029.474358
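To get exactly the three-column layout asked for in the question (InputID, TargetID, Distance), one option is to skip the merge and compute the full distance grid with NumPy broadcasting, then flatten it. This is only a sketch on the same sample rows; the names `person`, `bus`, and `result` are chosen here for illustration, and it assumes both frames use the same projected coordinate system in meters.

```python
import numpy as np
import pandas as pd

person = pd.DataFrame({
    'Person': ['A_first', 'A_last', 'B_first', 'B_last'],
    'X': [260357.3199, 261618.5243, 258270.5926, 258270.5926],
    'Y': [4064458.685, 4064789.928, 4063717.789, 4063717.789],
})
bus = pd.DataFrame({
    'BS': ['BS1', 'BS2'],
    'X': [258280.8228, 259961.3734],
    'Y': [4063715.835, 4064840.298],
})

# Shape (n_person, 1) minus shape (1, n_bus) broadcasts to (n_person, n_bus).
dx = person['X'].to_numpy()[:, None] - bus['X'].to_numpy()[None, :]
dy = person['Y'].to_numpy()[:, None] - bus['Y'].to_numpy()[None, :]
dist = np.hypot(dx, dy)  # Euclidean distance for every person/stop pair

# Flatten row by row: each person is repeated once per bus stop,
# and the list of bus stops is tiled once per person.
result = pd.DataFrame({
    'InputID': np.repeat(person['Person'].to_numpy(), len(bus)),
    'TargetID': np.tile(bus['BS'].to_numpy(), len(person)),
    'Distance': dist.ravel(),
})
print(result)
```

The `dist` array is the distance matrix itself (one row per person, one column per bus stop), so it can be kept if the wide form is also useful. For a hundred CSV pairs, the same steps could be wrapped in a function and called once per pair.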