Iterating over multiple pandas DataFrames

Date: 2017-04-16 09:15:33

Tags: python pandas dataframe coordinates

I have two DataFrames: 1) a list of suppliers and their lat, long coordinates

import pandas as pd
sup_essential = pd.DataFrame({'supplier': ['A','B','C'],
                              'coords': [(51.1235,-0.3453),(52.1245,-0.3423),(53.1235,-1.4553)]})

2) a list of stores and their lat, long coordinates

stores_essential = pd.DataFrame({'storekey': [1,2,3],
                                 'coords': [(54.1235,-0.6553),(49.1245,-1.3423),(50.1235,-1.8553)]})

I want to create an output table containing: store, store_coordinates, supplier, supplier_coordinates, and the distance, for every combination of store and supplier.

I currently have:

test=[]
for row in sup_essential.iterrows():
    for row in stores_essential.iterrows():
        r = sup_essential['supplier'],stores_essential['storeKey']
        test.append(r)

But this just repeats all the values.

1 answer:

Answer 0 (score: 0)

Using the source DataFrames from the question, referred to here as sup and stores:

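import numpy as np
import pandas as pd
# recent scikit-learn releases expose DistanceMetric in sklearn.metrics
# (older releases had it in sklearn.neighbors)
from sklearn.metrics import DistanceMetric

sup, stores = sup_essential, stores_essential  # shorter names for the question's DataFrames
dist = DistanceMetric.get_metric('haversine')

# cross join: a constant helper column x pairs every supplier with every store
m = pd.merge(sup.assign(x=0), stores.assign(x=0), on='x', suffixes=('1','2')).drop(columns='x')

# split the (lat, lon) tuples into separate numeric columns
d1 = sup[['coords']].assign(lat=sup.coords.str[0], lon=sup.coords.str[1]).drop(columns='coords')
d2 = stores[['coords']].assign(lat=stores.coords.str[0], lon=stores.coords.str[1]).drop(columns='coords')

# haversine expects radians and returns unit-sphere distances; 6367 km is used as the Earth's radius
m['dist_km'] = np.ravel(dist.pairwise(np.radians(d1), np.radians(d2)) * 6367)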

Result:

In [135]: m
Out[135]:
              coords1 supplier             coords2  storekey     dist_km
0  (51.1235, -0.3453)        A  (54.1235, -0.6553)         1  334.029670
1  (51.1235, -0.3453)        A  (49.1245, -1.3423)         2  233.213416
2  (51.1235, -0.3453)        A  (50.1235, -1.8553)         3  153.880680
3  (52.1245, -0.3423)        B  (54.1235, -0.6553)         1  223.116901
4  (52.1245, -0.3423)        B  (49.1245, -1.3423)         2  340.738587
5  (52.1245, -0.3423)        B  (50.1235, -1.8553)         3  246.116984
6  (53.1235, -1.4553)        C  (54.1235, -0.6553)         1  122.997130
7  (53.1235, -1.4553)        C  (49.1245, -1.3423)         2  444.459052
8  (53.1235, -1.4553)        C  (50.1235, -1.8553)         3  334.514028
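
For readers who want to see what the haversine metric computes, or who prefer an explicit loop closer to the attempt in the question, here is a minimal sketch using only the standard library. The names haversine_km, EARTH_RADIUS_KM, suppliers and stores are made up for this illustration and do not come from the original post; the radius of 6367 km is simply the one used in the answer, and the data is copied from the question into plain dicts.

```python
from itertools import product
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6367  # assumption: same radius as the answer; 6371 is another common choice


def haversine_km(p1, p2, radius=EARTH_RADIUS_KM):
    """Great-circle distance between two (lat, lon) points given in degrees."""
    lat1, lon1, lat2, lon2 = map(radians, (*p1, *p2))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * radius * asin(sqrt(a))


# data from the question, as plain dicts (hypothetical layout for this sketch)
suppliers = {'A': (51.1235, -0.3453), 'B': (52.1245, -0.3423), 'C': (53.1235, -1.4553)}
stores = {1: (54.1235, -0.6553), 2: (49.1245, -1.3423), 3: (50.1235, -1.8553)}

# product() yields every (supplier, store) pair exactly once
rows = [
    (store, store_xy, sup, sup_xy, haversine_km(sup_xy, store_xy))
    for (sup, sup_xy), (store, store_xy) in product(suppliers.items(), stores.items())
]

for row in rows:
    print(row)
```

Each tuple in rows holds store, store coordinates, supplier, supplier coordinates and the distance in km, which are the columns asked for in the question. The loop in the question reuses the name row for both loops and appends whole columns instead of the current pair, which is why it only repeats values; pairing the items directly avoids that.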
