Using pandas as a distance matrix, then getting a sub-dataframe of the relevant distances

Time: 2019-05-21 18:35:12

Tags: python python-3.x pandas

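I created a pandas DataFrame that holds the distance between location i and location j. Given a start point P1 and an end point P2, I want to extract a sub-dataframe (a distance matrix) that has P1 and P2 on one axis and the remaining indices on the other.

I am using a pandas DataFrame because I think it is the most efficient approach.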

import pandas as pd

# Distance matrix in dict form: dm_dict[i][j] is the distance from i to j.
dm_dict = ...
dm_df = pd.DataFrame.from_dict(dm_dict)
P1 = dm_df.max(axis=0).idxmax()  # column that holds the largest distance
P2 = dm_df[P1].idxmax()          # location farthest from P1
route = [P1, P2]
# This is the selection that comes back full of NaN:
remaining_locs = dm_df[dm_df[~dm_df.isin(route)].isin(route)]
while not_done:
    # go through remaining_locs until all the locations have been added
    ...

There is no error message, but the resulting df is full of NaN instead of containing the distances.

Using dm_df[~dm_df.isin(route)].isin(route) seems to give me an accurate boolean df.
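A likely explanation for the NaN-filled result, sketched on a made-up 3x3 matrix: `isin` compares the cell values (the distances) against `route`, not the row or column labels, and indexing a DataFrame with a boolean DataFrame keeps the original shape and fills every non-matching cell with NaN. The labels `A`, `B`, `C` and the distances are hypothetical and only serve the illustration; label-based selection with `.loc` is shown as one possible alternative.

```python
import pandas as pd

# Hypothetical 3x3 distance matrix; labels and values are made up.
dm_df = pd.DataFrame(
    [[0.0, 1.0, 2.0], [1.0, 0.0, 3.0], [2.0, 3.0, 0.0]],
    index=["A", "B", "C"],
    columns=["A", "B", "C"],
)
route = ["A", "C"]

# isin() tests the cell VALUES (distances), not the labels,
# so no cell matches the strings in `route`.
value_mask = dm_df.isin(route)

# Indexing with a boolean DataFrame keeps the shape and puts NaN
# wherever the mask is False (the same behaviour as DataFrame.where).
masked = dm_df[value_mask]

# Label-based selection: rows that are not on the route, columns that are.
remaining = dm_df.index.difference(route)
sub = dm_df.loc[remaining, route]
```

Here `masked` has the full 3x3 shape but contains only NaN, while `sub` is the 1x2 block of distances from `B` to `A` and `C`.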


Sample data. Technically the distances are haversine, but Euclidean should be fine for populating the matrix:

import numpy

def dist(i, j):
    a = numpy.array((i[1], i[2]))
    b = numpy.array((j[1], j[2]))
    return numpy.linalg.norm(a-b)

locations = [
    ("Ottawa", 45.424722,-75.695),
    ("Edmonton", 53.533333,-113.5),
    ("Victoria", 48.428611,-123.365556), 
    ("Winnipeg", 49.899444,-97.139167), 
    ("Fredericton",  49.899444,-97.139167), 
    ("StJohns", 47.561389, -52.7125),
    ("Halifax", 44.647778, -63.571389), 
    ("Toronto", 43.741667, -79.373333),
    ("Charlottetown",46.238889, -63.129167),
    ("QuebecCity",46.816667, -71.216667 ),
    ("Regina", 50.454722, -104.606667),
    ("Yellowknife", 62.442222, -114.3975),
    ("Iqaluit", 63.748611, -68.519722)
]

dm_dict = {i: {j: dist(i, j) for j in locations if j != i} for i in locations}

2 Answers:

Answer 0 (score: 0)

It looks like you want scipy's distance_matrix:

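import pandas as pd
from scipy.spatial import distance_matrix

df = pd.DataFrame(locations)

# Columns 1 and 2 hold the latitude and longitude.
x = df[[1, 2]]
dm = pd.DataFrame(distance_matrix(x, x),
                  index=df[0],
                  columns=df[0])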

Output:

+----------------+------------+------------+------------+------------+--------------+------------+------------+------------+----------------+-------------+------------+--------------+-----------+
|                |  Ottawa    | Edmonton   | Victoria   | Winnipeg   | Fredericton  |  StJohns   |  Halifax   |  Toronto   | Charlottetown  | QuebecCity  |  Regina    | Yellowknife  |  Iqaluit  |
+----------------+------------+------------+------------+------------+--------------+------------+------------+------------+----------------+-------------+------------+--------------+-----------+
| 0              |            |            |            |            |              |            |            |            |                |             |            |              |           |
+----------------+------------+------------+------------+------------+--------------+------------+------------+------------+----------------+-------------+------------+--------------+-----------+
| Ottawa         | 0.000000   | 38.664811  | 47.765105  | 21.906059  | 21.906059    | 23.081609  | 12.148481  | 4.045097   | 12.592181      | 4.689667    | 29.345960  | 42.278586    | 19.678657 |
| Edmonton       | 38.664811  | 0.000000   | 11.107987  | 16.759535  | 16.759535    | 61.080146  | 50.713108  | 35.503607  | 50.896264      | 42.813477   | 9.411122   | 8.953983     | 46.125669 |
| Victoria       | 47.765105  | 11.107987  | 0.000000   | 26.267600  | 26.267600    | 70.658378  | 59.913580  | 44.241193  | 60.276176      | 52.173796   | 18.867990  | 16.637528    | 56.945306 |
| Winnipeg       | 21.906059  | 16.759535  | 26.267600  | 0.000000   | 0.000000     | 44.488147  | 33.976105  | 18.802741  | 34.206429      | 26.105163   | 7.488117   | 21.334745    | 31.794214 |
| Fredericton    | 21.906059  | 16.759535  | 26.267600  | 0.000000   | 0.000000     | 44.488147  | 33.976105  | 18.802741  | 34.206429      | 26.105163   | 7.488117   | 21.334745    | 31.794214 |
| StJohns        | 23.081609  | 61.080146  | 70.658378  | 44.488147  | 44.488147    | 0.000000   | 11.242980  | 26.933071  | 10.500284      | 18.519147   | 51.974763  | 63.454538    | 22.625084 |
| Halifax        | 12.148481  | 50.713108  | 59.913580  | 33.976105  | 33.976105    | 11.242980  | 0.000000   | 15.827902  | 1.651422       | 7.946971    | 41.444115  | 53.851052    | 19.731392 |
| Toronto        | 4.045097   | 35.503607  | 44.241193  | 18.802741  | 18.802741    | 26.933071  | 15.827902  | 0.000000   | 16.434995      | 8.717042    | 26.111037  | 39.703942    | 22.761342 |
| Charlottetown  | 12.592181  | 50.896264  | 60.276176  | 34.206429  | 34.206429    | 10.500284  | 1.651422   | 16.434995  | 0.000000       | 8.108112    | 41.691201  | 53.767927    | 18.320711 |
| QuebecCity     | 4.689667   | 42.813477  | 52.173796  | 26.105163  | 26.105163    | 18.519147  | 7.946971   | 8.717042   | 8.108112       | 0.000000    | 33.587610  | 45.921044    | 17.145385 |
| Regina         | 29.345960  | 9.411122   | 18.867990  | 7.488117   | 7.488117     | 51.974763  | 41.444115  | 26.111037  | 41.691201      | 33.587610   | 0.000000   | 15.477744    | 38.457705 |
| Yellowknife    | 42.278586  | 8.953983   | 16.637528  | 21.334745  | 21.334745    | 63.454538  | 53.851052  | 39.703942  | 53.767927      | 45.921044   | 15.477744  | 0.000000     | 45.896374 |
| Iqaluit        | 19.678657  | 46.125669  | 56.945306  | 31.794214  | 31.794214    | 22.625084  | 19.731392  | 22.761342  | 18.320711      | 17.145385   | 38.457705  | 45.896374    | 0.000000  |
+----------------+------------+------------+------------+------------+--------------+------------+------------+------------+----------------+-------------+------------+--------------+-----------+
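With a name-labelled matrix like this, the two starting points from the question (the pair that is farthest apart) can be read off directly. The following is a minimal sketch that uses only three of the question's locations; the column names `name`, `lat`, `lon` and the `np.unravel_index` lookup are choices made for this illustration, not part of the original answer.

```python
import numpy as np
import pandas as pd
from scipy.spatial import distance_matrix

# Three of the locations from the question, as (name, lat, lon).
locations = [
    ("Ottawa", 45.424722, -75.695),
    ("Edmonton", 53.533333, -113.5),
    ("Victoria", 48.428611, -123.365556),
]
df = pd.DataFrame(locations, columns=["name", "lat", "lon"])
coords = df[["lat", "lon"]].to_numpy()

# Square Euclidean distance matrix labelled by city name on both axes.
dm = pd.DataFrame(distance_matrix(coords, coords),
                  index=df["name"].to_numpy(),
                  columns=df["name"].to_numpy())

# Position of the largest entry, mapped back to row and column labels.
i, j = np.unravel_index(dm.to_numpy().argmax(), dm.shape)
P1, P2 = dm.index[i], dm.columns[j]
```

For these three cities the farthest pair is Ottawa and Victoria, which matches the largest value in the corresponding rows of the table above.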

Answer 1 (score: 0)

I'm pretty sure this is what I wanted:

filtered = dm_df.filter(items=route,axis=1).filter(items=set(locations).difference(set(route)), axis=0)

filtered is a df with [2 rows x 10 columns], and from there I can find the minimum.
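The "find the minimum" step could look roughly like the sketch below. It assumes a small made-up matrix with string labels `A` to `D` (not the question's data) and picks the remaining location that is closest to either endpoint of the route; whether that is the right selection rule depends on the routing heuristic being implemented.

```python
import pandas as pd

# Hypothetical 4x4 symmetric distance matrix; names and values are made up.
names = ["A", "B", "C", "D"]
dm_df = pd.DataFrame(
    [[0, 5, 9, 4], [5, 0, 3, 7], [9, 3, 0, 6], [4, 7, 6, 0]],
    index=names, columns=names, dtype=float,
)
route = ["A", "C"]  # P1 and P2

# Same selection as the filter() chain: route as columns, the rest as rows.
remaining = [n for n in names if n not in route]
filtered = dm_df.filter(items=route, axis=1).filter(items=remaining, axis=0)

# Smallest distance from each remaining location to either route endpoint,
# then the remaining location with the overall smallest such distance.
nearest_dist = filtered.min(axis=1)
next_loc = nearest_dist.idxmin()
```

In this toy example `next_loc` is `B`, because its distance of 3 to `C` is the smallest entry in `filtered`.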