I have code that generates a distance matrix between the IDs in my dataset:
id 5141 5578 5141 5822 5170 5680
id
5141 0.000000 47.169906 1.000000 ... 77.524190 134.851770 112.178429
5578 47.169906 0.000000 47.265209 ... 111.521298 127.882759 126.479247
5141 1.000000 47.265209 0.000000 ... 76.661594 135.823415 113.159180
5578 48.166378 1.000000 48.259714 ... 112.294256 128.003906 127.027556
5141 8.602325 54.744863 8.062258 ... 69.771054 141.481448 115.974135
5578 49.162994 2.000000 49.254441 ... 113.070774 128.132744 127.581347
5578 49.091751 2.236068 49.162994 ... 112.445542 129.123971 128.413395
5141 13.928388 60.671245 13.601471 ... 67.230945 143.251527 115.351636
5578 51.088159 4.123106 51.156622 ... 114.017543 129.402473 129.529919
5141 16.278821 63.387696 16.124515 ... 68.007353 142.337627 113.159180
5578 51.088159 4.123106 51.156622 ... 114.017543 129.402473 129.529919
5141 16.124515 63.285069 16.031220 ... 68.949257 141.396605 112.160599
5578 50.089919 3.162278 50.159745 ... 113.229855 129.259429 128.968989
5141 14.764823 60.074953 15.264338 ... 78.434686 131.912850 103.392456
5141 16.401219 57.706152 17.204651 ... 85.094066 125.251746 97.739450
5578 50.089919 3.162278 50.159745 ... 113.229855 129.259429 128.968989
5578 50.089919 3.162278 50.159745 ... 113.229855 129.259429 128.968989
5141 17.000000 56.089215 17.888544 ... 87.664132 122.702893 96.026038
5578 50.089919 3.162278 50.159745 ... 113.229855 129.259429 128.968989
5141 17.492856 57.070132 18.357560 ... 87.315520 123.032516 95.885348
5578 50.089919 3.162278 50.159745 ... 113.229855 129.259429 128.968989
My goal is to find groups of IDs based on these distances. What I do next is:
# Replace each minimum distance with its column name, and everything else with `False`.
closest = np.where(df_dist.eq(df_dist[df_dist != 0].min(), 0), df_dist.columns, False)
This gives me the name of the closest ID in the matching cells:
Out[32]:
array([[ 0, 0, 5141, ..., 0, 0, 0],
[ 0, 0, 0, ..., 0, 0, 0],
[5141, 0, 0, ..., 0, 0, 0],
...,
[ 0, 0, 0, ..., 0, 0, 0],
[ 0, 0, 0, ..., 0, 0, 0],
[ 0, 0, 0, ..., 0, 0, 0]], dtype=int64)
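To make the masking step easier to follow, here is a minimal sketch on a toy 3x3 matrix. The IDs and distances are made up for illustration, and the sketch assumes a symmetric matrix with unique labels, which is simpler than the real data where IDs repeat.

```python
import numpy as np
import pandas as pd

# Hypothetical toy distance matrix (illustrative values, not the real data).
ids = [5141, 5578, 5621]
df_dist = pd.DataFrame(
    [[0.0, 47.0, 50.0],
     [47.0, 0.0, 2.0],
     [50.0, 2.0, 0.0]],
    index=ids,
    columns=ids,
)

# Smallest non-zero distance per column; zeros become NaN and are skipped by min().
col_min = df_dist[df_dist != 0].min()

# axis=0 aligns col_min with the row labels, so each row is compared against
# the minimum for its own ID (valid here because the matrix is symmetric).
mask = df_dist.eq(col_min, axis=0)

# Keep the column name where the mask is True; False is stored as 0.
closest = np.where(mask.to_numpy(), df_dist.columns.to_numpy(), False)
print(closest)
```

Because `False` is cast to `0` in an integer array, this approach would not work if `0` were itself a valid ID.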
# Drop the False entries from each row and keep the column names as a list.
df1['closest'] = [i[i.astype(bool)].tolist() for i in closest]
# Remove duplicate IDs within each list.
df1['closest'] = df1['closest'].agg(pd.unique)
This gives me a new column with the closest IDs.
date
2019-09-17 12:00:00.032000+00:00 [5141]
2019-09-17 12:00:00.032000+00:00 [5578, 5621]
2019-09-17 12:00:00.191000+00:00 [5141]
2019-09-17 12:00:00.191000+00:00 [5578]
2019-09-17 12:00:00.505000+00:00 [5141]
2019-09-17 12:00:00.505000+00:00 [5578, 5621]
2019-09-17 12:00:00.740000+00:00 [5578]
2019-09-17 12:00:00.740000+00:00 [5622]
2019-09-17 12:00:01.034000+00:00 [5578, 5621]
2019-09-17 12:00:01.034000+00:00 [5141, 5622]
2019-09-17 12:00:01.179000+00:00 [5578, 5621]
2019-09-17 12:00:01.179000+00:00 [5141]
2019-09-17 12:00:01.476000+00:00 [5578, 5621]
2019-09-17 12:00:01.476000+00:00 [5141]
2019-09-17 12:00:01.704000+00:00 [5141
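The list-extraction step can be sketched in isolation as follows. The `closest` array here is a hypothetical stand-in with repeated column IDs, and `result` is an illustrative name rather than one of the frames used above; it is only meant to show how zeros are dropped and duplicates collapsed per row.

```python
import numpy as np
import pandas as pd

# Hypothetical output of the np.where step; 0 marks "not the closest".
closest = np.array([
    [0, 5578, 5578],
    [5141, 0, 0],
    [5141, 5578, 0],
])

# For each row: keep the non-zero entries, then de-duplicate while
# preserving the order of first appearance.
nearest = [pd.unique(row[row.astype(bool)]).tolist() for row in closest]

# Store the lists in a column, one list per row.
result = pd.DataFrame({"closest": nearest})
print(result)
```

Doing the de-duplication inside the list comprehension avoids a second pass over the column.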
Now, how can I adjust this code so that it creates
I hope this makes sense, and that someone out there can help me.