Question

I have 2 datasets:

df1 =

    id_first    latitude    longitude
0   403         45.0714     7.6187
1   403         45.0739     7.6195
2   1249        45.0745     7.6152
3   1249        45.1067     7.6451
4   1249        45.1062     7.6482
5   1531        45.1088     7.6528
6   1531        45.1005     7.6155
7   14318       45.1047     7.6056

df2 =

    id_now  cluster_group
0   403     0
1   1249    1
2   1531    3
3   14318   3

I want to create a loop (or something similar) but can't work out how:

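Take a value from df2, for example 403, which belongs to only one cluster_group (0), go into df1, and find all points related to 403 (2 latitude values and 2 longitude values). Then plot them.
Repeat the plotting for all of df1 and df2, but in a single figure (a different color for each cluster). I can actually manage this part myself, but if you could provide something, that would help.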

P.S. In df2, 1531 and 14318 belong to the same cluster, so I want to plot their points in one color (or on one map).

What I tried:

n_clusters = 46

for k in range(0, n_clusters):
     ....

https://codesandbox.io/s/74n5rvr75x

Each color represents a cluster_group.

Answer 1

Here is a way to do it using pandas and matplotlib.pyplot.

import pandas as pd
import matplotlib.pyplot as plt

# Here I read the dataframes from files; load them however you prefer.
# Raw strings keep '\s+' from being treated as an invalid escape sequence.
df1 = pd.read_csv('data.txt', sep=r'\s+')
df2 = pd.read_csv('data2.txt', sep=r'\s+')

# The important piece of code is here:
for g, gdf in df2.groupby('cluster_group'):
    df1_to_plot = df1.loc[df1['id_first'].isin(gdf['id_now'])]
    plt.plot(df1_to_plot['latitude'], df1_to_plot['longitude'], label='Cluster {:d}'.format(g))

plt.legend()
plt.show()

Some explanation, in case you are not familiar with groupby and isin:

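df2.groupby('cluster_group') returns an iterator over subsets of df2, where each subset groups together all rows that have the same value in the 'cluster_group' column.
With each of these subsets, gdf, I select the rows of df1 whose value in the 'id_first' column is contained in gdf['id_now']. This is done with the isin method. The selection is stored in the dataframe df1_to_plot, which contains the data to plot.
Now I can use plt.plot to actually plot the data. Matplotlib takes care of the colors by itself. The label argument is picked up by plt.legend() when the legend is created.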
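
To make the groupby and isin steps easier to follow, here is a minimal sketch that builds the two dataframes inline from the sample data in the question (instead of reading files) and only prints which df1 rows are selected for each cluster. The name points_per_cluster is a hypothetical helper introduced for this illustration, not something from the original answer.

```python
import pandas as pd

# Sample data copied from the question.
df1 = pd.DataFrame({
    "id_first": [403, 403, 1249, 1249, 1249, 1531, 1531, 14318],
    "latitude": [45.0714, 45.0739, 45.0745, 45.1067,
                 45.1062, 45.1088, 45.1005, 45.1047],
    "longitude": [7.6187, 7.6195, 7.6152, 7.6451,
                  7.6482, 7.6528, 7.6155, 7.6056],
})
df2 = pd.DataFrame({
    "id_now": [403, 1249, 1531, 14318],
    "cluster_group": [0, 1, 3, 3],
})

# Hypothetical helper dict: df1 rows selected for each cluster_group.
points_per_cluster = {}
for g, gdf in df2.groupby("cluster_group"):
    # gdf holds the df2 rows of one cluster; keep the df1 rows
    # whose id_first appears among that cluster's id_now values.
    mask = df1["id_first"].isin(gdf["id_now"])
    points_per_cluster[int(g)] = df1.loc[mask]

for g, points in points_per_cluster.items():
    print(g, points["id_first"].tolist())
```

Cluster 3 ends up with the points of both 1531 and 14318, which is why those ids share one color in the plot.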

With the simple data you provided, this code produces the following image (the x axis is latitude, the y axis is longitude):

Plotting with a for loop using data from different datasets
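
As a possible alternative to looping with isin (this is a different technique, not the one used in the answer), the cluster label can be attached to every point with a merge. The sketch below assumes the same sample data as the question; the names merged and counts are hypothetical and chosen only for this illustration.

```python
import pandas as pd

# Sample data copied from the question.
df1 = pd.DataFrame({
    "id_first": [403, 403, 1249, 1249, 1249, 1531, 1531, 14318],
    "latitude": [45.0714, 45.0739, 45.0745, 45.1067,
                 45.1062, 45.1088, 45.1005, 45.1047],
    "longitude": [7.6187, 7.6195, 7.6152, 7.6451,
                  7.6482, 7.6528, 7.6155, 7.6056],
})
df2 = pd.DataFrame({
    "id_now": [403, 1249, 1531, 14318],
    "cluster_group": [0, 1, 3, 3],
})

# Attach cluster_group to every point by matching id_first with id_now.
merged = df1.merge(df2, left_on="id_first", right_on="id_now", how="left")

# Count how many points fall into each cluster.
counts = merged.groupby("cluster_group").size().to_dict()
print(counts)
```

With the cluster label on every row, a single call such as plt.scatter(merged["longitude"], merged["latitude"], c=merged["cluster_group"]) could color the points by cluster without an explicit loop, though it draws markers rather than the connected lines that plt.plot produces.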