How to cluster by distance in Python pandas?

Date: 2018-04-07 20:06:11

Tags: python pandas dataframe merge euclidean-distance

I have two DataFrames holding two sets of station information: one for 15 small stations and the other for 5 main stations.

Small station information (15 × 3):

       SmallStation_ID  longitude  latitude
0           dongsi_aq    116.417    39.929
1          tiantan_aq    116.407    39.886
2         guanyuan_aq    116.339    39.929
3    wanshouxigong_aq    116.352    39.878
4     aotizhongxin_aq    116.397    39.982
5     nongzhanguan_aq    116.461    39.937
6           wanliu_aq    116.287    39.987
7       beibuxinqu_aq    116.174    40.090
8        zhiwuyuan_aq    116.207    40.002
9   fengtaihuayuan_aq    116.279    39.863
10         yungang_aq    116.146    39.824
11         gucheng_aq    116.184    39.914
12        fangshan_aq    116.136    39.742
13          daxing_aq    116.404    39.718
14        yizhuang_aq    116.506    39.795

Main station information (5 × 9):

  MainStation_id   longitude   latitude  temperature  pressure  humidity  wind_direction  wind_speed      weather
0     shunyi_meo  116.615278  40.126667         -1.7    1028.7        15           215.0         1.6  Sunny/clear
1     hadian_meo  116.290556  39.986944         -1.6    1026.1        14           231.0         2.5  Sunny/clear
2    yanqing_meo  115.968889  40.449444         -8.8     970.8        35           305.0         0.8         Haze
3      miyun_meo  116.864167  40.377500         -6.6    1023.3        28        999017.0         0.2         Haze
4    huairou_meo  116.626944  40.357778         -5.2    1022.8        27            30.0         0.8  Sunny/clear

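I want to assign each small station to a main station by computing the distance sqrt((x1-x2)^2+(y1-y2)^2), where x and y are longitude and latitude respectively, and picking the nearest neighbor. Then I want to merge the two DataFrames so that each small station gets the external weather information of its main station. The header of the final DataFrame should look like this: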
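In symbols, with small station i at (x_i, y_i) and main station j at (X_j, Y_j) (longitude, latitude), the assignment rule from the question is the following. The worked line plugs in dongsi_aq and hadian_meo from the tables above; note the result is in degrees, so it is only a planar approximation of ground distance.

$$d_{ij} = \sqrt{(x_i - X_j)^2 + (y_i - Y_j)^2}, \qquad j^{*}(i) = \arg\min_{j} d_{ij}$$

$$d = \sqrt{(116.417 - 116.290556)^2 + (39.929 - 39.986944)^2} = \sqrt{0.126444^2 + 0.057944^2} \approx \sqrt{0.015988 + 0.003358} \approx 0.1391$$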

 SmallStation_ID  distance  temperature  pressure  humidity  wind_direction  wind_speed      weather

The result would be a 15 × 8 DataFrame.

I hope I have explained the problem clearly. Thanks!

1 answer:

Answer 0 (score: 0)

Well... I solved this myself.

I'm not very familiar with iterating, so I hope someone can show an easier way using pandas functions.

The small stations are in a DataFrame named aqstation.

The main stations are in a DataFrame named meostation.

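import pandas as pd

# aqstation: small stations; meostation: main stations (ID column 'station_id').
# All I want to do is merge the main-station weather information into the small stations.
l = []

# Compute the Euclidean distance from each small station to every main station,
# then assign the small station to the nearest main station.
for i in range(len(aqstation)):
    dist = ((aqstation['longitude'].iloc[i] - meostation['longitude'])**2
            + (aqstation['latitude'].iloc[i] - meostation['latitude'])**2) ** 0.5
    # idxmin() returns an index label of meostation, so look it up with .loc
    station = meostation.loc[dist.idxmin(), 'station_id']
    l.append(station)

aqstation['station_id'] = l

# Drop the coordinates so the merge does not produce longitude_x/longitude_y columns.
del aqstation['longitude']
del aqstation['latitude']
del meostation['longitude']
del meostation['latitude']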
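The per-station loop above can also be replaced with NumPy broadcasting, which builds the whole small-by-main distance matrix in one step and keeps the distance column the question asks for. This is a minimal sketch, assuming the main-station ID column is named 'station_id' (as in the answer's code) and using a few coordinates copied from the question's tables:

```python
import numpy as np
import pandas as pd

# A few rows copied from the question's tables (coordinates only).
aqstation = pd.DataFrame({
    "SmallStation_ID": ["dongsi_aq", "wanliu_aq", "yizhuang_aq"],
    "longitude": [116.417, 116.287, 116.506],
    "latitude": [39.929, 39.987, 39.795],
})
meostation = pd.DataFrame({
    "station_id": ["shunyi_meo", "hadian_meo", "yanqing_meo", "miyun_meo", "huairou_meo"],
    "longitude": [116.615278, 116.290556, 115.968889, 116.864167, 116.626944],
    "latitude": [40.126667, 39.986944, 40.449444, 40.377500, 40.357778],
})

# (n_small, 1) minus (n_main,) broadcasts to an (n_small, n_main) matrix.
dx = aqstation["longitude"].to_numpy()[:, None] - meostation["longitude"].to_numpy()
dy = aqstation["latitude"].to_numpy()[:, None] - meostation["latitude"].to_numpy()
dist = np.sqrt(dx**2 + dy**2)

# Column position of the nearest main station for each small station.
nearest = dist.argmin(axis=1)
aqstation["station_id"] = meostation["station_id"].to_numpy()[nearest]
aqstation["distance"] = dist[np.arange(len(aqstation)), nearest]
print(aqstation)
```

Because argmin works on positions rather than index labels, this does not depend on either DataFrame having a default RangeIndex.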

Merge the two DataFrames:

aqstation = pd.merge(aqstation, meostation, how='left', on='station_id')
print(aqstation.head(10))
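The left merge keeps exactly one output row per small station only if each station_id appears once in meostation. A small sketch showing how pandas' validate="many_to_one" option turns that assumption into a check; the rows are copied from the answer's output, trimmed to two weather columns:

```python
import pandas as pd

# Small stations already tagged with their nearest main station (from the answer's output).
aqstation = pd.DataFrame({
    "SmallStation_ID": ["dongsi_aq", "tiantan_aq", "guanyuan_aq", "aotizhongxin_aq"],
    "station_id": ["chaoyang_meo", "beijing_meo", "hadian_meo", "hadian_meo"],
})
# Main-station weather keyed by station_id (two columns kept for brevity).
meostation = pd.DataFrame({
    "station_id": ["chaoyang_meo", "beijing_meo", "hadian_meo"],
    "temperature": [-0.7, -2.5, -1.6],
    "weather": ["Sunny/clear", "Haze", "Sunny/clear"],
})

# validate="many_to_one" raises pandas.errors.MergeError if station_id is
# duplicated in meostation, which would otherwise silently duplicate rows.
merged = pd.merge(aqstation, meostation, how="left", on="station_id",
                  validate="many_to_one")
print(merged)
```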

          Station ID       station_id  temperature  pressure  humidity  wind_direction  wind_speed      weather
0          dongsi_aq     chaoyang_meo         -0.7    1027.9        13           239.0         2.7  Sunny/clear
1         tiantan_aq      beijing_meo         -2.5    1028.5        16           225.0         2.4         Haze
2        guanyuan_aq       hadian_meo         -1.6    1026.1        14           231.0         2.5  Sunny/clear
3   wanshouxigong_aq      fengtai_meo         -1.4    1025.2        16           210.0         1.4  Sunny/clear
4    aotizhongxin_aq       hadian_meo         -1.6    1026.1        14           231.0         2.5  Sunny/clear
5    nongzhanguan_aq     chaoyang_meo         -0.7    1027.9        13           239.0         2.7  Sunny/clear
6          wanliu_aq       hadian_meo         -1.6    1026.1        14           231.0         2.5  Sunny/clear
7      beibuxinqu_aq    pingchang_meo         -3.0    1022.5        17           108.0         1.1  Sunny/clear
8       zhiwuyuan_aq  shijingshan_meo         -1.8    1024.0        12           201.0         2.5  Sunny/clear
9  fengtaihuayuan_aq      fengtai_meo         -1.4    1025.2        16           210.0         1.4  Sunny/clear

My code is quite verbose. I hope someone can make it simpler.
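One pandas-only way to drop the explicit loop is to pair every small station with every main station through a cross join, compute the distance as a column, and keep the closest pair per small station with groupby(...).idxmin(). This is a sketch, not the poster's method; it assumes pandas 1.2 or newer (for how="cross"), the column names 'SmallStation_ID' and 'station_id', and a trimmed copy of the question's data:

```python
import pandas as pd

# Trimmed copies of the question's tables; column names are assumptions.
aqstation = pd.DataFrame({
    "SmallStation_ID": ["dongsi_aq", "wanliu_aq", "yizhuang_aq"],
    "longitude": [116.417, 116.287, 116.506],
    "latitude": [39.929, 39.987, 39.795],
})
meostation = pd.DataFrame({
    "station_id": ["shunyi_meo", "hadian_meo", "yanqing_meo"],
    "longitude": [116.615278, 116.290556, 115.968889],
    "latitude": [40.126667, 39.986944, 40.449444],
    "temperature": [-1.7, -1.6, -8.8],
    "weather": ["Sunny/clear", "Sunny/clear", "Haze"],
})

# Every (small, main) pair; overlapping coordinate columns get the suffixes.
pairs = aqstation.merge(meostation, how="cross", suffixes=("_aq", "_meo"))
pairs["distance"] = (
    (pairs["longitude_aq"] - pairs["longitude_meo"]) ** 2
    + (pairs["latitude_aq"] - pairs["latitude_meo"]) ** 2
) ** 0.5

# For each small station keep the row with the smallest distance,
# then drop the coordinate columns to leave ID, distance and weather.
nearest = pairs.loc[pairs.groupby("SmallStation_ID")["distance"].idxmin()]
result = nearest.drop(
    columns=["longitude_aq", "latitude_aq", "longitude_meo", "latitude_meo"]
).reset_index(drop=True)
print(result)
```

The cross join has n_small × n_main rows, which is trivial for tables of this size; note that groupby sorts the output by SmallStation_ID.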