I have a DataFrame of (X, Y) coordinates, and I need to get a list of the coordinates of the points where the density is highest.
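Right now I take the mean of the (X, Y) coordinates, compute the distance from that point to every other point, and sort by that distance. The problem is that the mean is not always located where the points are densest. With gaussian_kde I can visualize the densest points, but I don't know how to extract them into a list.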
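A small sketch of why the mean can mislead, using made-up data that is not from the question (two separated clusters, fixed random seed): the mean lands in the empty gap between the clusters, and gaussian_kde reports a much lower density there than at the densest data point.

```python
import numpy as np
from scipy.stats import gaussian_kde

# Made-up data: two tight clusters of 50 points each, centred at (20, 50) and (80, 50).
rng = np.random.default_rng(0)
left = rng.normal(loc=(20.0, 50.0), scale=3.0, size=(50, 2))
right = rng.normal(loc=(80.0, 50.0), scale=3.0, size=(50, 2))
pts = np.vstack([left, right])             # shape (100, 2)

kde = gaussian_kde(pts.T)                  # gaussian_kde expects shape (n_dims, n_points)
mean_pt = pts.mean(axis=0)                 # roughly (50, 50): in the gap between clusters

density_at_mean = kde(mean_pt.reshape(2, 1))[0]
density_at_points = kde(pts.T)             # density estimate at every data point
densest = pts[density_at_points.argmax()]  # lies inside one of the two clusters

print(density_at_mean, density_at_points.max(), densest)
```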
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import random
from scipy.stats import gaussian_kde
from scipy.spatial.distance import cdist
from scipy.spatial import distance
def closest_point(point, points):
    """ Find the nearest point. """
    return points[cdist([point], points).argmin()]
# 50 random integer coordinates in [0, 100]
x = [random.randint(0, 100) for _ in range(50)]
y = [random.randint(0, 100) for _ in range(50)]
fr = pd.DataFrame({'x':x,'y':y})
mx = fr['x'].mean()
my = fr['y'].mean()
fr2 = pd.DataFrame({'x':[mx],'y':[my]})
fr['Punto'] = [(x, y) for x,y in zip(fr['x'], fr['y'])]
fr2['Punto'] = [(x, y) for x,y in zip(fr2['x'], fr2['y'])]
fr2['Cercano'] = [closest_point(x, list(fr['Punto'])) for x in fr2['Punto']]
lista = fr['Punto'].tolist()
media = fr2['Punto'].tolist()
# Euclidean distance from every point to the mean point; cdist returns a
# (50, 1) array, so ravel() flattens it to one value per row
distancia_numpy = distance.cdist(lista, media, 'euclidean')
distancia_serie = pd.Series(distancia_numpy.ravel())
# new column: distance from each point to the mean point
fr['Distancia'] = distancia_serie
ordenado = fr.sort_values('Distancia', ascending = True)
xy = np.vstack([x, y])
# KDE density evaluated at every data point (higher = denser)
z = gaussian_kde(xy)(xy)
fig, ax = plt.subplots()
ax.scatter(x, y, s=50, c=z, edgecolor='none')
# in red: the mean of the points
ax.scatter(mx, my, s=100, c='red', edgecolor='none')
plt.show()
print(ordenado)
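The result should be a list or a sorted DataFrame with the densest points first. I do get such a result, but it is wrong, because the mean point is not located at the point of maximum density. Any help is very welcome.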
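If you want to keep the distance-based ranking, one option is to measure distances from the maximum of the density estimate instead of from the mean. The sketch below approximates that maximum by evaluating gaussian_kde on a regular grid; kde_peak and grid_size are hypothetical names, the grid resolution is an arbitrary assumption, and the data is random like in the question. With several clusters a single peak still ranks points only by closeness to one cluster, so sorting by the density value itself is usually the better fit.

```python
import numpy as np
import pandas as pd
from scipy.stats import gaussian_kde

def kde_peak(x, y, grid_size=200):
    """Approximate the location of the KDE maximum by evaluating it on a grid.

    Hypothetical helper: grid_size is an arbitrary resolution, and the peak is
    only as precise as the grid spacing.
    """
    kde = gaussian_kde(np.vstack([x, y]))
    gx = np.linspace(np.min(x), np.max(x), grid_size)
    gy = np.linspace(np.min(y), np.max(y), grid_size)
    xx, yy = np.meshgrid(gx, gy)
    grid = np.vstack([xx.ravel(), yy.ravel()])  # shape (2, grid_size**2)
    i = kde(grid).argmax()
    return grid[0, i], grid[1, i]

# Random data similar to the question: 50 integer points in [0, 100].
rng = np.random.default_rng(1)
fr = pd.DataFrame({'x': rng.integers(0, 101, size=50),
                   'y': rng.integers(0, 101, size=50)})

px, py = kde_peak(fr['x'], fr['y'])
# Distance from the density peak instead of from the mean; closest first.
fr['Distancia'] = np.hypot(fr['x'] - px, fr['y'] - py)
ordenado = fr.sort_values('Distancia')
print((px, py))
print(ordenado.head())
```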
Answer 0 (score: 0)
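It sounds like you need to sort by the estimated PDF: using the KDE evaluated at each point as a (descending) sort key will give you the most probable points first. In your code that is gaussian_kde(xy).evaluate(xy), i.e. the array z you already computed.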
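A minimal NumPy sketch of that idea, assuming x and y are plain sequences as in the question; densest_points is a hypothetical helper name, and the toy data (a 3x3 cluster plus three isolated points) is made up for illustration:

```python
import numpy as np
from scipy.stats import gaussian_kde

def densest_points(x, y, n=None):
    """Return (x, y) tuples ordered from highest to lowest estimated density.

    Hypothetical helper; n optionally keeps only the n densest points.
    """
    xy = np.vstack([x, y])
    z = gaussian_kde(xy)(xy)        # KDE density at each data point
    order = np.argsort(z)[::-1]     # argsort is ascending, so reverse it
    if n is not None:
        order = order[:n]
    return [(x[i], y[i]) for i in order]

# Made-up data: a 3x3 cluster around (11, 11) plus three isolated points.
x = [10, 11, 12] * 3 + [90, 5, 90]
y = [10] * 3 + [11] * 3 + [12] * 3 + [5, 90, 90]

ranked = densest_points(x, y)
print(ranked[:3])   # points from the cluster
print(ranked[-3:])  # the isolated points
```

np.argsort only sorts ascending, so the [::-1] reversal is what puts the highest density first.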
Answer 1 (score: 0)
Thanks a lot! This code does the job:
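# KDE density (z) at each point, sorted from densest to sparsest
point_gaus = pd.DataFrame({'x': x, 'y': y, 'gauss': list(z)})
point_gaus_order = point_gaus.sort_values('gauss', ascending=False)
# the 10 densest points
point_gaus_order_10 = point_gaus_order.head(10)
# highlight them in red (run this before plt.show())
ax.scatter(point_gaus_order_10['x'], point_gaus_order_10['y'], s=25, c='red', edgecolor='none')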
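The same ranking can also be written with pandas DataFrame.nlargest, which replaces the sort-and-slice step and returns the coordinates as the plain list the question asks for. This is a sketch; top_dense is a hypothetical name and the data is made up.

```python
import numpy as np
import pandas as pd
from scipy.stats import gaussian_kde

def top_dense(df, n=10):
    """Return the n densest rows of df and their (x, y) coordinates as a list.

    Hypothetical helper; expects numeric columns named 'x' and 'y'.
    """
    xy = np.vstack([df['x'], df['y']])
    ranked = df.assign(gauss=gaussian_kde(xy)(xy))
    top = ranked.nlargest(n, 'gauss')  # like sort_values(ascending=False).head(n)
    coords = list(zip(top['x'].tolist(), top['y'].tolist()))
    return top, coords

# Made-up data: a 3x3 cluster around (11, 11) plus three isolated points.
df = pd.DataFrame({'x': [10, 11, 12] * 3 + [90, 5, 90],
                   'y': [10] * 3 + [11] * 3 + [12] * 3 + [5, 90, 90]})
top, coords = top_dense(df, n=9)
print(coords)
```

According to the pandas documentation, nlargest gives the same rows as sort_values(..., ascending=False).head(n) but is more performant.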