Kernel density estimate (KDE) counts

Time: 2018-04-26 09:43:46

Tags: pandas dataframe seaborn

I have some data (A, B) and am using seaborn to make a contour plot of it.

import pandas  as pd
import seaborn as sns

# Dataframe 1
df_1 = pd.DataFrame({'A':[1,2,1,2,3,4,2,1,4], 'B': [2,1,2,1,2,3,4,2,1]})

# Plot A v B
ax = sns.kdeplot(x=df_1["A"], y=df_1["B"])

[Image: A v B]

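I would like to get the cumulative count (C). I want to create a new plot with C on the Y axis, A on the X axis, and contours of B. I think a good first step would be to create a new dataframe of A, B, H, where H is the count (the height of the volcano). The resulting plot might look something like this:

[Image: sketch of the desired plot]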

1 Answer:

Answer 0 (score: 0)

I think I have worked it out, but the solution is clumsy:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

Fruit = 9  # Total count: the number of observations in df_1

# Dataframe 1
df_1 = pd.DataFrame({'A':[1,2,1,2,3,4,2,1,4], 'B': [2,1,2,1,2,3,4,2,1]})

m1 = df_1["A"]
m2 = df_1["B"]

# Grid limits. With a complex step, np.mgrid includes the stop value,
# so 0..4 in 5 points gives the integer grid 0, 1, 2, 3, 4.
xmin = 0
xmax = 4
ymin = 0
ymax = 4

# Kernel density estimate:
X, Y      = np.mgrid[xmin:xmax:5j, ymin:ymax:5j]
positions = np.vstack([X.ravel(), Y.ravel()])
values    = np.vstack([m1, m2])
kernel    = stats.gaussian_kde(values)
H         = np.reshape(kernel(positions).T, X.shape)

# Re-jig it: one row per grid point
df_2 = pd.DataFrame({'A': X.ravel(), 'B': Y.ravel(), 'H': H.ravel()})

# Find the cumulative count C, accumulating H in the sorted (B, then A) order
df_2      = df_2.sort_values(['B', 'A'])
df_2['C'] = df_2['H'].cumsum()

# Scale C so that its final value equals the total count
df_2['C'] *= Fruit / df_2['C'].max()

# Plot A v C, one line per constant value of B
fig, ax = plt.subplots()
for b, group in df_2.groupby('B'):
    group.plot('A', 'C', ax=ax, label=f'{b:g}')

ax.set_ylabel('C')
ax.legend(title='B')
plt.show()
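
One detail of the grid is worth checking. With a complex step such as 5j, np.mgrid treats the number as a point count and includes the stop value, so the spacing depends on both limits. The short sketch below uses only NumPy and prints the grid coordinates for two upper limits. Selecting rows by an exact value of B (for example B == 1) can only match when that value actually lies on the grid.

```python
import numpy as np

# A complex "step" is a number of points, and the stop value is included.
grid_to_5 = np.mgrid[0:5:5j]   # 5 points from 0 to 5 inclusive
grid_to_4 = np.mgrid[0:4:5j]   # 5 points from 0 to 4 inclusive

print(grid_to_5)  # spacing 1.25: the integers 1 to 4 are not on this grid
print(grid_to_4)  # spacing 1: the integers 0 to 4
```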

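
The value returned by gaussian_kde is a probability density rather than a count. A rough way to turn it into counts, sketched here on the assumption that a plain Riemann sum on a regular grid is accurate enough, is to multiply the density by the number of observations and by the area of one grid cell. The grid limits (-4 to 9) and the resolution (131 points per axis) are illustrative choices, not values from the post.

```python
import numpy as np
import pandas as pd
from scipy import stats

df_1 = pd.DataFrame({'A': [1, 2, 1, 2, 3, 4, 2, 1, 4],
                     'B': [2, 1, 2, 1, 2, 3, 4, 2, 1]})
n_obs = len(df_1)

kernel = stats.gaussian_kde(np.vstack([df_1['A'], df_1['B']]))

# Illustrative grid: wide enough to cover the tails, fine enough for a Riemann sum.
axis = np.linspace(-4, 9, 131)
cell_area = (axis[1] - axis[0]) ** 2
X, Y = np.meshgrid(axis, axis, indexing='ij')
density = kernel(np.vstack([X.ravel(), Y.ravel()]))

# Approximate count per grid cell: density * cell area * number of observations.
counts = density * cell_area * n_obs
total = counts.sum()
print(total)  # should be close to n_obs when the grid covers the density
```

If the total falls noticeably short of the number of observations, the grid is probably cutting off part of the density, which is one reason a final rescaling step like the one in the answer can be needed.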