Pandas: plotting performance from top to bottom

Time: 2017-04-16 17:25:00

Tags: python pandas numpy matplotlib plot

I have some data showing the (average) number of pies baked by each person. I want to plot a chart of:

  

The average number of pies baked by the top 10%, the top 20%, ..., the top 100%:


import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

baked_count = np.random.normal(10, scale = 3.0, size = 100)

df = pd.DataFrame(baked_count, columns = ['performance'])

df['performance'].hist()
plt.show()

points_x = []
points_y = []

x = 0
for index, row in df.sort_values('performance', ascending = False).iterrows():
    y = df[df['performance'] >= row['performance']]['performance'].mean()

    x += 1

    points_x.append(x)
    points_y.append(y)

points_x = np.array(points_x)
points_y = np.array(points_y)

plt.scatter(points_x, points_y)

plt.axvline(points_x.min(), color='g', linestyle='dashed', linewidth=1)
plt.axvline(points_x.max(), color='g', linestyle='dashed', linewidth=1)
plt.axhline(points_y.min(), color='g', linestyle='dashed', linewidth=1)
plt.axhline(points_y.max(), color='g', linestyle='dashed', linewidth=1)

plt.show()

Is there a standard numpy / pyplot / pandas way of doing this?

1 answer:

Answer 0: (score: 1)

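If I understand correctly, you want to compute the cumulative mean of the sorted performance series. You can do this by dividing the series' cumsum() by the cumulative count. For example: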

x = np.arange(1, df.shape[0]+1)
y = df.performance.sort_values(ascending=False).cumsum() / x
plt.scatter(x, y)

Or, more elegantly, with an expanding mean:

y = df.performance.sort_values(ascending=False).expanding().mean()
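As a rough sanity check, the sketch below compares the two approaches on the same kind of random data and then picks out the top 10%, 20%, ..., 100% points mentioned in the question. The fixed seed, the reset_index call, and the names ranked, percent, and top_deciles are assumptions added for illustration; they are not part of the original code.

```python
import numpy as np
import pandas as pd

# Fixed seed so the run is reproducible (an assumption; the question uses np.random.normal).
rng = np.random.default_rng(0)
df = pd.DataFrame({'performance': rng.normal(10, 3.0, size=100)})

# Sort from best to worst and discard the shuffled index so positions line up.
ranked = df['performance'].sort_values(ascending=False).reset_index(drop=True)

# Running mean of the top k performers, computed both ways.
counts = np.arange(1, len(ranked) + 1)
mean_cumsum = ranked.cumsum() / counts
mean_expanding = ranked.expanding().mean()

# Express the x axis as "top p percent" rather than a raw count.
percent = 100 * counts / len(ranked)

# Keep only every tenth point: top 10%, 20%, ..., 100%.
step = len(ranked) // 10
top_deciles = pd.Series(
    mean_expanding.to_numpy()[step - 1::step],
    index=percent[step - 1::step],
)
print(top_deciles)
```

Passing percent and mean_expanding to plt.scatter should give the same curve as the loop in the question, with the x axis in percent. The last value of top_deciles is simply the overall mean, since the top 100% is everyone.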