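I have some data showing the (average) number of pies each person baked. I want to plot a chart of
the average number of pies baked by the top 10%, the top 20%, ..., the top 100% of bakers: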
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
baked_count = np.random.normal(10, scale = 3.0, size = 100)
df = pd.DataFrame(baked_count, columns = ['performance'])
df['performance'].hist()
plt.show()
points_x = []
points_y = []
x = 0
for index, row in df.sort_values('performance', ascending = False).iterrows():
    y = df[df['performance'] >= row['performance']]['performance'].mean()
    x += 1
    points_x.append(x)
    points_y.append(y)
points_x = np.array(points_x)
points_y = np.array(points_y)
plt.scatter(points_x, points_y)
plt.axvline(points_x.min(), color='g', linestyle='dashed', linewidth=1)
plt.axvline(points_x.max(), color='g', linestyle='dashed', linewidth=1)
plt.axhline(points_y.min(), color='g', linestyle='dashed', linewidth=1)
plt.axhline(points_y.max(), color='g', linestyle='dashed', linewidth=1)
plt.show()
Is there a standard numpy / pyplot / pandas way to do this?
Answer 0 (score: 1)
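If I understand correctly, you want the cumulative mean of the performance
series after sorting it in descending order. You can get it by dividing the series' cumsum()
by the running count of elements. For example: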
x = np.arange(1, df.shape[0]+1)
y = df.performance.sort_values(ascending=False).cumsum() / x
plt.scatter(x, y)
Or, more elegantly, use the expanding
mean:
y = df.performance.sort_values(ascending=False).expanding().mean()
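
To tie this back to the top 10%, top 20%, ..., top 100% buckets in the question, here is a minimal sketch that checks the two formulations agree and then reads the running mean at each bucket boundary. The seed, the variable names (sorted_perf, top_means and so on) and the integer rounding of bucket sizes are my own assumptions, not part of the original code.

```python
import numpy as np
import pandas as pd

# Hypothetical data mirroring the question: 100 bakers, seeded for reproducibility.
rng = np.random.default_rng(0)
df = pd.DataFrame({'performance': rng.normal(10, 3.0, size=100)})

# Sort best-first and reset the index so positions run 0..n-1.
sorted_perf = df['performance'].sort_values(ascending=False).reset_index(drop=True)
n = len(sorted_perf)

# Two equivalent ways to get the mean of the top-k values for every k.
via_cumsum = sorted_perf.cumsum() / np.arange(1, n + 1)
via_expanding = sorted_perf.expanding().mean()

# Bucket sizes for the top 10%, 20%, ..., 100% (integer arithmetic, at least 1).
percents = np.arange(10, 101, 10)
counts = np.maximum(1, percents * n // 100)

# Running mean read off at each bucket boundary.
top_means = pd.Series(via_expanding.to_numpy()[counts - 1],
                      index=percents, name='mean_of_top_percent')
print(top_means)
```

Because the values are sorted in descending order, the running mean can only fall or stay level as more bakers are included, and the last entry equals the overall mean. Note that the loop in the question uses `>=`, so tied values would all be included at once; with continuous random data such ties are very unlikely, which is why the results match.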