How to create a stacked area plot of individuals' cumulative contributions over time?

Posted: 2020-11-01 02:57:01

Tags: python-3.x pandas seaborn

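Given a set of individual sales records, I am trying to create a stacked area plot that shows total sales over time, broken down by each salesperson's contribution.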

import datetime
import random
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

data = []
for _ in range(100):
    name = random.choice(['Alice', 'Bob', 'Carol', 'Dave', 'Eve'])
    date = datetime.date(2020,1,1) + datetime.timedelta(days=random.randint(0,20))
    sales = random.randint(0,20)
    data.append((name,date,sales))

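# Index the sales records by date, in chronological order
df = pd.DataFrame(data, columns=['Name', 'Date', 'Sales'])\
    .set_index('Date')\
    .sort_values('Date')
# Running total of all sales
df['Total Sales'] = df['Sales'].cumsum()
# Running total of each person's own sales
df['Total Sales by person'] = df.groupby('Name')['Sales'].cumsum()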
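The two `cumsum` lines compute different things: `df['Sales'].cumsum()` is one running total over all rows, while `groupby('Name')['Sales'].cumsum()` restarts the running total for each person but keeps every row in place. A minimal check on a hand-made five-row frame (names and numbers are made up for illustration) also shows that each person's final running total adds up to the overall total, which is the height the top of the stacked area should reach on the last date:

```python
import pandas as pd

# Hypothetical sales rows, already in date order
df = pd.DataFrame(
    {
        "Name": ["Alice", "Bob", "Alice", "Bob", "Alice"],
        "Date": pd.to_datetime(
            ["2020-01-01", "2020-01-01", "2020-01-02", "2020-01-03", "2020-01-03"]
        ),
        "Sales": [5, 3, 2, 4, 1],
    }
).set_index("Date")

# One running total over all rows, in row order
df["Total Sales"] = df["Sales"].cumsum()
# A separate running total per person; rows keep their original positions
df["Total Sales by person"] = df.groupby("Name")["Sales"].cumsum()

# Each person's last running total; together they equal the overall total
final_by_person = df.groupby("Name")["Total Sales by person"].last()
print(df)
print(final_by_person.sum(), df["Total Sales"].iloc[-1])
```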

I can almost get there with two approaches, but I can't see how to finish either of them. Can anyone help, or suggest a different approach?

Approach 1: use a seaborn line plot. It is simple and looks nice, but I can't get the lines to stack.

sns.lineplot(data=df, x='Date', y='Total Sales by person', hue='Name')
plt.show()

[Figure: Seaborn lineplot]

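Approach 2: convert the data to a pivot table, then use a pandas area plot. It stacks correctly on days when every salesperson sold something, but every NaN (a day on which a salesperson made no sale) breaks the stacking.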

pt = pd.pivot_table(df, columns=['Name'], index=['Date'], values=['Total Sales by person'])
pt.plot.area()
plt.show()
>>> pt.head(10)
           Total Sales by person                               
Name                       Alice    Bob       Carol  Dave   Eve
Date                                                           
2020-01-01                   NaN   11.0   15.000000  11.0   3.0
2020-01-03                   NaN    NaN   34.000000  20.0   NaN
2020-01-04                   NaN   34.0   52.000000  28.0   NaN
2020-01-05                  19.0   57.0         NaN   NaN  15.0
2020-01-06                  22.0   72.5         NaN  36.0  34.0
2020-01-07                   NaN    NaN   52.000000  50.0  35.0
2020-01-08                  34.0    NaN   62.500000   NaN  51.0
2020-01-09                  53.0   92.0   80.000000  64.5  60.0
2020-01-10                  53.0  107.5   95.666667  75.0   NaN
2020-01-11                   NaN  120.0  136.666667  77.0  67.0

[Figure: Pandas Area Plot]
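In the table above, a NaN only means that the person made no sale on that date; their running total on that date is still the previous value, or 0 before their first sale. The fractional values such as 72.5 come from `pivot_table` averaging several running totals when one person sells more than once on the same day (its default `aggfunc` is the mean). A minimal sketch of repairing approach 2 along those lines, on made-up numbers and assuming sales are never negative (so the largest running total of a day is the end-of-day total), forward-fills the pivot and then fills the remaining gaps with 0:

```python
import pandas as pd

# Hypothetical running totals per person; Bob sells twice on 2020-01-02
df = pd.DataFrame(
    {
        "Date": pd.to_datetime(
            ["2020-01-01", "2020-01-02", "2020-01-02", "2020-01-03"]
        ),
        "Name": ["Alice", "Bob", "Bob", "Alice"],
        "Total Sales by person": [4, 2, 6, 9],
    }
)

# One row per date, one column per person. A scalar `values` avoids a
# two-level column index. Assuming non-negative sales, "max" picks the
# end-of-day running total instead of averaging (the default).
pt = df.pivot_table(
    index="Date", columns="Name", values="Total Sales by person", aggfunc="max"
)

# NaN means "no sale that day": carry the last known total forward,
# and use 0 before a person's first sale.
pt = pt.ffill().fillna(0)
print(pt)
```

With no NaN left, `pt.plot.area()` stacks every date; the design choice here is to keep the question's per-person running totals and only repair the gaps, rather than rebuild the pivot from raw sales.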

Any good ideas?

1 Answer

Answer 0 (score: 1)

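Solved. I needed to build the pivot table from the raw Sales column only, fill every missing value with 0, and only then take the cumulative sum. Final code: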

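# `data` is the list of (name, date, sales) tuples built in the question
df = pd.DataFrame(data, columns=['Name', 'Date', 'Sales'])\
    .set_index('Date')\
    .sort_values('Date')

# One row per date, one column per person. values='Sales' (a string, not a list)
# keeps the column labels as plain names; aggfunc='sum' adds up several sales by
# the same person on the same day (the default would average them); fill_value=0
# turns "no sale that day" into 0 instead of NaN.
pt = pd.pivot_table(df, columns='Name', index='Date', values='Sales',
                    aggfunc='sum', fill_value=0)
# Running total per person; stacking the columns gives total sales over time
pt = pt.cumsum()
pt.plot.area()
plt.show()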

[Figure: MatPlotLib stacked plot]
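One detail worth checking with this approach: `pandas.pivot_table` aggregates duplicate (Date, Name) pairs with the mean unless `aggfunc` says otherwise, so when a person makes several sales on one day the cumulative sum undercounts. A small sketch on made-up rows compares the default with `aggfunc="sum"` and checks the top of the stack on the last date against the true total:

```python
import pandas as pd

# Hypothetical sales rows: Bob makes two sales on the same day
df = pd.DataFrame(
    {
        "Date": pd.to_datetime(
            ["2020-01-01", "2020-01-01", "2020-01-01", "2020-01-02"]
        ),
        "Name": ["Alice", "Bob", "Bob", "Alice"],
        "Sales": [5, 3, 7, 2],
    }
)

# Default aggregation is the mean: Bob's two sales collapse to (3 + 7) / 2 = 5
by_mean = df.pivot_table(index="Date", columns="Name", values="Sales", fill_value=0)
# Summing keeps every unit sold
by_sum = df.pivot_table(
    index="Date", columns="Name", values="Sales", aggfunc="sum", fill_value=0
)

cum_mean = by_mean.cumsum()
cum_sum = by_sum.cumsum()

# Height of the stacked area on the last date vs. the real total
print(cum_mean.iloc[-1].sum(), cum_sum.iloc[-1].sum(), df["Sales"].sum())
```

Only the summed version reaches the real total, so with data where a person can sell more than once per day, passing `aggfunc="sum"` is the safer choice.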