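Given a set of data points for individual sales, I am trying to create a stacked area chart that shows total sales over time, broken down by each salesperson's contribution.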
import datetime
import random
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
data = []
for _ in range(100):
    name = random.choice(['Alice', 'Bob', 'Carol', 'Dave', 'Eve'])
    date = datetime.date(2020, 1, 1) + datetime.timedelta(days=random.randint(0, 20))
    sales = random.randint(0, 20)
    data.append((name, date, sales))
df = pd.DataFrame(data, columns=['Name', 'Date', 'Sales'])\
.set_index('Date')\
.sort_values('Date')
df['Total Sales'] = df['Sales'].cumsum()
df['Total Sales by person'] = df.groupby('Name')['Sales'].cumsum()
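Both of the approaches below get me almost there, but I can't see how to finish either one. Can anyone help, or suggest a different approach?
Approach 1: draw a line plot with seaborn. It's simple and looks nice, but I can't get it to stack.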
sns.lineplot(data=df, x='Date', y='Total Sales by person', hue='Name')
plt.show()
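Approach 2: convert the data to a pivot table, then use the pandas area plot. On days when every salesperson sold something it stacks correctly, but every "NaN" causes problems.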
pt = pd.pivot_table(df, columns=['Name'], index=['Date'], values=['Total Sales by person'])
pt.plot.area()
plt.show()
>>> pt.head(10)
           Total Sales by person
Name        Alice    Bob       Carol  Dave   Eve
Date
2020-01-01    NaN   11.0   15.000000  11.0   3.0
2020-01-03    NaN    NaN   34.000000  20.0   NaN
2020-01-04    NaN   34.0   52.000000  28.0   NaN
2020-01-05   19.0   57.0         NaN   NaN  15.0
2020-01-06   22.0   72.5         NaN  36.0  34.0
2020-01-07    NaN    NaN   52.000000  50.0  35.0
2020-01-08   34.0    NaN   62.500000   NaN  51.0
2020-01-09   53.0   92.0   80.000000  64.5  60.0
2020-01-10   53.0  107.5   95.666667  75.0   NaN
2020-01-11    NaN  120.0  136.666667  77.0  67.0
Any good ideas?
Answer 0 (score: 1)
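One possible way to get rid of the NaN gaps, offered as a sketch rather than a definitive fix: pivot the raw per-day sales instead of the per-row cumulative column, fill missing days with 0, and only then take the cumulative sum. Summing per day also avoids the fractional values in the table above (such as 72.5), which come from pivot_table's default aggfunc of 'mean' averaging several rows for the same person and date. The helper names cumulative_sales_by_person and plot_stacked_sales are made up for this example, and the sample data uses a fixed seed, so its numbers will not match the output shown in the question.

```python
import datetime
import random

import pandas as pd


def cumulative_sales_by_person(df):
    """Return a Date x Name table of cumulative sales with no gaps.

    Assumes `df` has 'Name' and 'Sales' columns, and 'Date' either as a
    column or as the index (as in the question).
    """
    frame = df.reset_index() if "Date" not in df.columns else df.copy()
    frame["Date"] = pd.to_datetime(frame["Date"])

    # Sum the sales per person per day; days without a sale become 0.
    daily = frame.pivot_table(index="Date", columns="Name", values="Sales",
                              aggfunc="sum", fill_value=0)

    # Add any calendar days on which nobody sold anything.
    all_days = pd.date_range(daily.index.min(), daily.index.max(),
                             freq="D", name="Date")
    daily = daily.reindex(all_days, fill_value=0)

    # The running total per person is what gets stacked.
    return daily.cumsum()


def plot_stacked_sales(wide):
    """Draw the stacked area chart (pandas stacks area plots by default)."""
    import matplotlib.pyplot as plt  # only needed when actually plotting

    ax = wide.plot.area()
    ax.set_ylabel("Cumulative sales")
    plt.show()


# Sample data generated the same way as in the question, but seeded.
random.seed(0)
rows = []
for _ in range(100):
    name = random.choice(['Alice', 'Bob', 'Carol', 'Dave', 'Eve'])
    date = datetime.date(2020, 1, 1) + datetime.timedelta(days=random.randint(0, 20))
    sales = random.randint(0, 20)
    rows.append((name, date, sales))
sales_df = pd.DataFrame(rows, columns=['Name', 'Date', 'Sales'])

wide = cumulative_sales_by_person(sales_df)
```

Calling plot_stacked_sales(wide) should then draw the chart. Because every cell in wide holds a number, the top edge of the stack is the overall running total, which is what the 'Total Sales' column in the question tracks.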