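I have a dataset spanning 4 years that I want to plot on a chart, with each year as a separate series. My data is daily detail from March 2015 to August 2018, and I want to aggregate and display it by month.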
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
plt.clf() # clear figures
plt.figure(figsize=(16,8))
x = np.arange(0, 12, 1)
months = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']
total_sales_2015 = train.loc[train['Year'] == '2015'].groupby('Month')['SalesValue'].sum()/1000.0 # format into thousands
total_sales_2016 = train.loc[train['Year'] == '2016'].groupby('Month')['SalesValue'].sum()/1000.0 # format into thousands
total_sales_2017 = train.loc[train['Year'] == '2017'].groupby('Month')['SalesValue'].sum()/1000.0 # format into thousands
total_sales_2018 = train.loc[train['Year'] == '2018'].groupby('Month')['SalesValue'].sum()/1000.0 # format into thousands
plt.plot(x, total_sales_2015, label="2015")  # fails: only 10 data points
plt.plot(x, total_sales_2016, label="2016")  # works
plt.plot(x, total_sales_2017, label="2017")  # works
plt.plot(x, total_sales_2018, label="2018")  # fails: only 8 data points
How can I show the partial years on the chart? When I run the code above, I get the following error: "ValueError: x and y must have same first dimension"
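The length mismatch behind this error can be reproduced directly from daily rows. The sketch below assumes a hypothetical `daily` DataFrame with a `DatetimeIndex` covering the same span as the question and a `SalesValue` column filled with ones (placeholder values, not real sales), then groups it by calendar year and month:

```python
import numpy as np
import pandas as pd

# Hypothetical daily data covering the same span as the question
# (1 March 2015 to 31 August 2018); every day is 1.0 so the sums are easy to check.
days = pd.date_range("2015-03-01", "2018-08-31", freq="D")
daily = pd.DataFrame({"SalesValue": np.ones(len(days))}, index=days)

# Monthly totals in thousands, keyed by (year, month)
monthly = daily.groupby([daily.index.year, daily.index.month])["SalesValue"].sum() / 1000.0
monthly.index.names = ["Year", "Month"]

# Number of monthly points each yearly series would have
months_per_year = monthly.groupby(level="Year").size()
print(months_per_year)
```

Only 2016 and 2017 end up with twelve points, which is why only those two `plt.plot` calls match `x = np.arange(0, 12, 1)`.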
Answer 0 (score: 1)
You can use Series.reindex with every possible (Year, Month) combination, created with MultiIndex.from_product:
import numpy as np
import pandas as pd
np.random.seed(123)
train = pd.DataFrame({'Year':['2015'] * 10 + ['2018'] * 8,
                      'Month': list(range(3, 13)) + list(range(1, 9)),
                      'SalesValue':np.random.randint(1000, size=18)})
train['Month'] = train['Month'].astype(str).str.zfill(2)
print (train)
Year Month SalesValue
0 2015 03 510
1 2015 04 365
2 2015 05 382
3 2015 06 322
4 2015 07 988
5 2015 08 98
6 2015 09 742
7 2015 10 17
8 2015 11 595
9 2015 12 106
10 2018 01 123
11 2018 02 569
12 2018 03 214
13 2018 04 737
14 2018 05 96
15 2018 06 113
16 2018 07 638
17 2018 08 47
total_sales = train.groupby(['Year','Month'])['SalesValue'].sum() / 1000
years = np.arange(2015, 2019).astype(str)
months = pd.Series(np.arange(1, 13, 1)).astype(str).str.zfill(2)
mux = pd.MultiIndex.from_product([years, months], names=total_sales.index.names)
total_sales = total_sales.reindex(mux)
print (total_sales)
Year Month
2015 01 NaN
02 NaN
03 0.510
04 0.365
05 0.382
06 0.322
07 0.988
08 0.098
09 0.742
10 0.017
11 0.595
12 0.106
2016 01 NaN
02 NaN
03 NaN
04 NaN
05 NaN
06 NaN
07 NaN
08 NaN
09 NaN
10 NaN
11 NaN
12 NaN
2017 01 NaN
02 NaN
03 NaN
04 NaN
05 NaN
06 NaN
07 NaN
08 NaN
09 NaN
10 NaN
11 NaN
12 NaN
2018 01 0.123
02 0.569
03 0.214
04 0.737
05 0.096
06 0.113
07 0.638
08 0.047
09 NaN
10 NaN
11 NaN
12 NaN
Name: SalesValue, dtype: float64
plt.plot(x, total_sales.loc['2015'], label="2015")
plt.plot(x, total_sales.loc['2016'], label="2016")
plt.plot(x, total_sales.loc['2017'], label="2017")
plt.plot(x, total_sales.loc['2018'], label="2018")
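The four `plt.plot` calls can also be written as a loop over the years, and the question's `months` list can then serve as tick labels. A minimal sketch that rebuilds the answer's toy data so it runs on its own; the loop, the `Agg` backend and the tick labelling are additions for illustration, not part of the answer:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so the sketch also runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Toy data in the same layout as the answer: string years, zero-padded string months
np.random.seed(123)
train = pd.DataFrame({'Year': ['2015'] * 10 + ['2018'] * 8,
                      'Month': [f"{m:02d}" for m in list(range(3, 13)) + list(range(1, 9))],
                      'SalesValue': np.random.randint(1000, size=18)})
total_sales = train.groupby(['Year', 'Month'])['SalesValue'].sum() / 1000

# Pad every year to twelve months, as in the answer
years = [str(y) for y in range(2015, 2019)]
month_keys = [f"{m:02d}" for m in range(1, 13)]
mux = pd.MultiIndex.from_product([years, month_keys], names=total_sales.index.names)
total_sales = total_sales.reindex(mux)

x = np.arange(0, 12, 1)
months = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']

fig, ax = plt.subplots(figsize=(16, 8))
for year in years:
    # Each year now has exactly 12 values; NaN months show up as gaps in the line
    ax.plot(x, total_sales.loc[year].to_numpy(), label=year)
ax.set_xticks(x)
ax.set_xticklabels(months)
ax.legend()
```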
If the x-axis values are the months, use Series.unstack together with DataFrame.plot:
# DataFrame.plot creates its own figure, so pass figsize to it directly
total_sales.unstack(level=0).plot(figsize=(16,8))
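After `unstack(level=0)` the Year level becomes the columns, so each year is a separate column and `DataFrame.plot` draws one line per column. A sketch of the reshaped table built from the answer's toy data; replacing the `'01'`..`'12'` index with month names is a hypothetical extra step so that the x-axis shows names:

```python
import numpy as np
import pandas as pd

np.random.seed(123)
train = pd.DataFrame({'Year': ['2015'] * 10 + ['2018'] * 8,
                      'Month': [f"{m:02d}" for m in list(range(3, 13)) + list(range(1, 9))],
                      'SalesValue': np.random.randint(1000, size=18)})
total_sales = train.groupby(['Year', 'Month'])['SalesValue'].sum() / 1000
years = [str(y) for y in range(2015, 2019)]
months = [f"{m:02d}" for m in range(1, 13)]
mux = pd.MultiIndex.from_product([years, months], names=total_sales.index.names)
total_sales = total_sales.reindex(mux)

# Move the Year level into columns: one row per month, one column per year
wide = total_sales.unstack(level=0)

# Hypothetical relabelling so the x-axis shows month names instead of '01'..'12'
wide.index = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
              'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']
print(wide)
# wide.plot(figsize=(16, 8)) would then draw one line per year column
```

Without the earlier `reindex`, the unstacked table would only contain columns for the years present in the data (2015 and 2018 in this toy example).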
Answer 1 (score: 0)
I solved my problem by adding NaN values to the start of the 2015 series and the end of the 2018 series, as follows:
import pandas as pd
from numpy import nan
s1 = pd.Series([nan,nan], index=['01','02'])
s2 = pd.Series([nan,nan,nan,nan], index=['09', '10', '11', '12'])
# Series.append was removed in pandas 2.0; pd.concat does the same job
total_sales_2015 = pd.concat([s1, total_sales_2015])
total_sales_2018 = pd.concat([total_sales_2018, s2])
Each series now has 12 data points and can be plotted on the chart.
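The same padding can be produced without building the NaN series by hand: `Series.reindex` with the full list of twelve month labels fills every missing month with NaN. A sketch, assuming the monthly index holds zero-padded strings such as `'03'` (as the `s1`/`s2` indices above suggest); the values in the two stand-in series are placeholders:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for the question's per-year Series: monthly totals
# indexed by zero-padded month strings, 2015 starting in March, 2018 ending in August.
total_sales_2015 = pd.Series(np.arange(10, dtype=float), index=[f"{m:02d}" for m in range(3, 13)])
total_sales_2018 = pd.Series(np.arange(8, dtype=float), index=[f"{m:02d}" for m in range(1, 9)])

all_months = [f"{m:02d}" for m in range(1, 13)]

# reindex inserts NaN for every month with no data, so the padding
# does not have to be written out separately for each year
total_sales_2015 = total_sales_2015.reindex(all_months)
total_sales_2018 = total_sales_2018.reindex(all_months)
```

This also works unchanged for a year that is missing months in the middle, which fixed-position padding at the start or end would not handle.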