I have a dataframe with a date index and two columns:
              val  week
2015-01-02  16729     1
2015-01-09  16225     2
2015-01-16  15250     3
2015-01-23  15690     4
2015-01-30  16025     5
...           ...   ...
2020-03-20  16417    12
2020-03-27  15481    13
2020-04-03  14216    14
2020-04-10  13113    15
2020-04-17  12825    16
What I want to do is pivot or group by year, with the month and week as the index:
        2015  ...   2020
01-1   16729  ...    ...
01-2   16225  ...    ...
01-3   15250  ...    ...
01-4   15690  ...    ...
01-5   16025  ...    ...
...      ...  ...    ...
03-12    ...  ...  16417
03-13    ...  ...  15481
04-14    ...  ...  14216
04-15    ...  ...  13113
04-16    ...  ...  12825
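Ideally I would keep just the month and day as the index, but because the data is weekly, the actual day of the month differs from year to year. If there is a way to aggregate the dates, getting the exact day right is not very important: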
        2015  ...   2020
01-02  16729  ...    ...
01-09  16225  ...    ...
01-16  15250  ...    ...
01-23  15690  ...    ...
01-30  16025  ...    ...
...      ...  ...    ...
03-20    ...  ...  16417
03-27    ...  ...  15481
04-03    ...  ...  14216
04-10    ...  ...  13113
04-17    ...  ...  12825
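I have tried variations of pd.Grouper and groupby, but cannot seem to get it right. I am also open to other suggestions on how to arrange this, since the idea is to plot each year as a separate line in the same line chart.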
Answer 0 (score: 1)
After all the comments, it seems to be time to write some code. It's a bit hacky, but maybe this will help you:
import numpy as np
import pandas as pd
# example df with some random values.
df = pd.DataFrame({'t': ['2015-01-02', '2015-01-03', '2015-01-16', '2015-01-23', '2015-01-30', '2020-01-01'],
                   'val': [16729, 16225, 15250, 15690, 16025, 999],
                   'week': [1, 2, 3, 4, 5, 1]})
df['t'] = pd.to_datetime(df['t'])
# pivot to get years as columns
df1 = pd.pivot_table(df, values='val', columns=df['t'].dt.year, index=df['t'])
# sum the values for every week (bins ending on Monday); the bin label becomes the "date" column
# (the original groupby on df1.index.week is dropped: that attribute no longer exists in pandas 2.x)
df2 = df1.resample('W-MON').sum().rename_axis('date').reset_index()
# drop all rows that only hold zeros, i.e. weeks without any data
value_cols = df2.columns != 'date'
df2 = df2.loc[~np.isclose(df2.loc[:, value_cols].to_numpy(), 0).all(axis=1)].copy()
# finally, format the datetime column to string as desired
df2['month-week'] = df2['date'].dt.strftime('%m-%W')
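If only the week number is needed as the row label, a plain pivot might be enough. The following is a minimal sketch rather than a tested solution for the full dataset: it assumes the frame looks like the one in the question (a DatetimeIndex plus the val and week columns), and the names `year` and `wide` are made up for illustration. The sample rows are a subset of the values shown in the question.

```python
import pandas as pd

# A few rows copied from the question: date index, "val" and "week" columns.
df = pd.DataFrame(
    {
        "val": [16729, 16225, 15250, 16417, 15481, 14216],
        "week": [1, 2, 3, 12, 13, 14],
    },
    index=pd.to_datetime(
        ["2015-01-02", "2015-01-09", "2015-01-16",
         "2020-03-20", "2020-03-27", "2020-04-03"]
    ),
)

# Helper column (hypothetical name): calendar year taken from the date index.
df["year"] = df.index.year

# One row per week number, one column per year.
# Week/year combinations without data stay NaN.
wide = df.pivot_table(index="week", columns="year", values="val", aggfunc="sum")

print(wide)
```

Because every year ends up in its own column, calling wide.plot() should then draw one line per year on a shared week axis (this needs matplotlib installed). The month part of the label is left out on purpose here, since the same week number can fall in different months depending on the year.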