I have the following table in a DataFrame df:
date val1 val2 user_id val3 val4 val5 val6
01/01/2011 1 100 3 sterling 100 3 euro
01/02/2013 20 8 sterling 12 15 euro
01/07/2012 19 57 sterling 9 6 euro
01/11/2014 3100 49 6 sterling 15 3 euro
21/12/2012 240 sterling 240 30 euro
14/09/2013 21 63 sterling 34 23 euro
01/12/2013 3200 51 20 sterling 93 56 euro
The code used to obtain the table above is:
import pandas as pd
myheaders= ['date','val1', 'val1','val2', 'val3','val4','user_id','val5','val6']
# header=None assumes the file has no header row; recent pandas rejects a boolean here
df = pd.read_csv('mytest.csv', names=myheaders, header=None, parse_dates=True, dayfirst=True)
df['date'] = pd.to_datetime(df['date'], dayfirst=True)
df = df.loc[:,['date','user_id','val1','val2','val3','val4', 'val5', 'val6']]
df1 = df.pivot(index='date', columns='user_id')
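However, I would like to know why, when I add the statement df2 = df1.resample('M') at the end of the code above, I get a DataFrame df2 that looks like this (fields only):
val1 val5
user_id date
rather than like this:
val1 val2 val3 val4 val5 val6
user_id date
Thanks in advance for your help.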
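One possible explanation, offered as a guess because the actual dtypes are not shown: in pandas versions of that era, resample averaged by default and silently left out columns it could not average, such as text columns holding 'sterling' or 'euro'. The sketch below uses a small hypothetical frame (the rows and the names numeric_cols, text_cols, and monthly are illustrative, not taken from the real file) to check which columns are numeric before resampling.

```python
import pandas as pd

# Hypothetical rows: val3 holds currency labels, so it is stored as text, not numbers.
df = pd.DataFrame(
    {
        "date": pd.to_datetime(["01/01/2011", "01/02/2013", "14/09/2013"], dayfirst=True),
        "val1": [1, 20, 21],
        "val3": ["sterling", "sterling", "sterling"],
    }
).set_index("date")

# Split the columns into those that can be averaged and those that cannot.
numeric_cols = df.select_dtypes(include="number").columns.tolist()
text_cols = df.select_dtypes(exclude="number").columns.tolist()

# Monthly mean over the numeric columns only; months with no data are dropped.
monthly = df[numeric_cols].resample(pd.offsets.MonthEnd()).mean().dropna(how="all")
print(numeric_cols, text_cols)
print(monthly)
```

If text_cols lists columns that were expected to be numeric, the values were probably read into the wrong columns, which would be consistent with the ragged rows in the table above.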
Answer 0 (score: 0)
If you have a DatetimeIndex, you can resample within a groupby:
In [11]: df
Out[11]:
date val1 val2 user_id val3 val4 val5 val6
0 2011-01-01 1 100 3 5 100 3 5
1 2013-01-02 20 8 6 12 15 3 NaN
2 2012-01-07 19 57 10 9 6 6 NaN
3 2014-01-11 3100 49 6 12 15 3 NaN
4 2012-12-21 240 30 240 30 NaN NaN NaN
5 2013-09-14 21 63 90 34 23 6 NaN
6 2013-01-12 3200 51 20 50 93 56 NaN
In [12]: df2 = df.set_index('date') # now you have a DatetimeIndex
In [13]: df2
Out[13]:
val1 val2 user_id val3 val4 val5 val6
date
2011-01-01 1 100 3 5 100 3 5
2013-01-02 20 8 6 12 15 3 NaN
2012-01-07 19 57 10 9 6 6 NaN
2014-01-11 3100 49 6 12 15 3 NaN
2012-12-21 240 30 240 30 NaN NaN NaN
2013-09-14 21 63 90 34 23 6 NaN
2013-01-12 3200 51 20 50 93 56 NaN
In [14]: df2.groupby('user_id').resample('M').dropna(how='all')
Out[14]:
val1 val2 user_id val3 val4 val5 val6
user_id date
3 2011-01-31 1 100 3 5 100 3 5
6 2013-01-31 20 8 6 12 15 3 NaN
2014-01-31 3100 49 6 12 15 3 NaN
10 2012-01-31 19 57 10 9 6 6 NaN
20 2013-01-31 3200 51 20 50 93 56 NaN
90 2013-09-30 21 63 90 34 23 6 NaN
240 2012-12-31 240 30 240 30 NaN NaN NaN
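The session above relies on older pandas behaviour, where resample averaged by default. Recent releases expect an explicit aggregation such as .mean(). As a minimal sketch of the same per-user monthly grouping, here is a different spelling that uses pd.Grouper instead of groupby(...).resample(...). The frame is a hypothetical numeric subset of the one in In [11], and the names month_end and monthly are illustrative.

```python
import pandas as pd

# Hypothetical subset of the frame shown in In [11], numeric columns only.
df = pd.DataFrame(
    {
        "date": pd.to_datetime(["2011-01-01", "2013-01-02", "2014-01-11", "2013-01-12"]),
        "user_id": [3, 6, 6, 20],
        "val1": [1, 20, 3100, 3200],
        "val2": [100, 8, 49, 51],
    }
)

# Group by user and by calendar month; each bin is labelled with its month end.
# Passing an offset object avoids depending on the 'M' / 'ME' string alias.
month_end = pd.Grouper(key="date", freq=pd.offsets.MonthEnd())
monthly = df.groupby(["user_id", month_end])[["val1", "val2"]].mean()
print(monthly)
```

The result is indexed by (user_id, date), matching the layout of Out[14], with the value columns selected explicitly so that user_id is not averaged along with them.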