Python Pandas Pivot Table: Groupby Date Columns with a 7-Day Frequency

Posted: 2015-07-20 18:48:50

Tags: python pandas grouping

Using Python 3.4 and Pandas, my pivot table looks like this:

             Impressions                                              
Day           2015-07-06 2015-07-07 2015-07-08 2015-07-09 2015-07-10 2015-07-11 2015-07-12 2015-07-13 2015-07-14 2015-07-15 2015-07-16 2015-07-17 2015-07-18 2015-07-19   
Keyword                                                                
home brewing        1098       1323       2116       2574       1484       1533       1782       1615       1866       1936       1331       1274       1193       1483

It is produced by this code:

import pandas as pd
import numpy as np
from io import StringIO

data = StringIO('''Day  Keyword Impressions Clicks  Cost    Avg. position   Converted clicks
7/9/2015    "home brewing"  2571    6   4.13    3.1 0
7/8/2015    "home brewing"  2113    13  10.02   3.1 1
7/15/2015   "home brewing"  1933    9   9.3 2.8 0
7/14/2015   "home brewing"  1865    3   2.64    2.6 0
7/12/2015   "home brewing"  1781    7   4.93    2.6 0
7/13/2015   "home brewing"  1612    10  9.67    2.6 0
7/11/2015   "home brewing"  1530    9   9.23    2.6 0
7/10/2015   "home brewing"  1482    4   3.73    2.8 0
7/19/2015   "home brewing"  1482    5   3.26    2.5 0
7/16/2015   "home brewing"  1329    6   5.72    2.9 0
7/7/2015    "home brewing"  1318    3   2.55    2.7 0
7/17/2015   "home brewing"  1272    6   5.42    2.7 0
7/18/2015   "home brewing"  1192    5   4.5 2.5 0
7/6/2015    "home brewing"  1095    8   6.02    2.9 0
7/7/2015    "home brewing"  5   1   0.61    4   0
7/6/2015    "home brewing"  3   0   0   3.3 0
7/8/2015    "home brewing"  3   1   0.61    3.3 0
7/9/2015    "home brewing"  3   0   0   4.3 0
7/13/2015   "home brewing"  3   0   0   2.7 0
7/11/2015   "home brewing"  3   0   0   3.3 0
7/15/2015   "home brewing"  3   0   0   6.3 0
7/10/2015   "home brewing"  2   0   0   4.5 0
7/16/2015   "home brewing"  2   1   0.56    2.5 0
7/17/2015   "home brewing"  2   0   0   4   0
7/12/2015   "home brewing"  1   0   0   2   0
7/14/2015   "home brewing"  1   0   0   7   0
7/18/2015   "home brewing"  1   0   0   2   0
7/19/2015   "home brewing"  1   0   0   4   0''')

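# pd.DataFrame.from_csv was removed in pandas 1.0, so use read_csv instead.
# The fields here are separated by runs of whitespace (the quotes keep
# "home brewing" together), so skip the header row and name the columns explicitly.
df = pd.read_csv(data, sep=r'\s+', skiprows=1,
                 names=['Day', 'Keyword', 'Impressions', 'Clicks', 'Cost',
                        'Avg. position', 'Converted clicks'])
df['Day'] = pd.to_datetime(df['Day'], format='%m/%d/%Y')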
pt = pd.pivot_table(df, values=['Impressions'], index=['Keyword'], columns=['Day'], aggfunc='sum')

print(pt)

What I want to do is group the Day COLUMNS with a 7-day frequency to get a summed pivot table like this:

             Impressions
Day           2015-07-06 2015-07-13
Keyword
home brewing        11910       10698
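If the daily pivot table already exists, one possible sketch is to transpose it so the dates become the row index, resample that index into 7-day bins, and transpose back. This assumes the pivot columns form a `DatetimeIndex` (i.e. `Day` was parsed as dates). The small stand-in for `pt` is rebuilt from the daily totals shown in the first pivot table, and the names `daily` and `weekly` are illustrative only.

```python
import pandas as pd

# Daily totals for "home brewing", copied from the first pivot table in the question
daily = pd.DataFrame({
    'Day': pd.date_range('2015-07-06', '2015-07-19', freq='D'),
    'Keyword': 'home brewing',
    'Impressions': [1098, 1323, 2116, 2574, 1484, 1533, 1782,
                    1615, 1866, 1936, 1331, 1274, 1193, 1483],
})

# Rebuild a pivot table shaped like the question's: Keyword rows, Day columns
pt = pd.pivot_table(daily, values='Impressions', index='Keyword',
                    columns='Day', aggfunc='sum')

# Transpose so Day becomes the (datetime) row index, sum in 7-day bins,
# then transpose back so each bin is a column again
weekly = pt.T.resample('7D').sum().T
print(weekly)
```

With `values=['Impressions']` as in the question, the columns carry an extra outer level, so `pt['Impressions']` would need to be selected before transposing.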

2 Answers

Answer 0 (score: 1)

One way is to use the .dt accessor of the pd.Series to get the week of the year, and then pivot on that column.

import pandas as pd
import numpy as np

# simulate your data
# ===================================
np.random.seed(0)
day = np.random.choice(pd.date_range('2015-07-01', '2015-07-31', freq='D'), size = 100)
impressions = np.random.randint(1, 1000, size=100)
keyword_str = ['home brewing'] * 100
df = pd.DataFrame(dict(Day=day, Keyword=keyword_str, Impressions=impressions))
df

          Day  Impressions       Keyword
0  2015-07-13          204  home brewing
1  2015-07-16          325  home brewing
2  2015-07-22          775  home brewing
3  2015-07-01          965  home brewing
4  2015-07-04           48  home brewing
5  2015-07-28          640  home brewing
6  2015-07-04          132  home brewing
7  2015-07-08          973  home brewing
..        ...          ...           ...
92 2015-07-01          287  home brewing
93 2015-07-15          281  home brewing
94 2015-07-04          638  home brewing
95 2015-07-22          771  home brewing
96 2015-07-13          516  home brewing
97 2015-07-26           95  home brewing
98 2015-07-11          227  home brewing
99 2015-07-21          876  home brewing

[100 rows x 3 columns]

# processing
# ===================================
df['week_of_year'] = df['Day'].dt.isocalendar().week  # .dt.weekofyear was removed in pandas 2.0

          Day  Impressions       Keyword  week_of_year
0  2015-07-13          204  home brewing            29
1  2015-07-16          325  home brewing            29
2  2015-07-22          775  home brewing            30
3  2015-07-01          965  home brewing            27
4  2015-07-04           48  home brewing            27
5  2015-07-28          640  home brewing            31
6  2015-07-04          132  home brewing            27
7  2015-07-08          973  home brewing            28
..        ...          ...           ...           ...
92 2015-07-01          287  home brewing            27
93 2015-07-15          281  home brewing            29
94 2015-07-04          638  home brewing            27
95 2015-07-22          771  home brewing            30
96 2015-07-13          516  home brewing            29
97 2015-07-26           95  home brewing            30
98 2015-07-11          227  home brewing            28
99 2015-07-21          876  home brewing            30



pd.pivot_table(df, index='Keyword', columns='week_of_year', values='Impressions', aggfunc='sum')

week_of_year    27     28    29     30    31
Keyword                                     
home brewing  9656  10934  9419  14519  4320
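A bare week number restarts every January, so data spanning several years would merge, for example, week 27 of 2015 with week 27 of 2016. A sketch of an alternative, assuming Monday-to-Sunday weeks are wanted, is to label each row with the Monday that starts its week using `dt.to_period('W')` (weekly periods ending on Sunday) followed by `dt.start_time`. The `daily` frame reuses the daily totals from the question's pivot table, and `week_start` is an illustrative column name.

```python
import pandas as pd

# Daily totals for "home brewing", taken from the question's pivot table
daily = pd.DataFrame({
    'Day': pd.date_range('2015-07-06', '2015-07-19', freq='D'),
    'Keyword': 'home brewing',
    'Impressions': [1098, 1323, 2116, 2574, 1484, 1533, 1782,
                    1615, 1866, 1936, 1331, 1274, 1193, 1483],
})

# 'W' periods end on Sunday, so start_time is the Monday of each row's week;
# unlike a bare week number, this label also carries the year
daily['week_start'] = daily['Day'].dt.to_period('W').dt.start_time

weekly = pd.pivot_table(daily, index='Keyword', columns='week_start',
                        values='Impressions', aggfunc='sum')
print(weekly)
```

Because the column labels are dates, they line up with the 2015-07-06 and 2015-07-13 headings in the desired output.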

Update

df.set_index('Day').groupby('Keyword')['Impressions'].resample('7D').sum().reset_index().pivot(index='Keyword', columns='Day', values='Impressions')

Day           2015-07-01  2015-07-08  2015-07-15  2015-07-22  2015-07-29
Keyword                                                                 
home brewing       13450        9377       13191       10422        2408
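The same weekly pivot can also be sketched as a single `groupby` call by passing a `pd.Grouper` for the date column alongside `Keyword`, then moving the bin labels into columns with `unstack`. As with `resample('7D')`, the 7-day bins start at the earliest date in the data rather than at a calendar week boundary, which is why the simulated data above yields bins starting on 2015-07-01. The `daily` frame below is an assumption built from the question's daily totals so the result can be compared with the desired output.

```python
import pandas as pd

# Daily totals for "home brewing", taken from the question's pivot table
daily = pd.DataFrame({
    'Day': pd.date_range('2015-07-06', '2015-07-19', freq='D'),
    'Keyword': 'home brewing',
    'Impressions': [1098, 1323, 2116, 2574, 1484, 1533, 1782,
                    1615, 1866, 1936, 1331, 1274, 1193, 1483],
})

# Group by keyword and by 7-day bins of Day, sum the impressions,
# then pivot the bin labels (one Timestamp per bin) out into columns
weekly = (daily
          .groupby(['Keyword', pd.Grouper(key='Day', freq='7D')])['Impressions']
          .sum()
          .unstack('Day'))
print(weekly)
```

This avoids the `set_index`/`reset_index`/`pivot` round trip, since `unstack` already produces the Keyword-by-date layout.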

Answer 1 (score: 0)

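I accepted Jianxun Li's answer as the correct one, but I wanted to post a commented version as well, since I'm sure I'll come back to this later once I've forgotten how it works. Thanks, Jianxun!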

import pandas as pd
import numpy as np
from io import StringIO

data = StringIO('''Day  Keyword Impressions Clicks  Cost    Avg. position   Converted clicks
7/9/2015    "home brewing"  2571    6   4.13    3.1 0
7/8/2015    "home brewing"  2113    13  10.02   3.1 1
7/15/2015   "home brewing"  1933    9   9.3 2.8 0
7/14/2015   "home brewing"  1865    3   2.64    2.6 0
7/12/2015   "home brewing"  1781    7   4.93    2.6 0
7/13/2015   "home brewing"  1612    10  9.67    2.6 0
7/11/2015   "home brewing"  1530    9   9.23    2.6 0
7/10/2015   "home brewing"  1482    4   3.73    2.8 0
7/19/2015   "home brewing"  1482    5   3.26    2.5 0
7/16/2015   "home brewing"  1329    6   5.72    2.9 0
7/7/2015    "home brewing"  1318    3   2.55    2.7 0
7/17/2015   "home brewing"  1272    6   5.42    2.7 0
7/18/2015   "home brewing"  1192    5   4.5 2.5 0
7/6/2015    "home brewing"  1095    8   6.02    2.9 0
7/7/2015    "home brewing"  5   1   0.61    4   0
7/6/2015    "home brewing"  3   0   0   3.3 0
7/8/2015    "home brewing"  3   1   0.61    3.3 0
7/9/2015    "home brewing"  3   0   0   4.3 0
7/13/2015   "home brewing"  3   0   0   2.7 0
7/11/2015   "home brewing"  3   0   0   3.3 0
7/15/2015   "home brewing"  3   0   0   6.3 0
7/10/2015   "home brewing"  2   0   0   4.5 0
7/16/2015   "home brewing"  2   1   0.56    2.5 0
7/17/2015   "home brewing"  2   0   0   4   0
7/12/2015   "home brewing"  1   0   0   2   0
7/14/2015   "home brewing"  1   0   0   7   0
7/18/2015   "home brewing"  1   0   0   2   0
7/19/2015   "home brewing"  1   0   0   4   0''')

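#Read data into dataframe (pd.DataFrame.from_csv was removed in pandas 1.0).
#The fields are separated by runs of whitespace here, so skip the header
#row and name the columns explicitly.
df = pd.read_csv(data, sep=r'\s+', skiprows=1,
                 names=['Day', 'Keyword', 'Impressions', 'Clicks', 'Cost',
                        'Avg. position', 'Converted clicks'])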
#Drop unneeded columns
df = df.drop(['Clicks', 'Cost', 'Converted clicks', 'Avg. position'], axis=1)
#set 'Day' to a datetime dtype
df['Day'] = pd.to_datetime(df['Day'])
#Set index to be 'Day'
df = df.set_index('Day')
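#Group by keyword, keeping only the Impressions column so the
#Keyword strings are not summed along with it
df = df.groupby('Keyword')[['Impressions']]
#Resample the index by 7 days and sum ('how=' was removed from resample)
df = df.resample('7D').sum()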
'''df looks like this currently...
                         Impressions
Keyword      Day                    
home brewing 2015-07-06        11910
             2015-07-13        10698
'''
#Reset the index now that date is grouped
df = df.reset_index()
'''
        Keyword        Day  Impressions
0  home brewing 2015-07-06        11910
1  home brewing 2015-07-13        10698
'''
#This part pivots the data to have 'Day' be columns
df = df.pivot(index='Keyword', columns='Day', values='Impressions')

print(df)
''' #End Result#
Day           2015-07-06  2015-07-13
Keyword                             
home brewing       11910       10698
'''