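DataFrame: aggregating dates by day of the week

Time: 2018-05-08 06:23:34

Tags: python pandas

Background

I have a DataFrame df with several columns, each holding the number of trades on a given date. The data spans 10 years. The DataFrame looks like this: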

Date         ANZ_Volume  BHP_Volume  CBA_Volume  MQG_Volume  NAB_Volume                                                                          
2006-01-02     1106877     7280093      955871      148134      901928   
2006-01-03     3498020    16274003     1963392      683766     2429254   
2006-01-04     2844613    11436553     2149895      399354     3223708   
2006-01-05     2661226     7104800     2137384      560498     1649814   
2006-01-08     3040459    14577664     1437820      492849     2690357   
2006-01-09     4346403    12040685     2891248      608287     3273293   
2006-01-10     4367498    15002163     3960253      550975     3514500   
2006-01-11     3598690    15928934     3875594      808685     3487634   
2006-01-12     3542926     9366744     1874046      807708     1838725   
2006-01-15     2291792     7491041     1736446      569285     2465805   
2006-01-16     3352969    10613706     1596676     1071833     2763514   
2006-01-17     4515208    23156310     4200233     1401628     4487772   
2006-01-18     3027208    19241218     2631980      816190     4391474   
2006-01-19     3912358    16356046     3094409      682497     6956628   
2006-01-22     3933020    15533592     3560834      948459     4655687   
2006-01-23     2412419    17092104     3204438      967484     3556701   
2006-01-24     7624649    34777198     9997472     1156034    10233959   
2006-01-26     2683581    24918357     1812563     1841253     3645258   
2006-01-29     2106490    15171772     1811530      506192     5302280   
2006-01-30     4817301    22417229     3126666     1078237     4085055   
2006-01-31     5190244    18597719     2373929     4558877     8095117   
2006-02-01     5899027    15911692     2131606     3622954     8167766  

Question:

I would like to sum each column by day of the week and output a new df that looks like this:

DayOfWeek  ANZ_Volume   BHP_Volume  CBA_Volume  MQG_Volume  NAB_Volume
Monday      16035969    69443817    11774899       3873975  14580491
Tuesday     25195619    107807393   22495279       8351280  28760602
Wednesday   18053119    87436754    12601638       7488436  22915840
Thursday    12223000    47999362     8917369       2556895  15747447
Friday                  
Saturday                    
Sunday      11371761    52774069     8546630       2516785  15114129

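How can this be done?

1 Answer:

Answer 0: (score: 3)

Use groupby and aggregate with sum, grouping by DatetimeIndex.weekday_name (removed in pandas 1.0 in favor of DatetimeIndex.day_name()). The days can then be put in calendar order in two ways: with an ordered categorical or with reindex.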

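import pandas as pd
cats = ['Monday','Tuesday','Wednesday','Thursday','Friday','Saturday','Sunday']
# day_name() replaces weekday_name, which was removed in pandas 1.0
days = pd.Categorical(df.index.day_name(), categories=cats, ordered=True)
# observed=False keeps weekdays with no rows, which then sum to 0
df = df.groupby(days, observed=False).sum()
print(df)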
           ANZ_Volume  BHP_Volume  CBA_Volume  MQG_Volume  NAB_Volume
Monday       16035969    69443817    11774899     3873975    14580491
Tuesday      25195619   107807393    22495279     8351280    28760602
Wednesday    15369538    62518397    10789075     5647183    19270582
Thursday     12800091    57745947     8918402     3891956    14090425
Friday              0           0           0           0           0
Saturday            0           0           0           0           0
Sunday       11371761    52774069     8546630     2516785    15114129

Alternative solution:

cats = ['Monday','Tuesday','Wednesday','Thursday','Friday','Saturday','Sunday']
# Group by day name (day_name() replaces the removed weekday_name), then reorder
df = df.groupby(df.index.day_name()).sum().reindex(cats)
print(df)
           ANZ_Volume   BHP_Volume  CBA_Volume  MQG_Volume  NAB_Volume
Date                                                                  
Monday     16035969.0   69443817.0  11774899.0   3873975.0  14580491.0
Tuesday    25195619.0  107807393.0  22495279.0   8351280.0  28760602.0
Wednesday  15369538.0   62518397.0  10789075.0   5647183.0  19270582.0
Thursday   12800091.0   57745947.0   8918402.0   3891956.0  14090425.0
Friday            NaN          NaN         NaN         NaN         NaN
Saturday          NaN          NaN         NaN         NaN         NaN
Sunday     11371761.0   52774069.0   8546630.0   2516785.0  15114129.0

# fill_value=0 fills the missing days with 0 instead of NaN and keeps the integer dtype
df = df.groupby(df.index.day_name()).sum().reindex(cats, fill_value=0)
print(df)
           ANZ_Volume  BHP_Volume  CBA_Volume  MQG_Volume  NAB_Volume
Date                                                                 
Monday       16035969    69443817    11774899     3873975    14580491
Tuesday      25195619   107807393    22495279     8351280    28760602
Wednesday    15369538    62518397    10789075     5647183    19270582
Thursday     12800091    57745947     8918402     3891956    14090425
Friday              0           0           0           0           0
Saturday            0           0           0           0           0
Sunday       11371761    52774069     8546630     2516785    15114129
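
For anyone who wants to try both approaches side by side, here is a minimal, self-contained sketch. It assumes a recent pandas version where DatetimeIndex.day_name() is available, and it uses only four rows and two columns copied from the question's data, so the totals will not match the full tables above. The names by_cat and by_reindex are just illustrative.

```python
import pandas as pd

# Small sample: four rows and two columns taken from the question's data.
df = pd.DataFrame(
    {
        "ANZ_Volume": [1106877, 3498020, 3040459, 4346403],
        "BHP_Volume": [7280093, 16274003, 14577664, 12040685],
    },
    index=pd.DatetimeIndex(
        ["2006-01-02", "2006-01-03", "2006-01-08", "2006-01-09"], name="Date"
    ),
)

cats = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]

# Approach 1: ordered categorical as the group key.
# observed=False keeps the weekdays that have no rows in the result.
days = pd.Categorical(df.index.day_name(), categories=cats, ordered=True)
by_cat = df.groupby(days, observed=False).sum()

# Approach 2: group by the plain day names, then reindex into calendar order.
# fill_value=0 puts zeros (not NaN) in the rows for missing weekdays.
by_reindex = df.groupby(df.index.day_name()).sum().reindex(cats, fill_value=0)

print(by_cat)
print(by_reindex)
```

Both results should contain the same numbers; the main difference is that the first has a CategoricalIndex, which keeps its calendar order if the frame is sorted later, while the second has an ordinary index of strings.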