背景
我有一个数据框df
,其中有几列表示给定日期的交易数量。数据延续了10年。数据框如下所示:
Date ANZ_Volume BHP_Volume CBA_Volume MQG_Volume NAB_Volume
2006-01-02 1106877 7280093 955871 148134 901928
2006-01-03 3498020 16274003 1963392 683766 2429254
2006-01-04 2844613 11436553 2149895 399354 3223708
2006-01-05 2661226 7104800 2137384 560498 1649814
2006-01-08 3040459 14577664 1437820 492849 2690357
2006-01-09 4346403 12040685 2891248 608287 3273293
2006-01-10 4367498 15002163 3960253 550975 3514500
2006-01-11 3598690 15928934 3875594 808685 3487634
2006-01-12 3542926 9366744 1874046 807708 1838725
2006-01-15 2291792 7491041 1736446 569285 2465805
2006-01-16 3352969 10613706 1596676 1071833 2763514
2006-01-17 4515208 23156310 4200233 1401628 4487772
2006-01-18 3027208 19241218 2631980 816190 4391474
2006-01-19 3912358 16356046 3094409 682497 6956628
2006-01-22 3933020 15533592 3560834 948459 4655687
2006-01-23 2412419 17092104 3204438 967484 3556701
2006-01-24 7624649 34777198 9997472 1156034 10233959
2006-01-26 2683581 24918357 1812563 1841253 3645258
2006-01-29 2106490 15171772 1811530 506192 5302280
2006-01-30 4817301 22417229 3126666 1078237 4085055
2006-01-31 5190244 18597719 2373929 4558877 8095117
2006-02-01 5899027 15911692 2131606 3622954 8167766
问题:
我希望在一周中的某一天累积列,并输出一个新的df,如下所示:
DayOfWeek ANZ_Volume BHP_Volume CBA_Volume MQG_Volume NAB_Volume
Monday 16035969 69443817 11774899 3873975 14580491
Tueday 25195619 107807393 22495279 8351280 28760602
Wednesday 18053119 87436754 12601638 7488436 22915840
Thursday 12223000 47999362 8917369 2556895 15747447
Friday
Saturday
Sunday 11371761 52774069 8546630 2516785 15114129
怎么可以这样做?
答案 0 :(得分:3)
使用groupby
并将sum
汇总到DatetimeIndex.weekday_name
,并且可以通过两种方式订购天数 - ordered categoricals或reindex
:
cats = ['Monday','Tuesday','Wednesday','Thursday','Friday','Saturday','Sunday']
days = pd.Categorical(df.index.weekday_name, categories=cats, ordered=True)
df = df.groupby(days).sum()
print (df)
ANZ_Volume BHP_Volume CBA_Volume MQG_Volume NAB_Volume
Monday 16035969 69443817 11774899 3873975 14580491
Tuesday 25195619 107807393 22495279 8351280 28760602
Wednesday 15369538 62518397 10789075 5647183 19270582
Thursday 12800091 57745947 8918402 3891956 14090425
Friday 0 0 0 0 0
Saturday 0 0 0 0 0
Sunday 11371761 52774069 8546630 2516785 15114129
替代解决方案:
cats = ['Monday','Tuesday','Wednesday','Thursday','Friday','Saturday','Sunday']
df = df.groupby(df.index.weekday_name).sum().reindex(cats)
print (df)
ANZ_Volume BHP_Volume CBA_Volume MQG_Volume NAB_Volume
Date
Monday 16035969.0 69443817.0 11774899.0 3873975.0 14580491.0
Tuesday 25195619.0 107807393.0 22495279.0 8351280.0 28760602.0
Wednesday 15369538.0 62518397.0 10789075.0 5647183.0 19270582.0
Thursday 12800091.0 57745947.0 8918402.0 3891956.0 14090425.0
Friday NaN NaN NaN NaN NaN
Saturday NaN NaN NaN NaN NaN
Sunday 11371761.0 52774069.0 8546630.0 2516785.0 15114129.0
df = df.groupby(df.index.weekday_name).sum().reindex(cats, fill_value=0)
print (df)
ANZ_Volume BHP_Volume CBA_Volume MQG_Volume NAB_Volume
Date
Monday 16035969 69443817 11774899 3873975 14580491
Tuesday 25195619 107807393 22495279 8351280 28760602
Wednesday 15369538 62518397 10789075 5647183 19270582
Thursday 12800091 57745947 8918402 3891956 14090425
Friday 0 0 0 0 0
Saturday 0 0 0 0 0
Sunday 11371761 52774069 8546630 2516785 15114129