DateOccurred CostCentre TimeDifference
03/09/2012 2073 28138
03/09/2012 6078 34844
03/09/2012 8273 31215
03/09/2012 8367 28160
03/09/2012 8959 32037
03/09/2012 9292 30118
03/09/2012 9532 34200
03/09/2012 9705 27240
03/09/2012 10085 31431
03/09/2012 10220 22555
04/09/2012 6078 41126
04/09/2012 7569 31101
04/09/2012 8273 30994
04/09/2012 8959 30064
04/09/2012 9532 34655
04/09/2012 9705 26475
04/09/2012 10085 31443
04/09/2012 10220 33970
05/09/2012 2073 28221
05/09/2012 6078 27894
05/09/2012 7569 29012
05/09/2012 8239 42208
05/09/2012 8273 31128
05/09/2012 8367 27993
05/09/2012 8959 20669
05/09/2012 9292 33070
05/09/2012 9532 8189
05/09/2012 9705 27540
05/09/2012 10085 28798
05/09/2012 10220 23164
06/09/2012 2073 28350
06/09/2012 6078 35648
06/09/2012 7042 27129
06/09/2012 7569 31546
06/09/2012 8239 39945
06/09/2012 8273 31107
06/09/2012 8367 27795
06/09/2012 9292 32974
06/09/2012 9532 30320
06/09/2012 9705 37462
06/09/2012 10085 31703
06/09/2012 10220 7807
06/09/2012 14573 186
07/09/2012 0 0
07/09/2012 0 0
07/09/2012 2073 28036
07/09/2012 6078 31969
07/09/2012 7569 32941
07/09/2012 8273 30073
07/09/2012 8367 29391
07/09/2012 9292 31927
07/09/2012 9532 30127
07/09/2012 9705 27604
07/09/2012 10085 28108
08/09/2012 2073 28463
10/09/2012 6078 31266
10/09/2012 8239 16390
10/09/2012 8273 31140
10/09/2012 8959 30858
10/09/2012 9532 30794
10/09/2012 9705 28752
11/09/2012 0 0
11/09/2012 0 0
11/09/2012 0 0
11/09/2012 0 0
11/09/2012 0 0
11/09/2012 2073 28159
11/09/2012 6078 36835
11/09/2012 8239 45354
11/09/2012 8273 30922
11/09/2012 8367 31382
11/09/2012 8959 29670
11/09/2012 9292 33582
11/09/2012 9705 29394
11/09/2012 10085 17140
12/09/2012 2073 28283
12/09/2012 6078 31139
12/09/2012 7042 35063
12/09/2012 8273 31075
12/09/2012 8367 29795
12/09/2012 9292 33496
12/09/2012 9532 31669
12/09/2012 9705 26166
12/09/2012 10085 29889
12/09/2012 10220 35656
13/09/2012 2073 28144
13/09/2012 6078 30544
13/09/2012 7097 30866
13/09/2012 8273 30772
13/09/2012 8367 32387
13/09/2012 8959 29307
13/09/2012 9292 32348
13/09/2012 9532 28137
13/09/2012 9705 28823
13/09/2012 10085 31543
13/09/2012 10220 28293
14/09/2012 0 12433
14/09/2012 0 12434
14/09/2012 0 12434
14/09/2012 0 12434
14/09/2012 0 12434
14/09/2012 0 0
14/09/2012 0 0
14/09/2012 0 0
14/09/2012 0 12433
14/09/2012 0 0
14/09/2012 0 12433
14/09/2012 0 0
14/09/2012 0 0
14/09/2012 0 0
14/09/2012 0 0
14/09/2012 0 0
14/09/2012 0 0
14/09/2012 0 0
14/09/2012 0 0
14/09/2012 0 0
14/09/2012 0 0
14/09/2012 0 0
14/09/2012 0 0
14/09/2012 0 0
14/09/2012 0 0
14/09/2012 0 0
14/09/2012 0 0
14/09/2012 0 1720
14/09/2012 0 0
14/09/2012 0 0
14/09/2012 0 0
14/09/2012 0 0
14/09/2012 0 0
14/09/2012 0 0
14/09/2012 0 0
14/09/2012 0 384
14/09/2012 0 0
14/09/2012 0 0
14/09/2012 0 0
14/09/2012 0 383
14/09/2012 2073 28438
14/09/2012 6078 27255
14/09/2012 8273 29989
14/09/2012 8959 26892
14/09/2012 9292 33202
14/09/2012 9532 30862
14/09/2012 9705 26857
14/09/2012 10085 32657
14/09/2012 10220 27296
15/09/2012 6078 3832
17/09/2012 6078 30004
17/09/2012 7569 30390
17/09/2012 8239 41421
17/09/2012 8273 26337
17/09/2012 8367 31631
17/09/2012 8959 17989
17/09/2012 9292 35703
17/09/2012 9532 36542
17/09/2012 9705 27488
17/09/2012 10085 30849
17/09/2012 10220 32575
18/09/2012 2073 28293
18/09/2012 6078 27450
18/09/2012 7569 30323
18/09/2012 8239 38481
18/09/2012 8273 31154
18/09/2012 8367 27944
18/09/2012 8959 28196
18/09/2012 9292 30844
18/09/2012 9532 33128
18/09/2012 9705 32100
19/09/2012 2073 28227
19/09/2012 6078 32243
19/09/2012 7569 29041
19/09/2012 8239 42791
19/09/2012 8273 30966
19/09/2012 8367 26420
19/09/2012 8959 29394
19/09/2012 9292 14865
19/09/2012 9532 23618
19/09/2012 10085 31614
19/09/2012 10220 8686
20/09/2012 2073 28260
20/09/2012 6078 30446
20/09/2012 7097 34909
20/09/2012 7569 30869
20/09/2012 8273 31079
20/09/2012 8367 30162
20/09/2012 9292 13104
20/09/2012 9532 36614
20/09/2012 9705 35617
20/09/2012 10085 31821
20/09/2012 10220 30055
20/09/2012 14573 468
21/09/2012 0 0
21/09/2012 0 0
21/09/2012 0 0
21/09/2012 0 0
21/09/2012 0 0
21/09/2012 0 0
21/09/2012 0 0
21/09/2012 0 0
21/09/2012 0 0
21/09/2012 0 3
21/09/2012 0 0
21/09/2012 0 0
21/09/2012 0 3
21/09/2012 2073 28308
21/09/2012 6078 33833
21/09/2012 7569 32335
21/09/2012 9292 33824
21/09/2012 9532 33376
21/09/2012 10220 21002
22/09/2012 2073 28402
23/09/2012 2073 28109
24/09/2012 2073 28431
24/09/2012 6078 30027
24/09/2012 7097 31914
24/09/2012 8239 35617
24/09/2012 8273 30670
24/09/2012 8367 29084
24/09/2012 8959 31023
24/09/2012 9292 34394
24/09/2012 9532 31255
24/09/2012 9705 18758
24/09/2012 10085 29290
24/09/2012 10220 33230
25/09/2012 2073 28506
25/09/2012 6078 32043
25/09/2012 7042 34953
25/09/2012 7569 30898
25/09/2012 8239 41297
25/09/2012 8273 31012
25/09/2012 8367 29645
25/09/2012 8959 29904
25/09/2012 9532 37875
25/09/2012 9705 13280
25/09/2012 10085 35023
25/09/2012 10220 31359
26/09/2012 2073 28388
26/09/2012 6078 29765
26/09/2012 7097 31561
26/09/2012 7569 29151
26/09/2012 8239 40369
26/09/2012 8367 28174
26/09/2012 8959 26554
26/09/2012 9292 32104
26/09/2012 9532 33194
26/09/2012 9705 30377
26/09/2012 10085 31503
26/09/2012 10220 28310
27/09/2012 0 0
27/09/2012 0 0
27/09/2012 0 0
27/09/2012 0 0
27/09/2012 0 0
27/09/2012 0 0
27/09/2012 0 0
27/09/2012 0 0
27/09/2012 2073 28491
27/09/2012 6078 31137
27/09/2012 8239 38403
27/09/2012 8273 31117
27/09/2012 8367 28462
27/09/2012 9292 32387
27/09/2012 9532 23023
27/09/2012 9705 32790
27/09/2012 10085 33460
27/09/2012 10220 31782
28/09/2012 0 161
28/09/2012 2073 28381
28/09/2012 7569 32322
28/09/2012 8239 38362
28/09/2012 8273 30533
28/09/2012 8959 17128
28/09/2012 9292 32484
28/09/2012 9532 18586
28/09/2012 9705 27902
29/09/2012 2073 28583
预期输出
DateOccurred CostCentre TimeDifference
2012-03-11 0 500000
2012-03-11 2073 570000
2012-03-18 0 650000
2012-03-18 2073 425000
2012-03-25 0 378000
2012-04-25 2073 480000
答案 0 :(得分:3)
这是一种获取输入(作为文本)并按照您希望的方式对其进行分组的方法。关键是为每个分组使用字典(日期,然后是中心)。
import collections
import datetime
import functools
def delta_totals_by_date_and_centre(in_file):
# Use a defaultdict instead of a normal dict so that missing values are
# automatically created. by_date is a mapping (dict) from a tuple of (year, week)
# to another mapping (dict) from centre to total delta time.
by_date = collections.defaultdict(functools.partial(collections.defaultdict, int))
# For each line in the input...
for line in in_file:
# Parse the three fields of each line into date, int ,int.
date, centre, delta = line.split()
date = datetime.datetime.strptime(date, "%d/%m/%Y").date()
centre = int(centre)
delta = int(delta)
# Determine the year and week of the year.
year, week, weekday = date.isocalendar()
year_and_week = year, week
# Add the time delta.
by_date[year_and_week][centre] += delta
# Yield each result, in order.
for year_and_week, by_centre in sorted(by_date.items()):
for centre, delta in sorted(by_centre.items()):
yield year_and_week, centre, delta
对于您的样本输入,它会生成此输出(第一列为year-week_of_the_year
)。
2012-36 0 0
2012-36 2073 141208
2012-36 6078 171481
2012-36 7042 27129
2012-36 7569 124600
2012-36 8239 82153
2012-36 8273 154517
2012-36 8367 113339
2012-36 8959 82770
2012-36 9292 128089
2012-36 9532 137491
2012-36 9705 146321
2012-36 10085 151483
2012-36 10220 87496
2012-36 14573 186
2012-37 0 89522
2012-37 2073 113024
2012-37 6078 160871
2012-37 7042 35063
2012-37 7097 30866
2012-37 8239 61744
2012-37 8273 153898
2012-37 8367 93564
2012-37 8959 116727
2012-37 9292 132628
2012-37 9532 121462
2012-37 9705 139992
2012-37 10085 111229
2012-37 10220 91245
2012-38 0 6
2012-38 2073 169599
2012-38 6078 153976
2012-38 7097 34909
2012-38 7569 152958
2012-38 8239 122693
2012-38 8273 119536
2012-38 8367 116157
2012-38 8959 75579
2012-38 9292 128340
2012-38 9532 163278
2012-38 9705 95205
2012-38 10085 94284
2012-38 10220 92318
2012-38 14573 468
2012-39 0 161
2012-39 2073 170780
2012-39 6078 122972
2012-39 7042 34953
2012-39 7097 63475
2012-39 7569 92371
2012-39 8239 194048
2012-39 8273 123332
2012-39 8367 115365
2012-39 8959 104609
2012-39 9292 131369
2012-39 9532 143933
2012-39 9705 123107
2012-39 10085 129276
2012-39 10220 124681
答案 1 :(得分:3)
可能先按CostCentre分组,然后使用Series / DataFrame resample()
?
In [72]: centers = {}
In [73]: for center, idx in df.groupby("CostCentre").groups.iteritems():
....: timediff = df.ix[idx].set_index("Date")['TimeDifference']
....: centers[center] = timediff.resample("W", how=sum)
In [77]: pd.concat(centers, names=['CostCentre'])
Out[77]:
CostCentre Date
0 2012-09-09 0
2012-09-16 89522
2012-09-23 6
2012-09-30 161
2073 2012-09-09 141208
2012-09-16 113024
2012-09-23 169599
2012-09-30 170780
6078 2012-09-09 171481
2012-09-16 160871
2012-09-23 153976
2012-09-30 122972
其他详细信息:
当pd_read_ *函数的parse_dates
为True
时,还必须设置index_col
。
In [28]: df = pd.read_clipboard(sep=' +', parse_dates=True, index_col=0,
....: dayfirst=True)
In [30]: df.head()
Out[30]:
CostCentre TimeDifference
DateOccurred
2012-09-03 2073 28138
2012-09-03 6078 34844
2012-09-03 8273 31215
2012-09-03 8367 28160
2012-09-03 8959 32037
由于resample()需要TimeSeries索引的帧/系列,因此在创建期间设置索引无需单独设置每个组的索引。 GroupBy对象也有一个apply方法,它基本上是用上面的pd.concat()完成的“组合”步骤的语法糖。
In [37]: x = df.groupby("CostCentre").apply(lambda df:
....: df['TimeDifference'].resample("W", how=sum))
In [38]: x.head(12)
Out[38]:
CostCentre DateOccurred
0 2012-09-09 0
2012-09-16 89522
2012-09-23 6
2012-09-30 161
2073 2012-09-09 141208
2012-09-16 113024
2012-09-23 169599
2012-09-30 170780
6078 2012-09-09 171481
2012-09-16 160871
2012-09-23 153976
2012-09-30 122972