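Problem description: I basically have a data frame with 2 columns, Member_ID and Service_from. I am trying to compute a running total of the day differences between service dates for each ID. Once a member's running total reaches 365 days, it should reset to 0.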
Here is what my data frame looks like (Diff, running_total and Desired_total are durations, shown in days):
Member_ID  Service_from  Service_year  Diff  running_total  Desired_total
540        2/1/2016      2016          0     0              0
540        3/29/2016     2016          57    57             57
540        4/26/2016     2016          28    85             85
540        5/27/2016     2016          31    116            116
540        7/1/2016      2016          35    151            151
540        8/5/2016      2016          35    186            186
540        9/13/2016     2016          39    225            225
540        10/25/2016    2016          42    267            267
540        11/22/2016    2016          28    295            295
540        12/27/2016    2016          35    330            330
540        1/24/2017     2017          28    358            358
540        2/21/2017     2017          28    386            0
540        4/11/2017     2017          49    435            77
540        4/26/2017     2017          15    450            92
540        4/26/2017     2017          0     450            92
540        5/1/2017      2017          5     455            97
540        5/1/2017      2017          0     455            97
25         9/26/2017     2017          0     0              0
25         11/26/2017    2017          61    61             61
I can reset the difference to 0 for each new member with the following code:
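import pandas as pd

# Gap between consecutive service dates (Service_from must already be datetime, rows grouped by member)
sts['Diff'] = sts['Service_from'].diff()
# The first row of each member has no previous service date: blank it, then fill with 0 days
mask = sts.Member_ID != sts.Member_ID.shift(1)
sts.loc[mask, 'Diff'] = pd.NaT
sts['Diff'] = sts['Diff'].fillna(pd.Timedelta(0))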
But I can't figure out how to reset it once the running total reaches 365 days.
Can anyone help?
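As an aside on the per-member reset step: pandas can compute the gap within each member directly with `groupby(...).diff()`, which makes the shift/mask comparison unnecessary. A minimal sketch, assuming Service_from is already parsed as datetime and using a few rows taken from the table above:

```python
import pandas as pd

sts = pd.DataFrame({
    'Member_ID': [540, 540, 540, 25, 25],
    'Service_from': pd.to_datetime(['2/1/2016', '3/29/2016', '4/26/2016', '9/26/2017', '11/26/2017']),
})

# diff() runs separately inside each Member_ID group, so each member's first row is NaT;
# fillna turns that into a zero-length duration
sts['Diff'] = sts.groupby('Member_ID')['Service_from'].diff().fillna(pd.Timedelta(0))
print(sts)
```

The gaps come out as 0, 57 and 28 days for member 540 and 0 and 61 days for member 25, matching the Diff column in the question.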
Answer (score: 1)
With the sample data set up as follows:
import pandas as pd

df = pd.DataFrame({'Member_ID': [540, 540, 540, 540, 540, 540, 540, 540, 540, 540, 540, 25, 25, 25, 25, 25, 25, 25, 25],
                   'Service_from': ['2/1/2016', '3/29/2016', '4/26/2016', '9/13/2016', '10/25/2016', '11/22/2016', '12/27/2016', '1/24/2017', '4/26/2017',
                                    '2/21/2017', '4/11/2017', '4/26/2017', '5/1/2017', '5/1/2017', '9/26/2017', '11/26/2017', '1/24/2018', '4/26/2018', '9/5/2018']})
df['Service_from'] = pd.to_datetime(df['Service_from'])
Result:
Member_ID Service_from
0 540 2016-02-01
1 540 2016-03-29
2 540 2016-04-26
3 540 2016-09-13
4 540 2016-10-25
5 540 2016-11-22
6 540 2016-12-27
7 540 2017-01-24
8 540 2017-04-26
9 540 2017-02-21
10 540 2017-04-11
11 25 2017-04-26
12 25 2017-05-01
13 25 2017-05-01
14 25 2017-09-26
15 25 2017-11-26
16 25 2018-01-24
17 25 2018-04-26
18 25 2018-09-05
You can then do:
# Sort the data frame for convenience
df.sort_values(by=['Member_ID', 'Service_from'], inplace=True)
# Get the minimum Service_from per Member_ID
df.set_index('Member_ID', inplace=True)
df['Service_from_min'] = df.groupby(df.index)['Service_from'].min()
# Get the difference in days modulo 365
df['Diff'] = (df['Service_from'] - df['Service_from_min']).dt.days % 365
df.reset_index(inplace = True)
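If you would rather not move Member_ID into the index, `groupby(...).transform('min')` broadcasts each member's earliest date back onto that member's rows. A sketch of the same calculation with that variation, on the same sample data; `ignore_index=True` renumbers the sorted rows from 0, as in the output below:

```python
import pandas as pd

df = pd.DataFrame({
    'Member_ID': [540] * 11 + [25] * 8,
    'Service_from': pd.to_datetime(['2/1/2016', '3/29/2016', '4/26/2016', '9/13/2016', '10/25/2016',
                                    '11/22/2016', '12/27/2016', '1/24/2017', '4/26/2017', '2/21/2017',
                                    '4/11/2017', '4/26/2017', '5/1/2017', '5/1/2017', '9/26/2017',
                                    '11/26/2017', '1/24/2018', '4/26/2018', '9/5/2018']),
})

# Sort by member and date, renumbering the rows 0..n-1
df = df.sort_values(by=['Member_ID', 'Service_from'], ignore_index=True)
# Earliest service date of each member, repeated on every row of that member
df['Service_from_min'] = df.groupby('Member_ID')['Service_from'].transform('min')
# Days since the member's first service, modulo 365
df['Diff'] = (df['Service_from'] - df['Service_from_min']).dt.days % 365
print(df)
```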
This should give you:
Member_ID Service_from Service_from_min Diff
0 25 2017-04-26 2017-04-26 0
1 25 2017-05-01 2017-04-26 5
2 25 2017-05-01 2017-04-26 5
3 25 2017-09-26 2017-04-26 153
4 25 2017-11-26 2017-04-26 214
5 25 2018-01-24 2017-04-26 273
6 25 2018-04-26 2017-04-26 0
7 25 2018-09-05 2017-04-26 132
8 540 2016-02-01 2016-02-01 0
9 540 2016-03-29 2016-02-01 57
10 540 2016-04-26 2016-02-01 85
11 540 2016-09-13 2016-02-01 225
12 540 2016-10-25 2016-02-01 267
13 540 2016-11-22 2016-02-01 295
14 540 2016-12-27 2016-02-01 330
15 540 2017-01-24 2016-02-01 358
16 540 2017-02-21 2016-02-01 21
17 540 2017-04-11 2016-02-01 70
18 540 2017-04-26 2016-02-01 85
If you don't need the Service_from_min column, you can also drop it:
df.drop('Service_from_min', axis=1, inplace=True)
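Taking the days modulo 365 resets on each anniversary of a member's first service date, so 2017-02-21 gets 21 rather than the question's 0, and 2017-04-11 and 2017-04-26 get 70 and 85 rather than 77 and 92. If the total must instead reset at the row where the running sum reaches 365 days, a plain loop per member is one way to do it. The sketch below is an illustration, not the answerer's method: `running_total_with_reset` and its `carry_reset_gap` flag are hypothetical names, and it assumes "reaches" means `>= 365`. With `carry_reset_gap=True`, the reset row's own gap (28 days) starts the next window, which is what reproduces the question's Desired_total column (0, then 77, 92, 92, 97, 97).

```python
import pandas as pd


def running_total_with_reset(days, limit=365, carry_reset_gap=False):
    """Running total of `days` that resets once it reaches `limit` (hypothetical helper).

    The row where the total reaches `limit` is reported as 0. With
    carry_reset_gap=True, that row's own gap starts the next window;
    otherwise the next window starts from 0.
    """
    totals = []
    total = 0
    for d in days:
        total += d
        if total >= limit:
            totals.append(0)
            total = d if carry_reset_gap else 0
        else:
            totals.append(total)
    return totals


dates = ['2/1/2016', '3/29/2016', '4/26/2016', '5/27/2016', '7/1/2016', '8/5/2016',
         '9/13/2016', '10/25/2016', '11/22/2016', '12/27/2016', '1/24/2017', '2/21/2017',
         '4/11/2017', '4/26/2017', '4/26/2017', '5/1/2017', '5/1/2017',
         '9/26/2017', '11/26/2017']
df = pd.DataFrame({'Member_ID': [540] * 17 + [25] * 2,
                   'Service_from': pd.to_datetime(dates)})

# Gap in days to the previous service of the same member (0 for each member's first row)
df['Diff'] = df.groupby('Member_ID')['Service_from'].diff().dt.days.fillna(0).astype(int)

# Run the loop separately for each member, keeping the original row index
df['Desired_total'] = df.groupby('Member_ID')['Diff'].transform(
    lambda s: pd.Series(running_total_with_reset(s.tolist(), carry_reset_gap=True), index=s.index)
)
print(df)
```

With `carry_reset_gap=False`, the rows after the reset become 49, 64, 64, 69, 69 instead.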