Pandas: reset a running sum once it exceeds 365 days, grouped by ID

Time: 2018-12-04 17:44:07

Tags: python pandas

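Question: I have a data frame with essentially two columns, Member_ID and Service_from. I am trying to compute, for each ID, a running total of the day differences between consecutive service dates. Once a member's cumulative total passes 365 days, the total should reset to 0.

This is what my data frame looks like: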

Member_ID   Service_from    Service_year    Diff    running_total   Desired_total
540 2/1/2016    2016    0 days 00:00:00.000000000   0 days 00:00:00.000000000   0 days 00:00:00.000000000
540 3/29/2016   2016    57 days 00:00:00.000000000  57 days 00:00:00.000000000  57 days 00:00:00.000000000
540 4/26/2016   2016    28 days 00:00:00.000000000  85 days 00:00:00.000000000  85 days 00:00:00.000000000
540 5/27/2016   2016    31 days 00:00:00.000000000  116 days 00:00:00.000000000 116 days 00:00:00.000000000
540 7/1/2016    2016    35 days 00:00:00.000000000  151 days 00:00:00.000000000 151 days 00:00:00.000000000
540 8/5/2016    2016    35 days 00:00:00.000000000  186 days 00:00:00.000000000 186 days 00:00:00.000000000
540 9/13/2016   2016    39 days 00:00:00.000000000  225 days 00:00:00.000000000 225 days 00:00:00.000000000
540 10/25/2016  2016    42 days 00:00:00.000000000  267 days 00:00:00.000000000 267 days 00:00:00.000000000
540 11/22/2016  2016    28 days 00:00:00.000000000  295 days 00:00:00.000000000 295 days 00:00:00.000000000
540 12/27/2016  2016    35 days 00:00:00.000000000  330 days 00:00:00.000000000 330 days 00:00:00.000000000
540 1/24/2017   2017    28 days 00:00:00.000000000  358 days 00:00:00.000000000 358 days 00:00:00.000000000
540 2/21/2017   2017    28 days 00:00:00.000000000  386 days 00:00:00.000000000 0 days 00:00:00.000000000
540 4/11/2017   2017    49 days 00:00:00.000000000  435 days 00:00:00.000000000 77
540 4/26/2017   2017    15 days 00:00:00.000000000  450 days 00:00:00.000000000 92
540 4/26/2017   2017    0 days 00:00:00.000000000   450 days 00:00:00.000000000 92
540 5/1/2017    2017    5 days 00:00:00.000000000   455 days 00:00:00.000000000 97
540 5/1/2017    2017    0 days 00:00:00.000000000   455 days 00:00:00.000000000 97
25  9/26/2017   2017    0 days 00:00:00.000000000   0 days 00:00:00.000000000   0 days 00:00:00.000000000
25  11/26/2017  2017    61 days 00:00:00.000000000  61 days 00:00:00.000000000  61 days 00:00:00.000000000

I can reset the difference to 0 at the start of each member with the following code:

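sts['Diff'] = sts['Service_from'].diff()
# The first row of each member has no earlier service date to compare against
mask = sts.Member_ID != sts.Member_ID.shift(1)
# Use .loc rather than chained indexing so the assignment reaches sts itself
sts.loc[mask, 'Diff'] = pd.NaT
sts['Diff'] = sts['Diff'].fillna(pd.Timedelta(0))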
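As a rough sketch, the same per-member difference can also be computed with `groupby(...).diff()`, which avoids the manual mask. The small frame below is made-up sample data taken from a few rows of the table, and the integer `running_total` column is an assumption chosen to keep the example simple.

```python
import pandas as pd

# Illustrative subset of the table above (not the full data set)
sts = pd.DataFrame({
    'Member_ID': [540, 540, 540, 25, 25],
    'Service_from': pd.to_datetime(
        ['2/1/2016', '3/29/2016', '4/26/2016', '9/26/2017', '11/26/2017']),
})

# diff() within each Member_ID group; the first row of each member is NaT
sts['Diff'] = (sts.groupby('Member_ID')['Service_from']
                  .diff()
                  .fillna(pd.Timedelta(0)))

# Running total in whole days, restarting for each member
sts['running_total'] = sts['Diff'].dt.days.groupby(sts['Member_ID']).cumsum()
```

This reproduces the Diff and running_total columns for those rows, but it does not yet handle the 365-day reset.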

However, I cannot get the total to reset once it reaches 365 days.

Can anyone help?

1 answer:

Answer 0: (score: 1)

With sample data set up as follows

import pandas as pd

df = pd.DataFrame({'Member_ID': [540, 540, 540, 540, 540, 540, 540, 540, 540, 540, 540, 25, 25, 25, 25, 25, 25, 25, 25],
                   'Service_from': ['2/1/2016', '3/29/2016', '4/26/2016', '9/13/2016', '10/25/2016', '11/22/2016', '12/27/2016', '1/24/2017', '4/26/2017',
                                    '2/21/2017', '4/11/2017', '4/26/2017', '5/1/2017', '5/1/2017', '9/26/2017', '11/26/2017', '1/24/2018', '4/26/2018', '9/5/2018']})
df['Service_from'] = pd.to_datetime(df['Service_from'])

which results in

    Member_ID Service_from
0         540   2016-02-01
1         540   2016-03-29
2         540   2016-04-26
3         540   2016-09-13
4         540   2016-10-25
5         540   2016-11-22
6         540   2016-12-27
7         540   2017-01-24
8         540   2017-04-26
9         540   2017-02-21
10        540   2017-04-11
11         25   2017-04-26
12         25   2017-05-01
13         25   2017-05-01
14         25   2017-09-26
15         25   2017-11-26
16         25   2018-01-24
17         25   2018-04-26
18         25   2018-09-05

you can do

# Sort the data frame for convenience and renumber the rows
df.sort_values(by=['Member_ID', 'Service_from'], inplace=True)
df.reset_index(drop=True, inplace=True)

# Get the minimum Service_from per Member_ID, broadcast to every row
df['Service_from_min'] = df.groupby('Member_ID')['Service_from'].transform('min')

# Get the difference in days modulo 365
df['Diff'] = (df['Service_from'] - df['Service_from_min']).dt.days % 365

which should give you

    Member_ID Service_from Service_from_min  Diff
0          25   2017-04-26       2017-04-26     0
1          25   2017-05-01       2017-04-26     5
2          25   2017-05-01       2017-04-26     5
3          25   2017-09-26       2017-04-26   153
4          25   2017-11-26       2017-04-26   214
5          25   2018-01-24       2017-04-26   273
6          25   2018-04-26       2017-04-26     0
7          25   2018-09-05       2017-04-26   132
8         540   2016-02-01       2016-02-01     0
9         540   2016-03-29       2016-02-01    57
10        540   2016-04-26       2016-02-01    85
11        540   2016-09-13       2016-02-01   225
12        540   2016-10-25       2016-02-01   267
13        540   2016-11-22       2016-02-01   295
14        540   2016-12-27       2016-02-01   330
15        540   2017-01-24       2016-02-01   358
16        540   2017-02-21       2016-02-01    21
17        540   2017-04-11       2016-02-01    70
18        540   2017-04-26       2016-02-01    85
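To see what the modulo does on a single row, here is a small worked check for member 540 on 2017-02-21. Note that this measures days elapsed since the member's first service date, wrapped every 365 days, so the row that crosses the limit shows 21 rather than the 0 in the question's Desired_total column.

```python
import pandas as pd

first = pd.Timestamp('2016-02-01')   # earliest Service_from for member 540
later = pd.Timestamp('2017-02-21')   # first service date past the 365-day mark

elapsed = (later - first).days       # 386 days since the first service date
wrapped = elapsed % 365              # 21 days into the second 365-day window
```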

You can also drop Service_from_min if you do not need it

df.drop('Service_from_min', axis=1, inplace=True)
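If a literal reset is wanted instead of a modulo, one possible sketch is an explicit loop per member. The helper name `running_total_with_reset` is hypothetical. The rule it applies is an assumption: the row whose total would exceed 365 is reported as 0 and accumulation restarts from the next row. The question's table is ambiguous about what follows a reset, so adjust the rule if a different convention is intended.

```python
import pandas as pd

def running_total_with_reset(day_diffs, limit=365):
    """Cumulative sum of day gaps that drops to 0 on the row where it would exceed `limit`."""
    total = 0
    out = []
    for d in day_diffs:
        total += d
        if total > limit:
            total = 0  # assumption: the crossing row itself is reported as 0
        out.append(total)
    return pd.Series(out, index=day_diffs.index)

# Illustrative subset of the question's data
df = pd.DataFrame({
    'Member_ID': [540, 540, 540, 540, 25, 25],
    'Service_from': pd.to_datetime(
        ['2/1/2016', '1/24/2017', '2/21/2017', '4/11/2017', '9/26/2017', '11/26/2017']),
})
df = df.sort_values(['Member_ID', 'Service_from']).reset_index(drop=True)

# Day gap to the previous service date within each member (0 for the first row)
df['Diff'] = (df.groupby('Member_ID')['Service_from']
                .diff().dt.days.fillna(0).astype(int))

# Apply the resetting sum member by member
df['Running'] = df.groupby('Member_ID')['Diff'].transform(running_total_with_reset)
```

The Python loop is slower than a vectorized expression, but a reset that depends on the running value itself is hard to express with `cumsum` alone, so a loop keeps the rule explicit.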