Question

I have a pandas time series that contains cumulative monthly values.

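If, on some date within a month, the value drops below a certain cutoff, I need to set that value and all remaining values in that month to 1000.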

E.g.

df:

 Date       cummulative_value
1/8/2017    -3
1/9/2017    -6
1/10/2017   -72
1/11/2017   500
1/26/2017   575
2/7/2017    -5
2/14/2017   -6
2/21/2017   -6

My cutoff is -71, so for the example above I need to end up with the following:

 Date       cummulative_value
1/8/2017    -3
1/9/2017    -6
1/10/2017   1000
1/11/2017   1000
1/26/2017   1000
2/7/2017    -5
2/14/2017   -6
2/21/2017   -6

I am trying to use groupby in pandas, but I can't figure out how to do it. Any other, more efficient approach would also help.

Answer 1

Use groupby and cumprod:

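import numpy as np

# Assumes df['Date'] is already datetime64, e.g. df['Date'] = pd.to_datetime(df['Date'])
df['cummulative_value'] = (df.groupby(df['Date'].dt.strftime('%Y%m'))['cummulative_value']
                            .transform(lambda x: np.where(x.ge(-71).cumprod(), x, 1000)))
print(df)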

Output:

        Date  cummulative_value
0 2017-01-08                 -3
1 2017-01-09                 -6
2 2017-01-10               1000
3 2017-01-11               1000
4 2017-01-26               1000
5 2017-02-07                 -5
6 2017-02-14                 -6
7 2017-02-21                 -6
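To see why the cumprod trick works, here is a minimal self-contained sketch built on the sample data from the question. The names `CUTOFF` and `cap_after_breach` are introduced only for this illustration, and `Series.where` stands in for `np.where`; the idea is the same. Within each month, `ge(CUTOFF)` is 1 while the value is at or above the cutoff; once a 0 appears, the running product stays 0 for the rest of the month.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Date": ["1/8/2017", "1/9/2017", "1/10/2017", "1/11/2017",
             "1/26/2017", "2/7/2017", "2/14/2017", "2/21/2017"],
    "cummulative_value": [-3, -6, -72, 500, 575, -5, -6, -6],
})
df["Date"] = pd.to_datetime(df["Date"], format="%m/%d/%Y")

CUTOFF = -71  # threshold from the question


def cap_after_breach(values: pd.Series) -> pd.Series:
    # 1 while every value so far in this month is >= CUTOFF;
    # the first breach multiplies in a 0, so the flag stays off afterwards.
    still_ok = values.ge(CUTOFF).astype(int).cumprod().astype(bool)
    # Keep the original value while still_ok, otherwise use 1000.
    return values.where(still_ok, 1000)


month_key = df["Date"].dt.to_period("M")  # year-month key, one group per calendar month
result = df.groupby(month_key)["cummulative_value"].transform(cap_after_breach)
print(result.tolist())  # [-3, -6, 1000, 1000, 1000, -5, -6, -6]
```

Note that 500 and 575 are above the cutoff on their own, yet they are still replaced, because the flag never turns back on within the same month.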

Answer 2

Here is an approach that involves creating a mask:

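import pandas as pd

# Index by the parsed dates so rows can be grouped by calendar month
df.set_index(pd.to_datetime(df['Date'], format="%m/%d/%Y"), inplace=True)

# Group by year-month (month number alone would merge e.g. Jan 2017 with Jan 2018).
# cumsum counts the breaches so far in the month; astype(bool) turns that count into a mask.
mask = df['cummulative_value'].lt(-71).groupby(df.index.to_period('M')).cumsum().astype(bool)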

# Date
# 2017-01-08    False
# 2017-01-09    False
# 2017-01-10     True
# 2017-01-11     True
# 2017-01-26     True
# 2017-02-07    False
# 2017-02-14    False
# 2017-02-21    False

df.loc[mask, 'cummulative_value'] = 1000

# Drop the datetime index again; the original string 'Date' column is kept
df = df.reset_index(drop=True)
print(df)

#         Date  cummulative_value
# 0   1/8/2017                 -3
# 1   1/9/2017                 -6
# 2  1/10/2017               1000
# 3  1/11/2017               1000
# 4  1/26/2017               1000
# 5   2/7/2017                 -5
# 6  2/14/2017                 -6
# 7  2/21/2017                 -6

Set cumulative value to a constant after a threshold is reached
