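I need to compute a cumulative sum on a grouped DataFrame, but I need to reset it whenever the previous value is negative and the current value is positive.
In R I could apply a condition within a groupby using the ave() function, but I can't do the same in Python, so I'm having some trouble working out a solution. Can anyone help?
Here is an example: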
import pandas as pd
df = pd.DataFrame({'PRODUCT': ['A'] * 40, 'GROUP': ['1'] * 40, 'FORECAST': [100, -40, -40, -40]*10, })
df['CS'] = df.groupby(['GROUP', 'PRODUCT']).FORECAST.cumsum()
# Reset cumsum if
# condition: (df.FORECAST > 0) & (df.groupby(['GROUP', 'PRODUCT']).FORECAST.shift(-1).fillna(0) <= 0)
Answer 0 (score: 1)
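This solution resets the sum wherever the values being summed change from negative to positive, whether or not the data is periodic as in the example. Note that the reset is detected with diff() > 0, that is, wherever a value is larger than the one before it, which coincides with a negative-to-positive change for data shaped like the example.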
import numpy as np
import pandas as pd
df = pd.DataFrame({'PRODUCT': ['A'] * 40, 'GROUP': ['1'] * 40, 'FORECAST': [100, -40, -40, -40]*10, })
cumsum = np.cumsum(df['FORECAST'])
# Array of indices where sum should be reset
reset_ind = np.where(df['FORECAST'].diff() > 0)[0]
# Sums that need to be subtracted at resets
subs = cumsum[reset_ind-1].values
# Repeat subtraction values for every entry BETWEEN resets and values after final reset
rep_subs = np.repeat(subs, np.hstack([np.diff(reset_ind), df['FORECAST'].size - reset_ind[-1]]))
# Stack together the values before the first reset and the reset sums
df['CS'] = np.hstack([cumsum[:reset_ind[0]], cumsum[reset_ind[0]:] - rep_subs])
Alternatively, based on this solution to a similar question (and my realization of how useful groupby is):
import pandas as pd
import numpy as np
df = pd.DataFrame({'PRODUCT': ['A'] * 40, 'GROUP': ['1'] * 40, 'FORECAST': [100, -40, -40, -40]*10, })
# Create indices to group sums together
df['cumsum'] = (df['FORECAST'].diff() > 0).cumsum()
# Perform group-wise cumsum
df['CS'] = df.groupby(['cumsum'])['FORECAST'].cumsum()
# Remove intermediary cumsum column
df = df.drop(['cumsum'], axis=1)
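Both snippets above detect resets with diff() > 0 over the whole column and do not group by GROUP and PRODUCT. As a hedged sketch only, the same block-id idea could be applied per group with the condition stated in the question (previous value negative, current value positive). The second product 'B', its values, and the names keys, prev, reset and BLOCK are assumptions made for illustration and are not part of the original post.

```python
import pandas as pd

# Toy data: a second, hypothetical product 'B' makes the per-group behaviour visible
df = pd.DataFrame({
    'PRODUCT': ['A'] * 8 + ['B'] * 8,
    'GROUP': ['1'] * 16,
    'FORECAST': [100, -40, -40, -40] * 2 + [50, -10, -20, 30] * 2,
})

keys = ['GROUP', 'PRODUCT']

# Previous value within each GROUP/PRODUCT group (NaN on the first row of a group)
prev = df.groupby(keys)['FORECAST'].shift(1)

# Reset where the previous value is negative and the current value is positive;
# comparisons against NaN are False, so the first row of a group never resets
reset = (prev < 0) & (df['FORECAST'] > 0)

# Running count of resets within each group acts as a block id
block = (
    reset.astype(int)
    .groupby([df[k] for k in keys])
    .cumsum()
    .rename('BLOCK')
)

# Cumulative sum within each (GROUP, PRODUCT, BLOCK) combination
df['CS'] = df.groupby(keys + [block])['FORECAST'].cumsum()

print(df)
```

For product 'B' the step from 30 to 50 is an increase but not a negative-to-positive change, so this sketch keeps accumulating there, whereas a diff() > 0 test would start a new sum.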