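I want to sum values grouped by consecutive runs of positive and negative flow, then compare the sums to find the largest negative flow and the largest positive flow.
I think itertools might be the way to do this, but I can't figure it out.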
import numpy as np
import pandas as pd

# Create a data frame that shows week and value
n_rows = 30
dftest = pd.DataFrame({'week': pd.date_range('1/4/2019', periods=n_rows, freq='W'),
                       'value': np.random.randint(-100, 100, size=n_rows)})

# Flag positives and negatives (a value of 0 is flagged as "Negative")
def flowFinder(row):
    if row['value'] > 0:
        return "Positive"
    else:
        return "Negative"

dftest['flag'] = dftest.apply(flowFinder, axis=1)
dftest
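In this example df, you would find that rows 15-19 sum to 249, which is the largest of all the positive flows. The largest negative flow is -98, at row 5.
Edit by Scott Boston: it is better to add code that generates the data frame than to link to a picture.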
df = pd.DataFrame({'week': pd.date_range('2019-01-06', periods=21, freq='W'),
                   'value': [64, 43, 94, -19, 3, -98, 1, 80, -7, -43, 45, 58, 27, 29,
                             -4, 20, 97, 30, 22, 80, -95],
                   'flag': ['Positive']*3 + ['Negative'] + ['Positive'] + ['Negative'] +
                           ['Positive']*2 + ['Negative']*2 + ['Positive']*4 +
                           ['Negative'] + ['Positive']*5 + ['Negative']})
Answer 0 (score: 1)
You can try this:
df.groupby((df['flag'] != df['flag'].shift()).cumsum())['value'].sum().agg(['min','max'])
Output:
min    -98
max    249
Name: value, dtype: int64
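The grouping key in that one-liner can be unpacked step by step. The sketch below rebuilds the sample data from the edited question and names each intermediate result; the variable names `starts_new_run`, `run_id` and `run_sums` are illustrative only, and the flag column is derived here from `value > 0`, which reproduces the flags listed in the question.

```python
import pandas as pd

# Same sample data as in the edited question.
df = pd.DataFrame({
    'week': pd.date_range('2019-01-06', periods=21, freq='W'),
    'value': [64, 43, 94, -19, 3, -98, 1, 80, -7, -43, 45, 58, 27, 29,
              -4, 20, 97, 30, 22, 80, -95],
})
# Assumption: the flag is simply "value > 0", as in the question's flowFinder.
df['flag'] = ['Positive' if v > 0 else 'Negative' for v in df['value']]

# True wherever the flag differs from the flag on the previous row,
# i.e. at the start of each new run.
starts_new_run = df['flag'] != df['flag'].shift()

# The cumulative sum of those booleans gives every consecutive run its own id.
run_id = starts_new_run.cumsum()

# One sum per run; min and max of these sums are the two flows asked for.
run_sums = df.groupby(run_id)['value'].sum()
print(run_sums.tolist())  # [201, -19, 3, -98, 81, -50, 159, -4, 249, -95]
```

Grouping by `df['flag']` directly would lump all positive rows into one group and all negative rows into another; the run id is what keeps separate runs apart.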
With rename:
df.groupby((df['flag'] != df['flag'].shift()).cumsum())['value'].sum().agg(['min','max'])\
.rename(index={'min':'Negative','max':'Positive'})
Output:
Negative    -98
Positive    249
Name: value, dtype: int64
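Update, in response to a comment:
# Select both columns with a list ([['value','week']]); recent pandas versions
# no longer accept the bare tuple form ['value','week'] on a groupby.
df_out = df.groupby((df['flag'] != df['flag'].shift()).cumsum())[['value','week']]\
           .agg({'value':'sum','week':'last'})
# Keep only the runs with the smallest and the largest sum
df_out.loc[df_out.agg({'value':['idxmin','idxmax']}).squeeze().tolist()]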
Output:
      value       week
flag
4       -98 2019-02-10
9       249 2019-05-19
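Since the question mentions itertools: the same run-based sums could also be sketched without pandas using `itertools.groupby`, which merges only adjacent items that share a key. This is an alternative illustration, not part of the answer above; it works on the plain list of values from the edited question and, like the pandas version, takes the min and max over all run sums, so it assumes there is at least one run of each sign.

```python
from itertools import groupby

# Values from the edited question's data frame.
values = [64, 43, 94, -19, 3, -98, 1, 80, -7, -43, 45, 58, 27, 29,
          -4, 20, 97, 30, 22, 80, -95]

# groupby() starts a new group each time the key (v > 0) changes,
# so each group is one consecutive run of positive or non-positive values.
run_sums = [sum(run) for _, run in groupby(values, key=lambda v: v > 0)]

largest_negative = min(run_sums)
largest_positive = max(run_sums)
print(largest_negative, largest_positive)  # -98 249
```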