Sum groups of flagged items, then find the max

Time: 2019-05-20 15:31:06

Tags: python pandas numpy itertools

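I'd like to sum the values within each consecutive run of positive or negative flows, then compare the run totals to find the largest negative flow and the largest positive flow.

I think itertools might be the way to do this, but I can't figure it out.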

import numpy as np
import pandas as pd

# create a data frame that shows week and value
n_rows = 30
dftest = pd.DataFrame({'week': pd.date_range('1/4/2019', periods=n_rows, freq='W'),
                      'value': np.random.randint(-100, 100, size=n_rows)})

# flag positives and negatives (a value of 0 is flagged "Negative")
def flowFinder(row):
    if row['value'] > 0:
        return "Positive"
    else:
        return "Negative"

dftest['flag'] = dftest.apply(flowFinder, axis=1)
dftest

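In this example df, you would find that rows 15-19 sum to 249, which is the largest of all the positive flows. The largest negative flow is -98, at row 5.

Edit by Scott Boston: it is better to add code that generates the data frame than to link to a picture.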

df = pd.DataFrame({'week':pd.date_range('2019-01-06',periods=21, freq='W'), 
                   'value':[64,43,94,-19,3,-98,1,80,-7,-43,45,58,27,29,
                            -4,20,97,30,22,80,-95],
                   'flag':['Positive']*3+['Negative']+['Positive']+['Negative']+
                           ['Positive']*2+['Negative']*2+['Positive']*4+
                           ['Negative']+['Positive']*5+['Negative']})

1 Answer:

Answer 0: (score: 1)

You can try this:

df.groupby((df['flag'] != df['flag'].shift()).cumsum())['value'].sum().agg(['min','max'])

Output:

min    -98
max    249
Name: value, dtype: int64
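The one-liner above packs several steps into the grouping key, so it may help to see them separately. The following is a minimal sketch that rebuilds the sample values from the question and derives the flags with the same `> 0` rule; the intermediate names `starts_new_run`, `run_id`, and `run_sums` are illustrative and do not appear in the original answer.

```python
import pandas as pd

# Sample values from the question; flags derived with the question's "> 0" rule.
values = [64, 43, 94, -19, 3, -98, 1, 80, -7, -43, 45, 58, 27, 29,
          -4, 20, 97, 30, 22, 80, -95]
df = pd.DataFrame({'value': values})
df['flag'] = ['Positive' if v > 0 else 'Negative' for v in values]

# True wherever the flag differs from the previous row.
# The first row compares against NaN, so it is always True.
starts_new_run = df['flag'] != df['flag'].shift()

# A running count of those True values gives each consecutive run its own id.
run_id = starts_new_run.cumsum()

# Sum the values within each run, then take the extremes.
run_sums = df.groupby(run_id)['value'].sum()
print(run_sums.tolist())
print(run_sums.min(), run_sums.max())
```

Grouping on `run_id` rather than on `flag` itself is what keeps separate runs apart; grouping on `flag` alone would merge every positive row into one group and every negative row into another.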

With renaming:

df.groupby((df['flag'] != df['flag'].shift()).cumsum())['value'].sum().agg(['min','max'])\
  .rename(index={'min':'Negative','max':'Positive'})

Output:

Negative    -98
Positive    249
Name: value, dtype: int64

Update, in response to a comment on the answer:

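# select the columns with a list (double brackets); a bare tuple of column names fails on recent pandas versions
df_out = df.groupby((df['flag'] != df['flag'].shift()).cumsum())[['value','week']]\
           .agg({'value':'sum','week':'last'})
# keep only the rows for the runs with the smallest and largest sums
df_out.loc[df_out.agg({'value':['idxmin','idxmax']}).squeeze().tolist()]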

Output:

      value       week
flag                  
4       -98 2019-02-10
9       249 2019-05-19
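The result above reports only the last week of each extreme run. If the start of the run is also of interest, one possible variation uses named aggregation (available in pandas 0.25 and later). This is a sketch under that assumption, not part of the original answer; the column names `total`, `start`, and `end` are made up for illustration, and the flags are derived with the question's `> 0` rule.

```python
import pandas as pd

# Rebuild the sample data frame from the question.
df = pd.DataFrame({
    'week': pd.date_range('2019-01-06', periods=21, freq='W'),
    'value': [64, 43, 94, -19, 3, -98, 1, 80, -7, -43, 45, 58, 27, 29,
              -4, 20, 97, 30, 22, 80, -95],
})
df['flag'] = ['Positive' if v > 0 else 'Negative' for v in df['value']]

# Give each consecutive run of identical flags its own id.
run_id = (df['flag'] != df['flag'].shift()).cumsum()

# One row per run: the sum of its values plus its first and last week.
runs = df.groupby(run_id).agg(
    total=('value', 'sum'),
    start=('week', 'first'),
    end=('week', 'last'),
)

# Keep only the runs with the smallest and the largest total.
extremes = runs.loc[[runs['total'].idxmin(), runs['total'].idxmax()]]
print(extremes)
```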