DataFrameGroupBy diff() with a condition

Time: 2017-03-31 12:15:23

Tags: python pandas dataframe

Suppose I have a DataFrame:

import numpy as np
import pandas as pd

df = pd.DataFrame({'CATEGORY':['a','b','c','b','b','a','b'],
                   'VALUE':[np.nan,1,0,0,5,0,4]})

It looks like this:

    CATEGORY    VALUE
0      a         NaN
1      b         1
2      c         0
3      b         0
4      b         5
5      a         0
6      b         4

I group it:

df = df.groupby(by='CATEGORY')

Now, let me show what I want using group 'b' as an example:

df.get_group('b')

Group b:

    CATEGORY    VALUE
1      b          1
3      b          0
4      b          5
6      b          4

What I need: within each group, compute diff() between consecutive VALUE entries, skipping all NaN and 0 values. So the result should be:

    CATEGORY    VALUE  DIFF
1      b          1      - 
3      b          0      -
4      b          5      4
6      b          4     -1

2 Answers:

Answer 0: (score: 4)

You can use diff to subtract the values after removing the NaN and 0 values:

import numpy as np
import pandas as pd

df = pd.DataFrame({'CATEGORY':['a','b','c','b','b','a','b'],
                   'VALUE':[np.nan,1,0,0,5,0,4]})

grouped = df.groupby("CATEGORY")

# define diff func: turn 0 into NaN, drop the NaNs, then diff what is left
diff = lambda x: x["VALUE"].replace(0, np.nan).dropna().diff()
df["DIFF"] = grouped.apply(diff).reset_index(0, drop=True)

print(df)

  CATEGORY  VALUE  DIFF
0        a    NaN   NaN
1        b    1.0   NaN
2        c    0.0   NaN
3        b    0.0   NaN
4        b    5.0   4.0
5        a    0.0   NaN
6        b    4.0  -1.0
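For comparison, here is a minimal sketch of the same idea without apply. It is not part of the answer above: it assumes you build a boolean mask first (the name valid is hypothetical) and then rely on index alignment when assigning the result back.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'CATEGORY': ['a', 'b', 'c', 'b', 'b', 'a', 'b'],
                   'VALUE': [np.nan, 1, 0, 0, 5, 0, 4]})

# Rows whose VALUE is neither NaN nor 0 (hypothetical helper name).
valid = df['VALUE'].notna() & (df['VALUE'] != 0)

# diff() within each CATEGORY over the surviving rows only. The result keeps
# the original row labels, so the assignment aligns on the index and every
# skipped row ends up as NaN.
df['DIFF'] = df.loc[valid, 'VALUE'].groupby(df.loc[valid, 'CATEGORY']).diff()

print(df)
```

Rows 4 and 6 should get 4.0 and -1.0, and every other row stays NaN, which matches the result asked for in the question.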

Answer 1: (score: 1)

Sounds like a job for pd.Series.shift() together with a notnull mask.

First, we remove the unwanted values before grouping the data:

# .copy() so that adding columns later does not trigger SettingWithCopyWarning
nonull_df = df[(df['VALUE'] != 0) & df['VALUE'].notnull()].copy()
groups = nonull_df.groupby(by='CATEGORY')

Now we can shift within the groups and compute the difference:

nonull_df['next_value'] = groups['VALUE'].shift(1)
nonull_df['diff'] = nonull_df['VALUE'] - nonull_df['next_value']
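To make the shift step concrete, here is a small sketch on toy data. The frame toy and its column names are hypothetical and not taken from the question; the point is only to show that shift(1) on a grouped column hands each row the previous value from its own group.

```python
import pandas as pd

# Hypothetical toy data, not from the question.
toy = pd.DataFrame({'key': ['x', 'y', 'x', 'y', 'x'],
                    'val': [10, 1, 30, 4, 60]})

# shift(1) moves each value one row down within its own group, so every row
# sees the previous value that shares its key (NaN for the first row of a group).
toy['prev'] = toy.groupby('key')['val'].shift(1)

# Subtracting the shifted column gives the per-group difference.
toy['delta'] = toy['val'] - toy['prev']

print(toy)
```

Note that with shift(1) the helper column holds the previous value in the group, even though the answer's code names it next_value.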

Finally, you can optionally join the new columns back onto the original DataFrame:

df = df.join(nonull_df[['next_value', 'diff']])

df
  CATEGORY  VALUE  next_value  diff
0        a    NaN         NaN   NaN
1        b    1.0         NaN   NaN
2        c    0.0         NaN   NaN
3        b    0.0         NaN   NaN
4        b    5.0         1.0   4.0
5        a    0.0         NaN   NaN
6        b    4.0         5.0  -1.0