Suppose I have a DataFrame:
import numpy as np
import pandas as pd

df = pd.DataFrame({'CATEGORY': ['a', 'b', 'c', 'b', 'b', 'a', 'b'],
                   'VALUE': [np.nan, 1, 0, 0, 5, 0, 4]})
which looks like:
CATEGORY VALUE
0 a NaN
1 b 1
2 c 0
3 b 0
4 b 5
5 a 0
6 b 4
I group it:
df = df.groupby(by='CATEGORY')
Now let me show what I want, using group 'b' as an example:
df.get_group('b')
Group b:
CATEGORY VALUE
1 b 1
3 b 0
4 b 5
6 b 4
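What I need: within each group, compute the difference (diff()) between consecutive VALUE values, skipping all NaN and 0 values. So the result should be: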
CATEGORY VALUE DIFF
1 b 1 -
3 b 0 -
4 b 5 4
6 b 4 -1
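To make the requirement concrete, here is a minimal sketch for group b alone, assuming the same data as above (recreated so the snippet runs on its own; the names group_b, kept and diff_b are illustrative): drop the NaN and 0 values first, then take diff() over the values that remain. The dropped rows simply have no result, which corresponds to the "-" entries in the table.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'CATEGORY': ['a', 'b', 'c', 'b', 'b', 'a', 'b'],
                   'VALUE': [np.nan, 1, 0, 0, 5, 0, 4]})

# The VALUE column of group 'b' (row labels 1, 3, 4, 6)
group_b = df.groupby('CATEGORY').get_group('b')['VALUE']

# Keep only values that are neither NaN nor 0, then diff consecutive survivors
kept = group_b[group_b.notna() & group_b.ne(0)]
diff_b = kept.diff()
print(diff_b)
```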
Answer 0 (score: 4)
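You can use diff to subtract the values after dropping the NaN and 0 values:
import numpy as np
import pandas as pd

df = pd.DataFrame({'CATEGORY': ['a', 'b', 'c', 'b', 'b', 'a', 'b'],
                   'VALUE': [np.nan, 1, 0, 0, 5, 0, 4]})
grouped = df.groupby("CATEGORY")
# define diff func: treat 0 as missing, drop all missing values, then diff
diff = lambda x: x["VALUE"].replace(0, np.nan).dropna().diff()
# apply returns a Series indexed by (CATEGORY, original row label);
# dropping the CATEGORY level lets the result align with df's index
df["DIFF"] = grouped.apply(diff).reset_index(0, drop=True)
print(df)
  CATEGORY  VALUE  DIFF
0        a    NaN   NaN
1        b    1.0   NaN
2        c    0.0   NaN
3        b    0.0   NaN
4        b    5.0   4.0
5        a    0.0   NaN
6        b    4.0  -1.0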
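An alternative sketch (not the answerer's code) that avoids apply altogether, assuming pandas 1.x or later: filter out the NaN and 0 rows, call the built-in GroupBy.diff on what remains, and assign the result back. Column assignment aligns on the index, so the filtered-out rows get NaN automatically and no reset_index step is needed. The names valid and group_diff are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'CATEGORY': ['a', 'b', 'c', 'b', 'b', 'a', 'b'],
                   'VALUE': [np.nan, 1, 0, 0, 5, 0, 4]})

# Rows that take part in the difference: not NaN and not 0
valid = df[df['VALUE'].notna() & df['VALUE'].ne(0)]

# Difference between consecutive valid values within each category;
# the result keeps the original row labels of the valid rows
group_diff = valid.groupby('CATEGORY')['VALUE'].diff()

# Assignment aligns on the index, so every other row becomes NaN
df['DIFF'] = group_diff
print(df)
```

Compared with apply, this stays on the vectorized GroupBy path and produces the same DIFF column as the output above.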
Answer 1 (score: 1)
Sounds like a job for pd.Series.shift() combined with a notnull mask.
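The idea behind shift can be checked on a small, hypothetical frame (the data and the names previous, by_hand and built_in are illustrative, not from the answer): within each group, shift(1) moves every value down one row, so subtracting it from the original gives the difference to the previous value in the same group, which is what GroupBy.diff computes directly.

```python
import pandas as pd

# Hypothetical, already-filtered data (no NaN or 0 left), two interleaved groups
frame = pd.DataFrame({'CATEGORY': ['b', 'x', 'b', 'x', 'b'],
                      'VALUE': [1.0, 10.0, 5.0, 7.0, 4.0]})
groups = frame.groupby('CATEGORY')['VALUE']

# Previous value within the same group (NaN for the first row of each group)
previous = groups.shift(1)
by_hand = frame['VALUE'] - previous

# GroupBy.diff performs the same subtraction in one call
built_in = groups.diff()
print(pd.DataFrame({'previous': previous, 'by_hand': by_hand, 'diff': built_in}))
```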
First, we remove the unwanted values before grouping the data:
# copy() makes nonull_df independent of df, avoiding SettingWithCopyWarning below
nonull_df = df[(df['VALUE'] != 0) & df['VALUE'].notnull()].copy()
groups = nonull_df.groupby(by='CATEGORY')
Now we can shift within each group and compute the difference:
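# shift(1) moves each value down one row within its group, so 'next_value'
# actually holds the previous non-zero value of the same category
nonull_df['next_value'] = groups['VALUE'].shift(1)
nonull_df['diff'] = nonull_df['VALUE'] - nonull_df['next_value']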
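Finally, you can optionally join the new columns back onto the original DataFrame; rows that were filtered out get NaN:
df = df.join(nonull_df[['next_value', 'diff']])
df
  CATEGORY  VALUE  next_value  diff
0        a    NaN         NaN   NaN
1        b    1.0         NaN   NaN
2        c    0.0         NaN   NaN
3        b    0.0         NaN   NaN
4        b    5.0         1.0   4.0
5        a    0.0         NaN   NaN
6        b    4.0         5.0  -1.0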