Grouping and aggregating pandas data, but on

Time: 2017-07-21 13:15:43

Tags: pandas pandas-groupby

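All, I can't work out how to use groupby to solve this challenge, because most of the groupby examples I have seen clearly don't deal with telling non-consecutive values apart.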

Timestamp  Signal   Value
00:00:00     1        12
00:00:01     1        12.2
00:00:02     1        2.1
00:00:03     0        1.1
00:00:04     1        6.2
00:00:05     1        1.0
00:00:06     0        4.4
00:00:07     0        1.6

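I'd like to take the last value and, in another column, the sum of the first three rows, because Signal is 1 there. I'd then like to start over with a new sum/last for the next two rows, because Signal is 1 again.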

Something like this:

Timestamp Signal Value Sum Last
00:00:00     1   12    
00:00:01     1   12.2
00:00:02     1   2.1   26.3 2.1
00:00:03     0   1.1
00:00:04     1   6.2
00:00:05     1   1.0    7.2  1.0
00:00:06     0   4.4
00:00:07     0   1.6

Thanks in advance!

1 answer:

Answer 0: (score: 1)

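First, you need a unique group label for each run of consecutive values, created by comparing the Signal column with its shifted copy and taking the cumsum:

a = df['Signal'].ne(df['Signal'].shift()).cumsum()
print (a)
0    1
1    1
2    1
3    2
4    3
5    3
6    4
7    4
Name: Signal, dtype: int32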

Then build the mask with duplicated, combined with the values of the Signal column converted to booleans (0 becomes False and 1 becomes True):

m = ~a.duplicated(keep='last') & df['Signal'].astype(bool)
print (m)
0    False
1    False
2     True
3    False
4    False
5     True
6    False
7    False
Name: Signal, dtype: bool
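The mask combines two conditions, and it can help to look at them separately. The following is a sketch under the same sample data; `is_last_in_run` and `is_on` are hypothetical names introduced only for illustration, and the explicit `astype(bool)` is an assumption made here for readability.

```python
import pandas as pd

signal = pd.Series([1, 1, 1, 0, 1, 1, 0, 0], name="Signal")
a = signal.ne(signal.shift()).cumsum()

# duplicated(keep="last") flags every occurrence of a run id except the
# final one, so negating it marks the last row of each run.
is_last_in_run = ~a.duplicated(keep="last")

# True on rows where Signal is 1.
is_on = signal.astype(bool)

# A row is selected only if it ends a run AND that run has Signal == 1.
m = is_last_in_run & is_on
print(pd.DataFrame({"last_in_run": is_last_in_run, "signal_on": is_on, "m": m}))
```

Without the second condition, the last rows of the Signal == 0 runs (rows 3 and 7) would be selected as well.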

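Finally, create the Sum column with GroupBy.transform and sum, copy Value into Last, and use where with the mask to set NaN on all other rows: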

df['Sum'] = df.groupby(a)['Value'].transform('sum')
df['Last'] = df['Value']
df[['Sum','Last']] = df[['Sum','Last']].where(m)
print (df)
  Timestamp  Signal  Value   Sum  Last
0  00:00:00       1   12.0   NaN   NaN
1  00:00:01       1   12.2   NaN   NaN
2  00:00:02       1    2.1  26.3   2.1
3  00:00:03       0    1.1   NaN   NaN
4  00:00:04       1    6.2   NaN   NaN
5  00:00:05       1    1.0   7.2   1.0
6  00:00:06       0    4.4   NaN   NaN
7  00:00:07       0    1.6   NaN   NaN
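If one row per run is enough, rather than NaN-padded columns on the full frame, a possible variant is to filter to the Signal == 1 rows and aggregate each run directly. This is only a sketch and not part of the answer above: the DataFrame is rebuilt from the question's sample data, and the names `run_id`, `on`, `summary` and the output column `End` are assumptions chosen for illustration. It relies on named aggregation, which needs pandas 0.25 or later.

```python
import pandas as pd

# Sample data rebuilt from the question.
df = pd.DataFrame({
    "Timestamp": ["00:00:00", "00:00:01", "00:00:02", "00:00:03",
                  "00:00:04", "00:00:05", "00:00:06", "00:00:07"],
    "Signal": [1, 1, 1, 0, 1, 1, 0, 0],
    "Value": [12, 12.2, 2.1, 1.1, 6.2, 1.0, 4.4, 1.6],
})

# Same run labels as in the answer, renamed to avoid clashing with the
# existing "Signal" column.
run_id = df["Signal"].ne(df["Signal"].shift()).cumsum().rename("run_id")

# Keep only the rows where Signal is 1, then aggregate each run on its own.
on = df["Signal"].eq(1)
summary = (
    df[on]
    .groupby(run_id[on])
    .agg(End=("Timestamp", "last"), Sum=("Value", "sum"), Last=("Value", "last"))
    .reset_index(drop=True)
)
print(summary)
```

The result has one row per run of 1s, holding the run's final timestamp, its sum and its last value.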