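Hi all, I can't figure out how to use groupby to solve this problem, because most of the groupby examples I've seen don't distinguish between separate, non-contiguous runs of the same value.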
Timestamp Signal Value
00:00:00 1 12
00:00:01 1 12.2
00:00:02 1 2.1
00:00:03 0 1.1
00:00:04 1 6.2
00:00:05 1 1.0
00:00:06 0 4.4
00:00:07 0 1.6
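For the first three rows, where Signal is 1, I want the sum of the values in one column and the last value in another. Then I want to start over with a new sum/last for the next two rows where Signal is 1.
Something like this: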
Timestamp Signal Value Sum Last
00:00:00 1 12
00:00:01 1 12.2
00:00:02 1 2.1 26.3 2.1
00:00:03 0 1.1
00:00:04 1 6.2
00:00:05 1 1.0 7.2 1.0
00:00:06 0 4.4
00:00:07 0 1.6
Thanks in advance!
Answer 0 (score: 1)
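First, create a helper Series of run ids: compare the Signal column with its shifted values using ne, then take the cumsum, so that every run of consecutive equal values gets its own number:
a = df['Signal'].ne(df['Signal'].shift()).cumsum()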
print (a)
0 1
1 1
2 1
3 2
4 3
5 3
6 4
7 4
Name: Signal, dtype: int32
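To make this run-labelling step easier to follow, here is a minimal self-contained sketch that rebuilds the sample data from the question and prints the intermediate pieces. The DataFrame construction and the name `changed` are assumptions added for illustration; they are not part of the original answer.

```python
import pandas as pd

# Sample data from the question (how it is loaded is an assumption)
df = pd.DataFrame({
    'Timestamp': ['00:00:00', '00:00:01', '00:00:02', '00:00:03',
                  '00:00:04', '00:00:05', '00:00:06', '00:00:07'],
    'Signal': [1, 1, 1, 0, 1, 1, 0, 0],
    'Value': [12, 12.2, 2.1, 1.1, 6.2, 1.0, 4.4, 1.6],
})

# True wherever Signal differs from the previous row; the first row is True
# because it is compared with the NaN that shift() puts there
changed = df['Signal'].ne(df['Signal'].shift())

# Running total of the change flags: one id per run of consecutive equal values
a = changed.cumsum()

print(changed.tolist())
print(a.tolist())  # [1, 1, 1, 2, 3, 3, 4, 4]
```

Because the id increases at every change, the two separate runs of 1 get different ids (1 and 3), which a plain groupby on Signal would not give you.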
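Then flag the last row of each run with an inverted duplicated(keep='last') mask and chain it with the Signal column, whose values are treated as False for 0 and True for 1: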
m = ~a.duplicated(keep='last') & df['Signal']
print (m)
0 False
1 False
2 True
3 False
4 False
5 True
6 False
7 False
Name: Signal, dtype: bool
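The mask can also be built in two named steps, which may make it clearer what each half contributes. This is only a sketch: `signal` and `is_run_end` are hypothetical names, and the explicit `.eq(1)` is a swapped-in alternative to relying on 0/1 being treated as False/True.

```python
import pandas as pd

signal = pd.Series([1, 1, 1, 0, 1, 1, 0, 0], name='Signal')
a = signal.ne(signal.shift()).cumsum()  # run ids: 1, 1, 1, 2, 3, 3, 4, 4

# duplicated(keep='last') marks every row except the last one of each run id,
# so inverting it leaves True only on the final row of each run
is_run_end = ~a.duplicated(keep='last')

# Keep only the run ends where Signal is 1 (explicit boolean comparison)
m = is_run_end & signal.eq(1)

print(is_run_end.tolist())
print(m.tolist())  # [False, False, True, False, False, True, False, False]
```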
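Finally, use groupby on the helper Series with transform('sum') to get the per-run sum, copy Value for the last value, and use where with the mask to put NaN in all other rows: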
df['Sum'] = df.groupby(a)['Value'].transform('sum')
df['Last'] = df['Value']
df[['Sum','Last']] = df[['Sum','Last']].where(m)
print (df)
Timestamp Signal Value Sum Last
0 00:00:00 1 12.0 NaN NaN
1 00:00:01 1 12.2 NaN NaN
2 00:00:02 1 2.1 26.3 2.1
3 00:00:03 0 1.1 NaN NaN
4 00:00:04 1 6.2 NaN NaN
5 00:00:05 1 1.0 7.2 1.0
6 00:00:06 0 4.4 NaN NaN
7 00:00:07 0 1.6 NaN NaN
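For reuse, the three steps can be wrapped in one function. The sketch below is one possible way to package them, not the answer's own code: the helper name `add_run_sum_last` and its parameters are hypothetical, and it returns a copy rather than modifying the input.

```python
import pandas as pd


def add_run_sum_last(df, signal='Signal', value='Value'):
    """Hypothetical helper: add Sum/Last on the final row of each run where signal == 1."""
    out = df.copy()  # leave the caller's DataFrame untouched
    # Run ids for consecutive equal values of the signal column
    a = out[signal].ne(out[signal].shift()).cumsum()
    # True only on the last row of each run whose signal is 1
    m = ~a.duplicated(keep='last') & out[signal].eq(1)
    # Per-run sum broadcast to every row, then blanked outside the mask
    out['Sum'] = out.groupby(a)[value].transform('sum').where(m)
    # The value on a run's last row is that run's last value
    out['Last'] = out[value].where(m)
    return out


# Sample data from the question (construction is an assumption)
df = pd.DataFrame({
    'Timestamp': ['00:00:00', '00:00:01', '00:00:02', '00:00:03',
                  '00:00:04', '00:00:05', '00:00:06', '00:00:07'],
    'Signal': [1, 1, 1, 0, 1, 1, 0, 0],
    'Value': [12, 12.2, 2.1, 1.1, 6.2, 1.0, 4.4, 1.6],
})

result = add_run_sum_last(df)
print(result)
```

Only rows 2 and 5 end up with non-NaN Sum and Last, matching the output shown above.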