I need to compute the delta column (shown below). The tricky part is the condition described underneath. How can I do this in pandas?
speaker | video | frame | time | delta(expected)
--------|-------|-------|------|----------------
one     | 1     | 0     | 10   | 0
one     | 1     | 1     | 15   | 5
one     | 2     | 0     | 12   | 0
one     | 2     | 1     | 16   | 4
two     | 2     | 0     | 19   | 0
two     | 2     | 1     | 22   | 3
two     | 2     | 2     | 16   | -6
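Condition: delta is the difference in time between consecutive frames of the same speaker within the same video. In other words, delta must not be computed across rows that belong to different speakers or different videos. In those cases the value should be initialized to zero, as shown in the delta(expected) column.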
Answer 0 (score: 3)
Use groupby, diff, and fillna:
df['delta'] = df.groupby(['speaker','video'])['time'].diff().fillna(0)
Output:
speaker video frame time delta(expected) delta
0 one 1 0 10 0 0.0
1 one 1 1 15 5 5.0
2 one 2 0 12 0 0.0
3 one 2 1 16 4 4.0
4 two 2 0 19 0 0.0
5 two 2 1 22 3 3.0
6 two 2 2 16 -6 -6.0
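For anyone who wants to reproduce this, here is a minimal self-contained sketch. The DataFrame is rebuilt from the table in the question, and the final cast to int is an optional extra (not part of the answer above) for when the float output is not wanted.

```python
import pandas as pd

# Sample data copied from the question's table.
df = pd.DataFrame({
    "speaker": ["one", "one", "one", "one", "two", "two", "two"],
    "video":   [1, 1, 2, 2, 2, 2, 2],
    "frame":   [0, 1, 0, 1, 0, 1, 2],
    "time":    [10, 15, 12, 16, 19, 22, 16],
})

# diff() runs within each (speaker, video) group, so the first row of every
# group has no predecessor and comes back as NaN; fillna(0) resets it to zero.
delta = df.groupby(["speaker", "video"])["time"].diff().fillna(0)

# The intermediate NaN forces a float dtype; cast back to int if desired.
df["delta"] = delta.astype(int)

print(df)
```

The float values (0.0, 5.0, ...) in the output above come from that intermediate NaN, which is why the cast is only safe after fillna.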
Answer 1 (score: 3)
Option 1
This assumes df is sorted by ['speaker', 'video']. If it is not, sort it first.
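import numpy as np

# Keep the row-to-row difference only where the (speaker, video) pair
# has already appeared; the first row of each pair gets 0.
delta = np.where(
    df.duplicated(['speaker', 'video']).values,
    np.append(0, np.diff(df.time.values)), 0
)
df.assign(delta=delta)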
speaker video frame time delta(expected) delta
0 one 1 0 10 0 0
1 one 1 1 15 5 5
2 one 2 0 12 0 0
3 one 2 1 16 4 4
4 two 2 0 19 0 0
5 two 2 1 22 3 3
6 two 2 2 16 -6 -6
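If the frame is not already sorted, something like the following sketch could be used first. The shuffled input is a hypothetical example built from the question's rows, and sorting on frame as well is my assumption about the intended order within each video.

```python
import numpy as np
import pandas as pd

# Hypothetical unsorted input: the question's rows in shuffled order.
df = pd.DataFrame({
    "speaker": ["two", "one", "one", "two", "one", "two", "one"],
    "video":   [2, 1, 2, 2, 2, 2, 1],
    "frame":   [1, 0, 1, 0, 0, 2, 1],
    "time":    [22, 10, 16, 19, 12, 16, 15],
})

# Sort so each (speaker, video) group is contiguous and its frames are in order.
df = df.sort_values(["speaker", "video", "frame"]).reset_index(drop=True)

# duplicated() is False on the first row of each group and True afterwards.
is_repeat = df.duplicated(["speaker", "video"]).values

# Difference to the previous row, with a leading 0 to keep the length.
step = np.append(0, np.diff(df["time"].values))

# Zero out the rows where the difference would cross a group boundary.
df["delta"] = np.where(is_repeat, step, 0)

print(df)
```

Option 1 takes the difference over the whole column and only masks group boundaries afterwards, so it gives wrong values if rows of one group are interleaved with another.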
Option 2
df.assign(
    delta=df.groupby(['speaker', 'video']).time.transform(
        lambda x: np.append(0, np.diff(x.values))
    )
)
speaker video frame time delta(expected) delta
0 one 1 0 10 0 0
1 one 1 1 15 5 5
2 one 2 0 12 0 0
3 one 2 1 16 4 4
4 two 2 0 19 0 0
5 two 2 1 22 3 3
6 two 2 2 16 -6 -6
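One practical difference worth noting: because transform works group by group and aligns its result to the original index, Option 2 should not need the groups to be contiguous. A small sketch with hypothetical interleaved rows (a subset of the question's data) illustrates this; the rows within each group are still assumed to be in frame order.

```python
import numpy as np
import pandas as pd

# Hypothetical input where the two groups are interleaved.
df = pd.DataFrame({
    "speaker": ["one", "two", "one", "two", "two"],
    "video":   [1, 2, 1, 2, 2],
    "frame":   [0, 0, 1, 1, 2],
    "time":    [10, 19, 15, 22, 16],
})

# Each group is differenced on its own, with a leading 0 for its first row;
# transform then places the values back at the rows they came from.
df["delta"] = df.groupby(["speaker", "video"])["time"].transform(
    lambda x: np.append(0, np.diff(x.values))
)

print(df)
```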