Question

I have a Pandas DataFrame and want to create a new column based on the following logic:

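For a given user_id:
check whether the event field contains the same value in the previous row and the current row
check whether the time difference between the two events is less than a threshold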

I have the following code:

def calc_prev(row):    
    row_minus_1 = row.shift(1)
    row_minus_2 = row.shift(2)
    return ((row.event_type == 'roll') and 
            (row.event_type == row_minus_1.event_type == row_minus_2.event_type) and
            (row.ts - row_minus_1.ts < 500) and (row_minus_1.ts - row_minus_2.ts < 500))



df['new_metrics'] = df.groupby(['user_id']).apply(calc_prev)

For every row where the condition is true, I want the 'new_metrics' column to contain 1. The code above gives me this error:

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

   user_id  event    ts             new_metrics
0  1        "start"  1531827851982  0
1  2        "end"    1531827852082  0
2  3        "start"  1531827852182  0
3  3        "start"  1531827852282  0
4  3        "start"  1531827852382  1
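For reference, the error most likely arises because groupby(...).apply passes each group to calc_prev as a whole DataFrame, so comparisons such as row.event_type == 'roll' produce a Series, and Python's `and` needs a single True/False. One possible way around this, as a minimal sketch only, is to shift the columns within each group and combine the element-wise comparisons with `&`. The sketch assumes the column is named `event` as in the sample table (the code above uses `event_type`), that the event of interest is "start" (the code above checks for 'roll'), and that the threshold is 500 in the same units as `ts`; the names THRESHOLD_MS and TARGET_EVENT are made up for illustration.

```python
import pandas as pd

# Sample data copied from the table above.
df = pd.DataFrame({
    "user_id": [1, 2, 3, 3, 3],
    "event": ["start", "end", "start", "start", "start"],
    "ts": [1531827851982, 1531827852082, 1531827852182,
           1531827852282, 1531827852382],
})

THRESHOLD_MS = 500       # assumption: taken from the "< 500" checks in calc_prev
TARGET_EVENT = "start"   # assumption: the sample data uses "start", not 'roll'

# Shift within each user so that a row never sees another user's events.
grouped = df.groupby("user_id")
prev1_event = grouped["event"].shift(1)
prev2_event = grouped["event"].shift(2)
prev1_ts = grouped["ts"].shift(1)
prev2_ts = grouped["ts"].shift(2)

# Element-wise conditions combined with & instead of `and`.
# Rows without two predecessors hold NaN after the shift, and every
# comparison against NaN evaluates to False.
mask = (
    (df["event"] == TARGET_EVENT)
    & (df["event"] == prev1_event)
    & (prev1_event == prev2_event)
    & ((df["ts"] - prev1_ts) < THRESHOLD_MS)
    & ((prev1_ts - prev2_ts) < THRESHOLD_MS)
)

df["new_metrics"] = mask.astype(int)
print(df)
```

With the sample data, only the last row has two preceding rows for the same user with the same event and gaps under the threshold, so it is the only row flagged with 1. The sketch also assumes the rows are already sorted by `ts` within each user; if not, sorting by user_id and ts first would be needed.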

Derive a column from the values of the previous two rows

0 answers: