Match data with the previous row in pandas based on multiple conditions

Time: 2018-11-19 17:44:03

Tags: python pandas

My pandas df looks like this:

id      date  time  event  prod_code 
a1      201701  11   Prodpage  101538
a1      201701  11:01 basket   101538 
b1      201701  11:19  Prodpage 109
b1      201701  11:20  basket   1
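
For anyone who wants to reproduce this, the frame above could be built roughly as follows. The dtypes are an assumption, since the question does not state them: here `time` is kept as strings, while `date` and `prod_code` are integers.

```python
import pandas as pd

# Sample data copied from the table above; dtypes are assumed, not given in the question.
df = pd.DataFrame({
    "id": ["a1", "a1", "b1", "b1"],
    "date": [201701, 201701, 201701, 201701],
    "time": ["11", "11:01", "11:19", "11:20"],
    "event": ["Prodpage", "basket", "Prodpage", "basket"],
    "prod_code": [101538, 101538, 109, 1],
})

print(df)
```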

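I need to create a new column that flags whether each row matches the row before it. Here is the logic in pseudocode, where [1] is the previous row and [2] is the current row: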

df[matched] = if id[1] == id[2]
              and date[1] == date[2]
              and event[1] == "Prodpage" and event[2] == "basket"
              and prod_code[1] == prod_code[2]
              then "Matched" otherwise "Not Matched"

So the output should be:

id      date  time  event  prod_code   matched?
a1      201701  11   Prodpage  101538   
a1      201701  11:01 basket   101538   Matched 
b1      201701  11:19  Prodpage 109
b1      201701  11:20  basket   1       Not Matched 

How can I achieve this in pandas?

1 answer:

Answer 0 (score: 3)

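Break the logic into parts, then combine the boolean conditions at the end. For example, if each id always has exactly 2 rows and they are adjacent: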

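import numpy as np

match_cols = ['id', 'date', 'prod_code']

# True where id, date and prod_code all equal the previous row's values
m1 = df[match_cols] == df[match_cols].shift()
# Current row is a basket event and the previous row is a Prodpage event
m2 = df['event'] == 'basket'
m3 = df['event'].shift() == 'Prodpage'

df['matched?'] = np.where(m1.all(axis=1) & m2 & m3, 'matched', 'not matched')
# Blank out the first row of each pair (every other row, starting with the first)
df.loc[::2, 'matched?'] = ''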

print(df)

   id    date   time     event  prod_code     matched?
0  a1  201701     11  Prodpage     101538             
1  a1  201701  11:01    basket     101538      matched
2  b1  201701  11:19  Prodpage        109             
3  b1  201701  11:20    basket          1  not matched
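
If the "exactly two adjacent rows per id" assumption does not hold, one possible variation is to shift within each id group and label only the basket rows. This is a sketch rather than part of the original answer: it assumes the rows are already sorted by time within each id, and names such as `prev` and `is_basket` are just illustrative.

```python
import numpy as np
import pandas as pd

# Sample data from the question; dtypes are assumed.
df = pd.DataFrame({
    "id": ["a1", "a1", "b1", "b1"],
    "date": [201701, 201701, 201701, 201701],
    "time": ["11", "11:01", "11:19", "11:20"],
    "event": ["Prodpage", "basket", "Prodpage", "basket"],
    "prod_code": [101538, 101538, 109, 1],
})

# Previous row within the same id; the first row of each id becomes NaN,
# so it can never compare equal and never counts as a match.
prev = df.groupby("id")[["date", "event", "prod_code"]].shift()

is_basket = df["event"].eq("basket")
matched = (
    is_basket
    & prev["event"].eq("Prodpage")
    & df["date"].eq(prev["date"])
    & df["prod_code"].eq(prev["prod_code"])
)

# Basket rows are labelled matched / not matched; every other row stays blank.
df["matched?"] = np.select([matched, is_basket], ["matched", "not matched"], default="")

print(df)
```

Because the shift happens per group, the id comparison is implicit, and no positional slicing such as `df.loc[::2]` is needed to blank out the non-basket rows.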