My pandas DataFrame looks like this:
id date time event prod_code
a1 201701 11 Prodpage 101538
a1 201701 11:01 basket 101538
b1 201701 11:19 Prodpage 109
b1 201701 11:20 basket 1
I need to create a new column that flags whether each row matches the row before it. Here is some pseudocode:
df['matched?'] = if id[1] == id[2] and date[1] == date[2]
                 and event[1] == "Prodpage" and event[2] == "basket"
                 and prod_code[1] == prod_code[2]
                 then "Matched" otherwise "Not Matched"
So the output should be:
id date time event prod_code matched?
a1 201701 11 Prodpage 101538
a1 201701 11:01 basket 101538 Matched
b1 201701 11:19 Prodpage 109
b1 201701 11:20 basket 1 Not Matched
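How can I achieve this in pandas?
Answer 0 (score: 3)
Break the logic into parts, then combine the boolean conditions at the end. For example, if each id
always has exactly 2 rows and they are adjacent: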
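import numpy as np

match_cols = ['id', 'date', 'prod_code']
# True where id, date and prod_code all equal the previous row's values
m1 = df[match_cols] == df[match_cols].shift()
# current row is a basket event, previous row is a Prodpage event
m2 = df['event'] == 'basket'
m3 = df['event'].shift() == 'Prodpage'
df['matched?'] = np.where(m1.all(axis=1) & m2 & m3, 'matched', 'not matched')
# blank out the first row of each pair (relies on the two-adjacent-rows assumption)
df.loc[::2, 'matched?'] = ''
print(df)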
id date time event prod_code matched?
0 a1 201701 11 Prodpage 101538
1 a1 201701 11:01 basket 101538 matched
2 b1 201701 11:19 Prodpage 109
3 b1 201701 11:20 basket 1 not matched
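For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same idea. The DataFrame is rebuilt by hand from the question's sample (the column dtypes are an assumption), and the names prev, same_keys, is_basket and after_prodpage are only illustrative. One deliberate difference from the code above: instead of blanking every other row, it blanks every row that is not a basket event, which gives the same result on this data without relying on rows coming in strict pairs.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question; dtypes are an assumption.
df = pd.DataFrame({
    "id": ["a1", "a1", "b1", "b1"],
    "date": [201701, 201701, 201701, 201701],
    "time": ["11", "11:01", "11:19", "11:20"],
    "event": ["Prodpage", "basket", "Prodpage", "basket"],
    "prod_code": [101538, 101538, 109, 1],
})

match_cols = ["id", "date", "prod_code"]
prev = df.shift()  # each row's predecessor; the first row becomes NaN

# id, date and prod_code must all equal the previous row's values
same_keys = (df[match_cols] == prev[match_cols]).all(axis=1)
# the current row is a basket event and the previous one a Prodpage event
is_basket = df["event"] == "basket"
after_prodpage = prev["event"] == "Prodpage"

df["matched?"] = np.where(
    same_keys & is_basket & after_prodpage, "matched", "not matched"
)
# Leave the label empty on rows that are not basket events.
df.loc[~is_basket, "matched?"] = ""
print(df)
```

This still assumes each basket row sits directly below the Prodpage row it should be compared with; if the data is not ordered that way, sort it first (for example by id, date and time).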