python pandas: check whether a column contains items from a list

Time: 2018-01-29 08:52:25

Tags: python list pandas merge extract

I have two DataFrames, for example:

vid     vbull   
1125    RHSA:2017:3200   
1127    RHSA:2017:3205  
1128    RHSA:2017:3208   
1129    RHSA:2017:3209


kbid    vdesc   
2401    This contains details for RHSA:2017:3205   
2402    This contains details for RHSA:2017:3206   
2403    This contains details for RHSA:2017:3207   
2404    This contains details for RHSA:2017:3208  
2405    This contains details for RHSA:2017:3200

I need to output df1 and df2 combined, matching each vbull value against the text in vdesc, for example:

vid   vbull           kbid  vdesc
1125  RHSA:2017:3200  2405  This contains details for RHSA:2017:3200
1127  RHSA:2017:3205  2401  This contains details for RHSA:2017:3205   ...

I tried the following to get the matching rows, but I'm not sure how to get the matched item itself into the output:

df2[df2.vdesc.str.contains('|'.join(df1.vbull))]
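For reference, here is a self-contained sketch that rebuilds the two sample frames from the question and runs the attempted filter. It illustrates why the attempt falls short: `str.contains` yields only a boolean mask, so the filtered rows do not record which vbull value matched.

```python
import pandas as pd

# Sample data copied from the question
df1 = pd.DataFrame({
    "vid": [1125, 1127, 1128, 1129],
    "vbull": ["RHSA:2017:3200", "RHSA:2017:3205", "RHSA:2017:3208", "RHSA:2017:3209"],
})
df2 = pd.DataFrame({
    "kbid": [2401, 2402, 2403, 2404, 2405],
    "vdesc": [
        "This contains details for RHSA:2017:3205",
        "This contains details for RHSA:2017:3206",
        "This contains details for RHSA:2017:3207",
        "This contains details for RHSA:2017:3208",
        "This contains details for RHSA:2017:3200",
    ],
})

# Regex alternation of all vbull values; contains() returns True/False per row
mask = df2.vdesc.str.contains("|".join(df1.vbull))

# Keeps the matching rows, but nothing says *which* vbull each row matched
matched = df2[mask]
print(matched)
```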

1 answer:

Answer 0 (score: 0)

First, use extract to pull the matching vbull value out of each vdesc:

df2['extracted'] = df2.vdesc.str.extract('(' + '|'.join(df1.vbull) + ')', expand=False)
print (df2)
   kbid                                     vdesc       extracted
0  2401  This contains details for RHSA:2017:3205  RHSA:2017:3205
1  2402  This contains details for RHSA:2017:3206             NaN
2  2403  This contains details for RHSA:2017:3207             NaN
3  2404  This contains details for RHSA:2017:3208  RHSA:2017:3208
4  2405  This contains details for RHSA:2017:3200  RHSA:2017:3200
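A self-contained sketch of this extract step. One assumption beyond the original answer: each value is passed through `re.escape`, so vbull values that happen to contain regex metacharacters (such as `.` or `+`) would be matched literally; with the sample data the result is the same. Note that `str.extract` returns only the first match in each row.

```python
import re
import pandas as pd

df1 = pd.DataFrame({
    "vid": [1125, 1127, 1128, 1129],
    "vbull": ["RHSA:2017:3200", "RHSA:2017:3205", "RHSA:2017:3208", "RHSA:2017:3209"],
})
df2 = pd.DataFrame({
    "kbid": [2401, 2402, 2403, 2404, 2405],
    "vdesc": [
        "This contains details for RHSA:2017:3205",
        "This contains details for RHSA:2017:3206",
        "This contains details for RHSA:2017:3207",
        "This contains details for RHSA:2017:3208",
        "This contains details for RHSA:2017:3200",
    ],
})

# One capture group wrapping an alternation of the (escaped) vbull values
pattern = "(" + "|".join(re.escape(v) for v in df1.vbull) + ")"

# expand=False returns a Series for a single group; rows without a match get NaN
df2["extracted"] = df2.vdesc.str.extract(pattern, expand=False)
print(df2)
```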

Then filter the rows with boolean indexing:

df3 = df2[df2['extracted'].notnull()].copy()
print (df3)
   kbid                                     vdesc       extracted
0  2401  This contains details for RHSA:2017:3205  RHSA:2017:3205
3  2404  This contains details for RHSA:2017:3208  RHSA:2017:3208
4  2405  This contains details for RHSA:2017:3200  RHSA:2017:3200
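The same filter can also be written with `dropna(subset=...)`. The sketch below uses a reduced frame (only the columns the filter touches, with `None` standing in for the NaN that `extract` produces) to show that both forms select the same rows.

```python
import pandas as pd

# Reduced stand-in for df2 after the extract step
df2 = pd.DataFrame({
    "kbid": [2401, 2402, 2403, 2404, 2405],
    "extracted": ["RHSA:2017:3205", None, None, "RHSA:2017:3208", "RHSA:2017:3200"],
})

# Boolean indexing as in the answer; .copy() makes df3 an independent frame,
# so adding a column to it later does not raise SettingWithCopyWarning
df3 = df2[df2["extracted"].notnull()].copy()

# Equivalent alternative: drop rows whose "extracted" value is missing
df3_alt = df2.dropna(subset=["extracted"])

print(df3.equals(df3_alt))
```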

Finally, add the matching vid values with map:

df3['new'] = df3['extracted'].map(df1.set_index('vbull')['vid'])
print (df3)
   kbid                                     vdesc       extracted   new
0  2401  This contains details for RHSA:2017:3205  RHSA:2017:3205  1127
3  2404  This contains details for RHSA:2017:3208  RHSA:2017:3208  1128
4  2405  This contains details for RHSA:2017:3200  RHSA:2017:3200  1125
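The answer stops at a `new` column holding vid. To get the column layout the question asks for (vid, vbull, kbid, vdesc), one alternative, not taken from the answer itself, is an inner `merge` on the extracted value. This is a sketch under that assumption:

```python
import pandas as pd

df1 = pd.DataFrame({
    "vid": [1125, 1127, 1128, 1129],
    "vbull": ["RHSA:2017:3200", "RHSA:2017:3205", "RHSA:2017:3208", "RHSA:2017:3209"],
})
df2 = pd.DataFrame({
    "kbid": [2401, 2402, 2403, 2404, 2405],
    "vdesc": [
        "This contains details for RHSA:2017:3205",
        "This contains details for RHSA:2017:3206",
        "This contains details for RHSA:2017:3207",
        "This contains details for RHSA:2017:3208",
        "This contains details for RHSA:2017:3200",
    ],
})

# Same extract step as in the answer
pattern = "(" + "|".join(df1.vbull) + ")"
df2["extracted"] = df2.vdesc.str.extract(pattern, expand=False)

# Inner join: df2 rows with NaN in "extracted" find no partner in df1.vbull,
# and df1 row 1129 has no partner either, so both are dropped
out = df1.merge(df2, left_on="vbull", right_on="extracted", how="inner")

# Keep the requested columns in the requested order, sorted by vid
out = out[["vid", "vbull", "kbid", "vdesc"]].sort_values("vid").reset_index(drop=True)
print(out)
```

Compared with `map`, `merge` brings over every column of df1 at once, which is convenient if more than vid is needed.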