I have two data frames, for example:
vid vbull
1125 RHSA:2017:3200
1127 RHSA:2017:3205
1128 RHSA:2017:3208
1129 RHSA:2017:3209
kbid vdesc
2401 This contains details for RHSA:2017:3205
2402 This contains details for RHSA:2017:3206
2403 This contains details for RHSA:2017:3207
2404 This contains details for RHSA:2017:3208
2405 This contains details for RHSA:2017:3200
I need to combine df1 and df2 by matching each vbull against the text in vdesc, with output like:
vid vbull kbid vdesc
1125 RHSA:2017:3200 2405 This contains details for RHSA:2017:3200
1127 RHSA:2017:3207 2403 This contains details for RHSA:2017:3207 ...
I tried the following to get the matching rows, but I'm not sure how to get the matched item itself into the output:
df2[df2.vdesc.str.contains('|'.join(df1.vbull))]
Answer 0 (score: 0)
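First, use str.extract to pull the vbull values out of vdesc. The pattern is an alternation of every value in df1.vbull, so rows with no match get NaN: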
df2['extracted'] = df2.vdesc.str.extract('(' + '|'.join(df1.vbull) + ')', expand=False)
print (df2)
kbid vdesc extracted
0 2401 This contains details for RHSA:2017:3205 RHSA:2017:3205
1 2402 This contains details for RHSA:2017:3206 NaN
2 2403 This contains details for RHSA:2017:3207 NaN
3 2404 This contains details for RHSA:2017:3208 RHSA:2017:3208
4 2405 This contains details for RHSA:2017:3200 RHSA:2017:3200
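To reproduce this step on its own, here is a minimal sketch. The data frame construction is my assumption based on the sample data in the question, and wrapping each value in re.escape is an extra precaution that is not in the code above; it only matters if a vbull value ever contains regex metacharacters such as "." or "+".

```python
import re

import pandas as pd

# Sample data rebuilt from the question (assumed layout).
df1 = pd.DataFrame({
    "vid": [1125, 1127, 1128, 1129],
    "vbull": ["RHSA:2017:3200", "RHSA:2017:3205",
              "RHSA:2017:3208", "RHSA:2017:3209"],
})
df2 = pd.DataFrame({
    "kbid": [2401, 2402, 2403, 2404, 2405],
    "vdesc": [
        "This contains details for RHSA:2017:3205",
        "This contains details for RHSA:2017:3206",
        "This contains details for RHSA:2017:3207",
        "This contains details for RHSA:2017:3208",
        "This contains details for RHSA:2017:3200",
    ],
})

# Build one capture group that alternates over all escaped vbull values.
pattern = "(" + "|".join(re.escape(v) for v in df1["vbull"]) + ")"

# expand=False returns a Series; rows without a match become NaN.
df2["extracted"] = df2["vdesc"].str.extract(pattern, expand=False)
print(df2)
```

Note that str.extract keeps only the first match per row, so a vdesc that mentions two advisories would need str.extractall instead.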
Then filter out the rows without a match using boolean indexing:
df3 = df2[df2['extracted'].notnull()].copy()
print (df3)
kbid vdesc extracted
0 2401 This contains details for RHSA:2017:3205 RHSA:2017:3205
3 2404 This contains details for RHSA:2017:3208 RHSA:2017:3208
4 2405 This contains details for RHSA:2017:3200 RHSA:2017:3200
Finally, add the vid values with map, looking each extracted vbull up in df1:
df3['new'] = df3['extracted'].map(df1.set_index('vbull')['vid'])
print (df3)
kbid vdesc extracted new
0 2401 This contains details for RHSA:2017:3205 RHSA:2017:3205 1127
3 2404 This contains details for RHSA:2017:3208 RHSA:2017:3208 1128
4 2405 This contains details for RHSA:2017:3200 RHSA:2017:3200 1125
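If you want the exact column layout from the question (vid, vbull, kbid, vdesc), one alternative to map is to merge on the extracted value. This is only a sketch under the same assumptions about the sample data; swapping in merge and the final sort by vid are my choices, not part of the steps above.

```python
import re

import pandas as pd

# Sample data rebuilt from the question (assumed layout).
df1 = pd.DataFrame({
    "vid": [1125, 1127, 1128, 1129],
    "vbull": ["RHSA:2017:3200", "RHSA:2017:3205",
              "RHSA:2017:3208", "RHSA:2017:3209"],
})
df2 = pd.DataFrame({
    "kbid": [2401, 2402, 2403, 2404, 2405],
    "vdesc": [
        "This contains details for RHSA:2017:3205",
        "This contains details for RHSA:2017:3206",
        "This contains details for RHSA:2017:3207",
        "This contains details for RHSA:2017:3208",
        "This contains details for RHSA:2017:3200",
    ],
})

pattern = "(" + "|".join(re.escape(v) for v in df1["vbull"]) + ")"

result = (
    # Store the extracted advisory id under the same name as df1's key column.
    df2.assign(vbull=df2["vdesc"].str.extract(pattern, expand=False))
       # Drop rows whose description mentions no known vbull.
       .dropna(subset=["vbull"])
       # An inner merge attaches vid to every matched row.
       .merge(df1, on="vbull", how="inner")
       [["vid", "vbull", "kbid", "vdesc"]]
       .sort_values("vid")
       .reset_index(drop=True)
)
print(result)
```

Unlike map with set_index, merge also copes with duplicate vbull values in df1; it then returns one row per matching pair.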