Check whether strings in a column of one DataFrame exist in a column of another DataFrame

Time: 2019-11-27 14:30:31

Tags: python pandas

I have two DataFrames.

**DF1**

and a second one, shown below

**DF2**

I want to combine them into a single DataFrame like this


1 Answer:

Answer 0 (score: 0)

You can do it like this:

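import numpy as np
import pandas as pd
df2 = pd.DataFrame({'view_name': ['mary_flowers', 'esther_pots']})  # stand-in for DF2; contents assumed
df = pd.DataFrame([[1, 'select from [mary_flowers]'], [2, 'select from [esther_pots]'], [3, 'select from [somthing]']], columns=['item_id', 'Column_x'])
# Extract the name between the square brackets
df['view_name'] = df['Column_x'].str.extract(r'\[(\w*)\]', expand=True)[0]
# Replace names that do not appear in df2 with NaN
df.loc[~df['view_name'].isin(list(df2['view_name'])), 'view_name'] = np.nan
df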

Output:

   item_id                    Column_x     view_name
0        1  select from [mary_flowers]  mary_flowers
1        2   select from [esther_pots]   esther_pots
2        3      select from [somthing]           NaN

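Explanation: this extracts the table name from inside the [] and then checks whether it appears in your second DataFrame; if it does not, the value is replaced with np.nan.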
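The two steps described above (extract, then check membership) can also be sketched as a small helper. This is only an illustration: the function name `match_view_names` and the sample contents of `df2` are assumptions, since the real DF2 is not shown in the post.

```python
import pandas as pd


def match_view_names(queries: pd.Series, known: pd.Series) -> pd.Series:
    """Return the bracketed name from each query, or NaN if it is not in `known`."""
    # expand=False returns a Series when the pattern has a single capture group
    extracted = queries.str.extract(r'\[(\w+)\]', expand=False)
    # Series.where keeps values where the condition is True and puts NaN elsewhere
    return extracted.where(extracted.isin(known))


# Hypothetical sample data standing in for the question's DF1 and DF2
df2 = pd.DataFrame({'view_name': ['mary_flowers', 'esther_pots']})
df = pd.DataFrame({
    'item_id': [1, 2, 3],
    'Column_x': ['select from [mary_flowers]',
                 'select from [esther_pots]',
                 'select from [somthing]'],
})
df['view_name'] = match_view_names(df['Column_x'], df2['view_name'])
```

Using `Series.where` avoids the separate `.loc` assignment, and rows with no bracketed name at all end up as NaN as well.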

Edit: if "Column_x" can contain more than one table name, use:

import pandas as pd
df = pd.DataFrame([[1, 'select from [mary_flowers] join [tom_trucks]'], [2, 'select from [esther_pots]'], [3, 'select from [somthing]']], columns=['item_id', 'Column_x'])
# Known view names; in practice this would be list(df2['view_name'])
names = ['mary_flowers', 'esther_pots', 'tom_trucks']
# Collect every bracketed name in each row as a list
df['view_name'] = df['Column_x'].str.findall(r'\[(\w*)\]')
# Keep only the names that are known
df['view_name'] = df['view_name'].map(lambda views: [v for v in views if v in names])
df

Output:

   item_id                                      Column_x                   view_name
0        1  select from [mary_flowers] join [tom_trucks]  [mary_flowers, tom_trucks]
1        2                     select from [esther_pots]               [esther_pots]
2        3                        select from [somthing]                          []
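If the goal is to actually combine the two DataFrames rather than only flag matches, one possible follow-up is to explode the list column into one row per name and left-merge with the second DataFrame. This is a sketch under assumptions: the columns of DF2 are not shown in the post, so the `owner` column below is invented purely for illustration.

```python
import pandas as pd

# Hypothetical stand-in for DF2; the 'owner' column is invented for illustration
df2 = pd.DataFrame({
    'view_name': ['mary_flowers', 'esther_pots', 'tom_trucks'],
    'owner': ['mary', 'esther', 'tom'],
})
df = pd.DataFrame(
    [[1, 'select from [mary_flowers] join [tom_trucks]'],
     [2, 'select from [esther_pots]'],
     [3, 'select from [somthing]']],
    columns=['item_id', 'Column_x'],
)

# One list of bracketed names per row, then one row per name
df['view_name'] = df['Column_x'].str.findall(r'\[(\w+)\]')
long_df = df.explode('view_name')

# A left merge keeps every extracted name; names missing from df2
# get NaN in the columns that come from df2
combined = long_df.merge(df2, on='view_name', how='left')
```

With this sample data, `combined` has four rows (two for item 1), and the row for `somthing` has NaN in `owner` because that name is not in `df2`. `DataFrame.explode` requires pandas 0.25 or later.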