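I am trying to check whether the string value in one DataFrame column is contained in another column of the same DataFrame. My DataFrame is: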
import pandas as pd

d = {'col1': ['Live', 'Live', 'Live', 'Deferrred'], 'col2': ['Live', 'Not live,Deferred', 'Deferred,Live', 'Not live']}
df = pd.DataFrame(data=d)
print(df)
        col1               col2
0       Live               Live
1       Live  Not live,Deferred
2       Live      Deferred,Live
3  Deferrred           Not live
If the value in col1 is one of the comma-separated values in col2, the new Check column should show Y (true), otherwise N:
        col1               col2 Check
0       Live               Live     Y
1       Live  Not live,Deferred     N
2       Live      Deferred,Live     Y
3  Deferrred           Not live     N
I have tried:
import numpy as np

conditions = [df['col1'].isin(df['col2'])]
choices = ['Y']
df['Check'] = np.select(conditions, choices, default='N')
However, it returns True for Live in Not live, when it should return False.
I have also tried:
conditions = [df['col2'].contains(df['col1'])]
But that returns:
AttributeError: 'Series' object has no attribute 'contains'
Is there a way to make .isin() case-sensitive, or a way to make Live in Not live return False?
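For context, here is a minimal sketch of what the first attempt appears to be doing. As far as I can tell, `Series.isin` compares each col1 value against the set of all values in col2, not against the col2 value in the same row, so row 1 comes out True because the exact string 'Live' occurs in col2 at row 0. The variable name `column_wide` is only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'col1': ['Live', 'Live', 'Live', 'Deferrred'],
    'col2': ['Live', 'Not live,Deferred', 'Deferred,Live', 'Not live'],
})

# isin() tests each col1 value against ALL col2 values at once,
# not against the col2 value that sits in the same row.
column_wide = df['col1'].isin(df['col2'])
print(column_wide.tolist())

# Python's `in` on plain strings is a case-sensitive substring test,
# so 'Live' is not found inside 'Not live'.
print('Live' in 'Not live')
```

So the comparison is already case-sensitive; what the attempt lacks is a row-by-row check.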
Answer 0 (score: 2)
Here is an approach you can take:
df['Check'] = df.apply(lambda x: 'Y' if x['col1'] in x['col2'] else 'N', axis=1)
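Note that `in` between two strings is a substring test, so a col1 value such as 'live' would also match 'Not live'. If exact matching against the comma-separated items is what is wanted, one possible variant is to split col2 first. This is only a sketch; the helper name `is_listed` is made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'col1': ['Live', 'Live', 'Live', 'Deferrred'],
    'col2': ['Live', 'Not live,Deferred', 'Deferred,Live', 'Not live'],
})

def is_listed(value, delimited):
    """Return True if `value` equals one of the comma-separated items (hypothetical helper)."""
    items = [item.strip() for item in delimited.split(',')]
    return value in items

# Compare row by row: the col1 value against the items of the same row's col2.
df['Check'] = ['Y' if is_listed(a, b) else 'N'
               for a, b in zip(df['col1'], df['col2'])]
print(df)
```

With this, `is_listed('live', 'Not live')` is False, whereas `'live' in 'Not live'` is True.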
Answer 1 (score: 0)
Using numpy:
import numpy as np
df['Check']=np.bitwise_and( df['col2'].str.split(",").map(set), df['col1'].str.split().map(set) ).ne(set())
Output:
        col1               col2  Check
0       Live               Live   True
1       Live  Not live,Deferred  False
2       Live      Deferred,Live   True
3  Deferrred           Not live  False
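One caveat with splitting col1 on whitespace: a multi-word value such as 'Not live' would be broken into 'Not' and 'live' and would then never match the item 'Not live'. The sketch below uses made-up data (not the data from the question) to compare the set-intersection idea with a plain membership test of the whole col1 value. The names `split_check` and `whole_check` are illustrative.

```python
import pandas as pd

# Hypothetical data, chosen so that col1 contains a multi-word value.
df = pd.DataFrame({
    'col1': ['Live', 'Not live'],
    'col2': ['Deferred,Live', 'Not live,Deferred'],
})

# One set of comma-separated items per row of col2.
col2_sets = df['col2'].str.split(',').map(set)

# Set intersection after splitting col1 on whitespace.
split_check = [bool(set(value.split()) & options)
               for value, options in zip(df['col1'], col2_sets)]

# Membership test of the whole col1 value.
whole_check = [value in options
               for value, options in zip(df['col1'], col2_sets)]

print(split_check)
print(whole_check)
```

On the data from the question both variants should agree, since no col1 value there contains a space.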