Pandas - Check whether a Series value is contained in another DataFrame column

Time: 2020-01-28 21:40:33

Tags: python pandas

I am trying to check whether the string value in one DataFrame column is contained in another DataFrame column. My DataFrame is:

import pandas as pd

d = {'col1': ['Live', 'Live','Live','Deferrred'], 'col2': ['Live', 'Not live,Deferred', 'Deferred,Live','Not live']}
df = pd.DataFrame(data=d)
print(df)
        col1               col2
0       Live               Live
1       Live  Not live,Deferred
2       Live      Deferred,Live
3  Deferrred           Not live 

If the value in col1 is one of the comma-separated values in col2, the new "Check" column should show True (displayed as Y here):

        col1               col2 Check
0       Live               Live     Y
1       Live  Not live,Deferred     N
2       Live      Deferred,Live     Y
3  Deferrred           Not live     N

I tried:

import numpy as np

conditions = [df['col1'].isin(df['col2'])]
choices = ['Y']
df['Check'] = np.select(conditions, choices, default='N')

However, this returns True for Live in Not live, when it should return False.

I also tried:

conditions = [df['col2'].contains(df['col1'])]

But that returns: AttributeError: 'Series' object has no attribute 'contains'

Is there a way to make .isin() case-sensitive, or a way to make Live in Not live return False?
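
For context on the first attempt, here is a minimal sketch of what .isin() does with this data. It is only an illustration built from the sample frame in the question: .isin() tests each col1 value against the whole set of col2 values, not against the col2 value in the same row, which would explain why row 1 comes back True.

```python
import pandas as pd

d = {'col1': ['Live', 'Live', 'Live', 'Deferrred'],
     'col2': ['Live', 'Not live,Deferred', 'Deferred,Live', 'Not live']}
df = pd.DataFrame(data=d)

# isin() asks: "does this col1 value equal ANY whole value found in col2?"
# The comparison is not row by row, and no splitting on commas takes place.
mask = df['col1'].isin(df['col2'])
print(mask.tolist())  # [True, True, True, False]
```

'Live' matches because row 0 of col2 is exactly 'Live', so every row whose col1 is 'Live' is flagged, whatever its own col2 holds.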

2 answers:

Answer 0: (score: 2)

Here is one approach you could take:

# Row-wise, case-sensitive substring test: is the col1 text found anywhere in col2?
df['Check'] = df.apply(lambda x: 'Y' if x['col1'] in x['col2'] else 'N', axis=1)
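
The expression above is a substring test, so it relies on the capital "L" in 'Live' to avoid matching 'Not live'. As a hedged variant (not the answerer's code), the sketch below splits col2 on commas first and then tests for an exact match against the resulting items. It assumes the separator is a bare comma with no surrounding spaces, as in the sample data.

```python
import pandas as pd

d = {'col1': ['Live', 'Live', 'Live', 'Deferrred'],
     'col2': ['Live', 'Not live,Deferred', 'Deferred,Live', 'Not live']}
df = pd.DataFrame(data=d)

# Compare each col1 value against the comma-separated items of the
# col2 value in the same row (exact, case-sensitive match per item).
df['Check'] = [
    'Y' if value in items.split(',') else 'N'
    for value, items in zip(df['col1'], df['col2'])
]
print(df)

# Why splitting matters: a substring test and an item test can disagree.
print('live' in 'Not live')             # True  (substring)
print('live' in 'Not live'.split(','))  # False (whole item)
```

With the sample data this gives Y, N, Y, N, matching the desired output in the question.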

Answer 1: (score: 0)

Using numpy:

import numpy as np

df['Check']=np.bitwise_and( df['col2'].str.split(",").map(set), df['col1'].str.split().map(set) ).ne(set())

Output:

        col1               col2  Check
0       Live               Live   True
1       Live  Not live,Deferred  False
2       Live      Deferred,Live   True
3  Deferrred           Not live  False
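
The same boolean result can also be reached with pandas string and reshaping methods alone. This is a sketch of an alternative, not the answerer's method: it explodes the comma-separated items of col2 into one row per item, compares each item with the col1 value of its original row, and collapses back with any(). The names tokens and check are illustrative.

```python
import pandas as pd

d = {'col1': ['Live', 'Live', 'Live', 'Deferrred'],
     'col2': ['Live', 'Not live,Deferred', 'Deferred,Live', 'Not live']}
df = pd.DataFrame(data=d)

# One row per comma-separated item; the original row label is repeated.
tokens = df['col2'].str.split(',').explode()

# Line up the col1 value of the originating row with each item.
col1_per_token = df['col1'].loc[tokens.index].to_numpy()

# A row passes if any of its items equals its col1 value exactly.
check = tokens.eq(col1_per_token).groupby(level=0).any()

df['Check'] = check
print(df)
```

For the sample frame, check evaluates to True, False, True, False.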