Compare two pandas columns containing lists of strings

Time: 2018-07-04 10:31:37

Tags: python pandas

I have a DataFrame in pandas with two columns, where each cell is a list of strings stored as comma-separated text. How can I check, for each row, whether any word appears in both columns? The flag column is the desired output:

A                B            flag
hello,hi,bye     bye, also       1
but, as well     see, pandas     0

I tried

df['A'].str.contains(df['B'])

but it raised an error.

1 Answer:

Answer 0 (score: 2)

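You can turn each value into a set of separate words with split and set, and check whether the two sets intersect with &. Then convert each resulting set to a boolean (an empty set becomes False) and finally to an integer, so False becomes 0 and True becomes 1.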

import pandas as pd

zipped = zip(df['A'], df['B'])
# split each cell on commas, intersect the word sets; empty set -> False -> 0
df['flag'] = [int(bool(set(a.split(',')) & set(b.split(',')))) for a, b in zipped]
print (df)
              A            B  flag
0  hello,hi,bye    bye,also     1
1   but,as well  see,pandas     0
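For a version that runs on its own, the sketch below builds the sample data from the question's table (without the spaces after the commas, matching the printed output above) and moves the row check into a small helper. The helper name `words_overlap` is hypothetical and only used for illustration; the logic is the same split, intersect, bool, int chain as the one-liner.

```python
import pandas as pd

# Sample data mirroring the question's table (cells are comma-separated strings)
df = pd.DataFrame({'A': ['hello,hi,bye', 'but,as well'],
                   'B': ['bye,also', 'see,pandas']})

def words_overlap(a, b):
    """Hypothetical helper: 1 if comma-separated strings a and b share a word, else 0."""
    common = set(a.split(',')) & set(b.split(','))  # set intersection
    return int(bool(common))  # empty set -> False -> 0, non-empty -> True -> 1

df['flag'] = [words_overlap(a, b) for a, b in zip(df['A'], df['B'])]
print(df)
```

Naming the helper makes the per-row rule easy to test separately from the DataFrame plumbing.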

A similar solution:

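import numpy as np
zipped = zip(df['A'], df['B'])  # re-create: a zip iterator is exhausted after one pass
df['flag'] = np.array([set(a.split(',')) & set(b.split(',')) for a, b in zipped]).astype(bool).astype(int)
print (df)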
              A            B  flag
0  hello,hi,bye    bye, also     1
1   but,as well  see, pandas     0
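The NumPy variant relies on NumPy storing a list of sets as an object array, where `astype(bool)` applies Python truthiness to each element, so an empty set becomes False. Both variants also compare the split pieces exactly, so a space next to a comma can hide a match. The sketch below uses a hypothetical pair of values, not taken from the question, to show both points.

```python
import numpy as np

# Hypothetical pair: the shared word 'bye' is preceded by a space in `a`
a, b = 'hello, bye', 'bye,also'

plain = set(a.split(',')) & set(b.split(','))
print(plain)  # ' bye' != 'bye', so this intersection is empty

stripped = {w.strip() for w in a.split(',')} & {w.strip() for w in b.split(',')}
print(stripped)

# A list of sets becomes a 1-D object array; astype(bool) uses each set's
# truthiness (empty -> False), and astype(int) maps False/True to 0/1
flags = np.array([plain, stripped]).astype(bool).astype(int)
print(flags)
```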

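Edit: there may be whitespace around the commas, and repeated commas produce empty strings, so add map with str.strip and remove the empty strings with filter: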

import pandas as pd

df = pd.DataFrame({'A': ['hello,hi,bye', 'but,,,as well'],
                   'B': ['bye ,,, also', 'see,,,pandas']})
print (df)

               A             B
0   hello,hi,bye  bye ,,, also
1  but,,,as well  see,,,pandas

zipped = zip(df['A'], df['B'])

def setify(x):
    # strip whitespace first, then drop the empty pieces left by ',,' or ', ,'
    return set(filter(None, map(str.strip, x.split(','))))

df['flag'] = [int(bool(setify(a) & setify(b))) for a, b in zipped]
print (df)
               A             B  flag
0   hello,hi,bye  bye ,,, also     1
1  but,,,as well  see,,,pandas     0
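Here is a self-contained sketch of the edited approach. It strips each piece before filtering, so a whitespace-only piece such as `' '` is dropped too. If filtering ran first, `' '` would survive and then strip to `''`, and two unrelated cells could "match" on that empty string. The third row is a hypothetical addition that exercises this case.

```python
import pandas as pd

def setify(x):
    # Split on commas, strip surrounding whitespace, then drop empty pieces
    # (stripping first also discards whitespace-only pieces such as ' ')
    return set(filter(None, map(str.strip, x.split(','))))

# First two rows come from the example above; the third is hypothetical
df = pd.DataFrame({'A': ['hello,hi,bye', 'but,,,as well', 'x, ,y'],
                   'B': ['bye ,,, also', 'see,,,pandas', 'z, ,w']})

df['flag'] = [int(bool(setify(a) & setify(b))) for a, b in zip(df['A'], df['B'])]
print(df)
```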