Question

I'm trying to create a column in a Pandas DataFrame that shows whether the string in 'col1' contains the string in 'col2', row by row. Reproducible example below:

import pandas as pd

# Have
df = pd.DataFrame({'col1': ['a', 'aa', 'b', 'bb', 'c', 'cc'],
                    'col2': ['a', 'b',  'c', 'd',  'e', 'c']})
# Want: Series of 'does col1 contain col2?'
want = pd.Series([True, False, False, False, False, True])

# tried
tried = df.col1.str.contains(df.col2) # TypeError

My error is due to str.contains wanting a single string on the right-hand side above, not another pd.Series. But I'm not sure how to get around this...

Answer 1

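This is not as simple a problem as it looks, because you cannot reasonably vectorize it: str.contains applies one pattern to the whole column, whereas here every row has its own substring to search for.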

Your first choice should be a list comprehension.

pd.Series([b in a for a, b in zip(df.col1, df.col2)])

0     True
1    False
2    False
3    False
4    False
5     True
dtype: bool
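
If the columns may hold missing values, the plain `b in a` test raises a TypeError on None or NaN. A minimal sketch of a guarded version follows; the `contains` helper and the sample data are hypothetical, and treating missing values as "no match" is an assumption you may want to change.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing value in each column
df = pd.DataFrame({'col1': ['a', 'aa', None, 'cc'],
                   'col2': ['a', 'b', 'c', np.nan]})

def contains(a, b):
    # Assumption: any non-string value (None, NaN, numbers) counts as "no match"
    return isinstance(a, str) and isinstance(b, str) and b in a

# Reuse the original index so the result aligns with df
result = pd.Series([contains(a, b) for a, b in zip(df.col1, df.col2)],
                   index=df.index)
print(result)
```

Passing index=df.index keeps the result aligned with the DataFrame when its index is not the default 0..n-1.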

Your second choice would be np.vectorize (a convenience wrapper that still loops in Python under the hood):

import numpy as np

f = np.vectorize(lambda a, b: b in a)
pd.Series(f(df.col1, df.col2))

0     True
1    False
2    False
3    False
4    False
5     True
dtype: bool
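
One thing to watch with np.vectorize: without an explicit output type it infers the dtype by calling the function on the first element, so it cannot handle empty input. A small sketch, assuming you want a boolean result in every case, is to pass otypes; the 'contains' column name is only for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col1': ['a', 'aa', 'b', 'bb', 'c', 'cc'],
                   'col2': ['a', 'b',  'c', 'd',  'e', 'c']})

# otypes fixes the output dtype up front instead of inferring it from the first call
f = np.vectorize(lambda a, b: b in a, otypes=[bool])

# Assign straight back as a new column ('contains' is an illustrative name)
df['contains'] = f(df.col1, df.col2)

# With otypes set, empty input yields an empty bool array instead of an error
empty = f(df.col1.iloc[:0], df.col2.iloc[:0])
print(df)
```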

Your last choice should be apply; @jpp's answer covers that.

Answer 2

This is one loop-based way, using pd.DataFrame.apply with axis=1 and a lambda function.

df = pd.DataFrame({'col1': ['a', 'aa', 'b', 'bb', 'c', 'cc'],
                    'col2': ['a', 'b',  'c', 'd',  'e', 'c']})

df['test'] = df.apply(lambda x: x['col2'] in x['col1'], axis=1)

Result:

  col1 col2   test
0    a    a   True
1   aa    b  False
2    b    c  False
3   bb    d  False
4    c    e  False
5   cc    c   True
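
Since apply with axis=1 calls the lambda once per row, it may be worth checking it against a list comprehension on data of your own size. The following is a rough sketch only: the function names, the repeat factor of 1000, and the number of timing runs are arbitrary choices, and timings will vary by machine and pandas version.

```python
import timeit
import pandas as pd

base = pd.DataFrame({'col1': ['a', 'aa', 'b', 'bb', 'c', 'cc'],
                     'col2': ['a', 'b',  'c', 'd',  'e', 'c']})

# Repeat the sample rows to get a larger frame (6,000 rows; size is arbitrary)
big = pd.concat([base] * 1000, ignore_index=True)

def with_apply(d):
    # Row-wise apply: the lambda receives each row as a Series
    return d.apply(lambda x: x['col2'] in x['col1'], axis=1)

def with_listcomp(d):
    # Plain Python loop over the two columns in lockstep
    return pd.Series([b in a for a, b in zip(d.col1, d.col2)], index=d.index)

# Both approaches should agree before their timings are compared
same = with_apply(big).equals(with_listcomp(big))
print('same result:', same)

print('apply   :', timeit.timeit(lambda: with_apply(big), number=3))
print('listcomp:', timeit.timeit(lambda: with_listcomp(big), number=3))
```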

Check if Pandas DF Column1 Contains (str) Column2