I'm trying to create a column in a pandas DataFrame that shows, row by row, whether the string in 'col1' contains the string in 'col2'. Reproducible example below:
import pandas as pd

# Have
df = pd.DataFrame({'col1': ['a', 'aa', 'b', 'bb', 'c', 'cc'],
                   'col2': ['a', 'b', 'c', 'd', 'e', 'c']})
# Want: Series of 'does col1 contain col2?'
want = pd.Series([True, False, False, False, False, True])
# Tried
tried = df.col1.str.contains(df.col2)  # TypeError
My error is due to str.contains expecting a single string (pattern) as its argument, not another pd.Series. But I'm not sure how to get around this...
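For contrast, str.contains does work when every row is tested against one shared pattern. The sketch below shows that case; note that str.contains treats the pattern as a regular expression by default, so regex=False is passed here to get a plain substring test like Python's `in`.

```python
import pandas as pd

df = pd.DataFrame({'col1': ['a', 'aa', 'b', 'bb', 'c', 'cc'],
                   'col2': ['a', 'b', 'c', 'd', 'e', 'c']})

# One scalar pattern, tested against every row of col1.
# regex=False makes this a literal substring check, like `'c' in value`.
single = df.col1.str.contains('c', regex=False)
print(single.tolist())  # [False, False, False, False, True, True]
```

A per-row pattern (a whole Series) is what str.contains does not accept, which is why the answers fall back to row-wise Python loops.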
Answer 0 (score: 2)
This is not as simple a problem to solve as you think, because you cannot reasonably vectorize it: pandas' string methods apply one pattern to the whole column, not a different pattern per row.
Your first choice should be a list comprehension.
pd.Series([b in a for a, b in zip(df.col1, df.col2)])
0 True
1 False
2 False
3 False
4 False
5 True
dtype: bool
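One detail worth handling when the list comprehension result is stored back into the frame: pd.Series(...) built from a plain list gets a default 0..n-1 index, so assigning it to a DataFrame with a different index would align by label and produce NaN. A minimal sketch, assuming you want the result to follow df's index and to return False rather than raise on missing values; the helper name contains_rowwise and the custom index are hypothetical, for illustration only.

```python
import pandas as pd

def contains_rowwise(haystack, needle):
    """Hypothetical helper: True where needle[i] is a substring of haystack[i].

    Non-string entries (e.g. None or NaN) give False instead of raising TypeError.
    """
    values = [isinstance(a, str) and isinstance(b, str) and b in a
              for a, b in zip(haystack, needle)]
    # Reuse the caller's index so assignment back into the frame aligns row by row.
    return pd.Series(values, index=haystack.index, dtype=bool)

# Hypothetical non-default index, to show the alignment issue is avoided.
df = pd.DataFrame({'col1': ['a', 'aa', 'b', 'bb', 'c', 'cc'],
                   'col2': ['a', 'b', 'c', 'd', 'e', 'c']},
                  index=[10, 11, 12, 13, 14, 15])
df['test'] = contains_rowwise(df.col1, df.col2)
print(df)
```

The isinstance checks are a design choice; drop them if your columns are guaranteed to hold only strings.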
Your second choice would be np.vectorize:
import numpy as np

f = np.vectorize(lambda a, b: b in a)
pd.Series(f(df.col1, df.col2))
0 True
1 False
2 False
3 False
4 False
5 True
dtype: bool
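Per the NumPy documentation, np.vectorize is provided for convenience, not performance: it is essentially a Python loop. One practical caveat is that, without otypes, it infers the output dtype by calling the function on the first element, so it raises a ValueError on empty input. A minimal sketch passing otypes=[bool] so an empty DataFrame also works; the empty-frame check is an illustrative assumption, not part of the original answer.

```python
import numpy as np
import pandas as pd

# otypes=[bool] fixes the output dtype up front instead of inferring it
# from a first call, so size-0 inputs no longer raise ValueError.
f = np.vectorize(lambda a, b: b in a, otypes=[bool])

df = pd.DataFrame({'col1': ['a', 'aa', 'b', 'bb', 'c', 'cc'],
                   'col2': ['a', 'b', 'c', 'd', 'e', 'c']})
result = pd.Series(f(df.col1, df.col2), index=df.index)

# An empty slice of the same frame now yields an empty boolean Series.
empty = df.iloc[:0]
empty_result = pd.Series(f(empty.col1, empty.col2), index=empty.index)
print(result.tolist(), len(empty_result))
```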
Your last choice should be apply; @jpp's answer covers that.
Answer 1 (score: 0)
This is one loopy way, using pd.DataFrame.apply and a lambda function:
import pandas as pd

df = pd.DataFrame({'col1': ['a', 'aa', 'b', 'bb', 'c', 'cc'],
                   'col2': ['a', 'b', 'c', 'd', 'e', 'c']})
# axis=1 passes each row to the lambda as a Series
df['test'] = df.apply(lambda x: x['col2'] in x['col1'], axis=1)
Result:
col1 col2 test
0 a a True
1 aa b False
2 b c False
3 bb d False
4 c e False
5 cc c True
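The first answer ranks the approaches as list comprehension, then np.vectorize, then apply. To check that ordering on your own data, a minimal timing harness like the one below could be used. It is a sketch under stated assumptions: the benchmark data (the question's six rows repeated 1,000 times) is hypothetical, it first confirms all three approaches agree, and it prints timings without claiming any particular result.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical benchmark data: the question's six rows repeated 1,000 times.
df = pd.DataFrame({'col1': ['a', 'aa', 'b', 'bb', 'c', 'cc'] * 1_000,
                   'col2': ['a', 'b', 'c', 'd', 'e', 'c'] * 1_000})

f = np.vectorize(lambda a, b: b in a, otypes=[bool])

approaches = {
    'list comprehension': lambda: pd.Series(
        [b in a for a, b in zip(df.col1, df.col2)], index=df.index),
    'np.vectorize': lambda: pd.Series(f(df.col1, df.col2), index=df.index),
    'DataFrame.apply': lambda: df.apply(lambda x: x['col2'] in x['col1'], axis=1),
}

# All three must give the same answer before comparing their speed.
results = {name: fn() for name, fn in approaches.items()}
baseline = results['list comprehension']
for name, res in results.items():
    assert res.tolist() == baseline.tolist(), name

# Best of 3 repeats, each running the approach 3 times.
for name, fn in approaches.items():
    seconds = min(timeit.repeat(fn, number=3, repeat=3))
    print(f'{name}: {seconds:.4f} s per 3 runs')
```

Timings depend on the pandas/NumPy versions and on string lengths, so treat any numbers this prints as specific to your machine and data.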