I need to search a df column and return every substring from a list that appears in each row.
myList = ['a cat', 'the dog', 'a cow']
Example df:
'col A'
there was a cat with the dog
the cow was brown
the dog was sick
This splits each row into single words, so it can only ever return single-word matches:
df['col B'] = df['col A'].apply(lambda x: ';'.join([word for word in x.split() if word in (myList)]))
I also tried adding np.any:
df['col B'] = df['col A'].apply(lambda x: ';'.join(np.any(word for word in df['col A'] if word in (myList))))
It needs to return:
'col B'
a cat;the dog
NaN
the dog
Answer 0 (score: 1)
You can use str.extractall with a regex alternation built from the list:
s = df.col.str.extractall(f'({"|".join(myList)})')
res = s.groupby(s.index.get_level_values(0))[0].agg(';'.join)
df.loc[res.index, 'new'] = res
                            col            new
0  there was a cat with the dog  a cat;the dog
1             the cow was brown            NaN
2              the dog was sick        the dog
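
For reference, here is a self-contained sketch of the same extractall idea, run against the question's own data. It assumes the column is named 'col A' and the result goes into 'col B' as in the question, and it adds re.escape (not in the answer above) in case a phrase contains regex metacharacters. The names pattern, matches and joined are only illustrative.

```python
import re

import pandas as pd

myList = ['a cat', 'the dog', 'a cow']
df = pd.DataFrame({'col A': ['there was a cat with the dog',
                             'the cow was brown',
                             'the dog was sick']})

# Escape each phrase so regex metacharacters are matched literally,
# then build one capture group containing all alternatives.
pattern = '(' + '|'.join(re.escape(p) for p in myList) + ')'

# extractall returns one row per match, indexed by (original row, match number).
matches = df['col A'].str.extractall(pattern)

# Join the matches that belong to the same original row.
joined = matches.groupby(level=0)[0].agg(';'.join)

# Rows without any match are absent from `joined`,
# so index alignment leaves them as NaN.
df['col B'] = joined
print(df)
```

One thing to keep in mind: regex matches do not overlap, so if two phrases in the list share text in a row, only the first one found at that position is returned.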
Answer 1 (score: 0)
This should work; you were close:
import numpy as np
df['col B'] = df['col A'].apply(lambda x: ';'.join([m for m in myList if m in x])).replace('',np.nan)
Result:
                          col A          col B
0  there was a cat with the dog  a cat;the dog
1             the cow was brown            NaN
2              the dog was sick        the dog
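
If the one-liner is hard to read, the same membership test can be spelled out as a small function. This is only a sketch using the question's data; find_phrases is a hypothetical helper name, and it returns NaN directly instead of replacing empty strings afterwards.

```python
import numpy as np
import pandas as pd

myList = ['a cat', 'the dog', 'a cow']
df = pd.DataFrame({'col A': ['there was a cat with the dog',
                             'the cow was brown',
                             'the dog was sick']})


def find_phrases(text, phrases):
    """Return the phrases contained in text joined by ';', or NaN if none match."""
    # Iterate over the phrases (not over the words of the row),
    # so multi-word phrases such as 'a cat' can be found.
    found = [p for p in phrases if p in text]
    return ';'.join(found) if found else np.nan


# Extra keyword arguments to Series.apply are passed through to the function.
df['col B'] = df['col A'].apply(find_phrases, phrases=myList)
print(df)
```

Note that the matches come back in the order of myList rather than in the order they appear in the row, and that `in` is a plain substring test, so 'a cat' would also match inside 'a catalog'.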