I have a df,
Name
Ram is one of the key ram
Kumar is playing cricket
Ravi is playing and ravi is a good player
and a list,
my_list=["Ram","ravi"]
My desired dataframe is,
desired_df,
Name                                        Match  Count
Ram is one of the key ram                   Ram    1
Kumar is playing cricket
Ravi is playing and ravi is a good player   ravi   1
I tried
import re
extracted = df['Name'].str.findall('(' + '|'.join(my_list) + ')',
                                   flags=re.IGNORECASE).apply(set)
but I am getting something like,
Match
Ram,ram
Ravi,ravi
But I can't get my desired output; please help.
Answer 0: (score: 2)
Is this what you are looking for?
new_l = [i.lower() for i in my_list]
extracted = df['Name'].str.lower().str.findall('(' + '|'.join(new_l) + ')').apply(set)
df['Match'] = extracted.apply(','.join)
df['count'] = extracted.apply(len)
   Name                                            Match     count
0  Ram is one of the key ram                       ram       1
1  Kumar is playing cricket                                  0
2  Ravi Ram is playing and ravi is a good player   ram,ravi  2
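If the Match column should keep the spelling used in my_list (Ram, ravi) as in the desired output, one possible variation is to collect the distinct case-insensitive hits per row and then report the matching my_list entries. This is only a sketch under a few assumptions: the helper name distinct_matches is made up for illustration, re.escape is added on the assumption that the list entries are literal strings rather than regex patterns, and the sample frame is rebuilt from the question.

```python
import re
import pandas as pd

# Sample data rebuilt from the question.
df = pd.DataFrame({"Name": [
    "Ram is one of the key ram",
    "Kumar is playing cricket",
    "Ravi is playing and ravi is a good player",
]})
my_list = ["Ram", "ravi"]

# One alternation pattern, compiled once; entries are escaped so they
# are treated as literal text (an assumption about my_list).
pattern = re.compile("|".join(re.escape(t) for t in my_list),
                     flags=re.IGNORECASE)

def distinct_matches(text):
    """Hypothetical helper: my_list entries found in text, ignoring case."""
    found = {m.lower() for m in pattern.findall(text)}
    # Iterate over my_list so the output keeps its spelling and order.
    return [term for term in my_list if term.lower() in found]

matches = df["Name"].apply(distinct_matches)
df["Match"] = matches.apply(",".join)
df["Count"] = matches.apply(len)
print(df)
```

With the three rows from the question this gives Ram, an empty string and ravi in Match, and 1, 0 and 1 in Count. The pattern has no word boundaries, so an entry such as ram would also be found inside a longer word.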
Answer 1: (score: 1)
In [187]: pat = '({})'.format('|'.join(my_list))
In [188]: df['Match'] = df['Name'].str.extract(pat, expand=False)
In [190]: df['Count'] = df.Name.str.count(pat)
In [191]: df
Out[191]:
   Name                                                 Match  Count
0  Ram is one of the key ram                            Ram    1
1  Kumar is playing cricket                             NaN    0
2  Ravi is playing and ravi (ravi ravi) is a good...    ravi   3   # I've intentionally added `(ravi ravi)`
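Note that this pattern is case-sensitive and Count is the number of occurrences, not the number of distinct terms. If a case-insensitive, whole-word occurrence count is wanted instead, a possible variant passes flags=re.IGNORECASE to both calls. The \b anchors and re.escape are additions for illustration, assuming the entries in my_list are plain words; the sample frame is rebuilt from the question.

```python
import re
import pandas as pd

# Sample data rebuilt from the question.
df = pd.DataFrame({"Name": [
    "Ram is one of the key ram",
    "Kumar is playing cricket",
    "Ravi is playing and ravi is a good player",
]})
my_list = ["Ram", "ravi"]

# Whole-word alternation; re.escape guards against regex metacharacters.
pat = r"\b({})\b".format("|".join(map(re.escape, my_list)))

# First match per row, as it is spelled in the text (NaN when absent).
df["Match"] = df["Name"].str.extract(pat, flags=re.IGNORECASE, expand=False)
# Every occurrence per row, ignoring case.
df["Count"] = df["Name"].str.count(pat, flags=re.IGNORECASE)
print(df)
```

Here Match holds the first hit as it appears in the row, so the third row gives Ravi rather than ravi, and Count is 2, 0 and 2 because both spellings are counted.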