How to match words in a DataFrame column against a list of values, ignoring case, with pandas in Python

Time: 2017-11-03 13:22:45

Tags: python pandas dataframe data-analysis

I have a DataFrame df:

Name
Ram is one of the key ram
Kumar is playing cricket
Ravi is playing and ravi is a good player

and a list:

my_list=["Ram","ravi"]
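To reproduce the question, the sample data can be rebuilt as below. This is a minimal sketch that only copies the rows and the list shown above into a DataFrame with a single Name column.

```python
import pandas as pd

# Sample data copied from the question so the snippets can be run as-is
df = pd.DataFrame({"Name": [
    "Ram is one of the key ram",
    "Kumar is playing cricket",
    "Ravi is playing and ravi is a good player",
]})
my_list = ["Ram", "ravi"]
print(df)
```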

The output I want is:

desired_df:
Name                                        Match    Count 
Ram is one of the key ram                   Ram      1
Kumar is playing cricket                 
Ravi is playing and ravi is a good player   ravi     1   

I tried:

 import re
 extracted = df['Name'].str.findall('(' + '|'.join(my_list) + ')',
                                    flags=re.IGNORECASE).apply(set)

but I am getting something like:

 Match
 Ram,ram
 Ravi,ravi

but I can't get the output I want. Please help.
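The Ram,ram result comes from findall returning each match spelled as it appears in the text, so set treats "Ram" and "ram" as two different strings. Below is a minimal sketch of one fix, assuming the goal is one entry per word regardless of case: lower-case every match before building the set. The re.escape call is an addition not in the question; it only matters if a list entry contains regex metacharacters.

```python
import re
import pandas as pd

df = pd.DataFrame({"Name": [
    "Ram is one of the key ram",
    "Kumar is playing cricket",
    "Ravi is playing and ravi is a good player",
]})
my_list = ["Ram", "ravi"]

pat = "(" + "|".join(map(re.escape, my_list)) + ")"

# findall keeps the original spelling of each match ("Ram", "ram"),
# so lower-case the matches first and let the set collapse duplicates
extracted = df["Name"].str.findall(pat, flags=re.IGNORECASE).apply(
    lambda matches: {m.lower() for m in matches}
)
print(extracted)
```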

2 answers:

Answer 0 (score: 2)

Is this what you're looking for?

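# lower-case both the search terms and the column so matching ignores case
new_l = [i.lower() for i in my_list]
extracted = df['Name'].str.lower().str.findall('(' + '|'.join(new_l) + ')').apply(set)

# join the distinct matches into one string and count them
df['Match'] = extracted.apply(','.join)
df['count'] = extracted.apply(len)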
                                          Name     Match  count
0                      Ram is one of the key ram       ram      1
1                       Kumar is playing cricket                0
2  Ravi Ram is playing and ravi is a good player  ram,ravi      2
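This answer reports the lower-cased form (ram), while desired_df shows the spelling used in my_list (Ram), and the order in which a set is joined is not guaranteed. Below is a sketch of one variant, assuming the goal is to reproduce desired_df exactly: map each match back to its my_list spelling through a lower-case lookup dict, and sort before joining. The names canonical and list_matches are hypothetical, chosen only for illustration.

```python
import re
import pandas as pd

df = pd.DataFrame({"Name": [
    "Ram is one of the key ram",
    "Kumar is playing cricket",
    "Ravi is playing and ravi is a good player",
]})
my_list = ["Ram", "ravi"]

# Hypothetical lookup: lower-case form -> spelling used in my_list
canonical = {word.lower(): word for word in my_list}
pat = "|".join(re.escape(word) for word in my_list)

def list_matches(text):
    # Each list entry is reported at most once, spelled as in my_list
    found = {canonical[m.lower()] for m in re.findall(pat, text, flags=re.IGNORECASE)}
    return sorted(found)  # sorted so the joined string has a stable order

matches = df["Name"].apply(list_matches)
df["Match"] = matches.apply(",".join)
df["Count"] = matches.apply(len)
print(df)
```

Rows with no match get an empty Match string and a Count of 0.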

Answer 1 (score: 1)

In [187]: pat = '({})'.format('|'.join(my_list))

In [188]: df['Match'] = df['Name'].str.extract(pat, expand=False)

In [190]: df['Count'] = df.Name.str.count(pat)

In [191]: df
Out[191]:
                                                Name Match  Count
0                          Ram is one of the key ram   Ram      1
1                           Kumar is playing cricket   NaN      0
2  Ravi is playing and ravi (ravi ravi) is a good...  ravi      3  # I've intentionally added `(ravi ravi)`
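Note that pat in this answer is case-sensitive, which is why "Ravi" at the start of row 2 is not counted. pandas' str.extract and str.count both accept a flags argument, so passing re.IGNORECASE makes them ignore case. The sketch below also wraps the pattern in \b word boundaries, an added assumption that only whole words should match (so "ram" would not match inside "program"). With case ignored, Match returns the first match as written in the text ("Ravi"), and Count counts every occurrence rather than distinct words.

```python
import re
import pandas as pd

df = pd.DataFrame({"Name": [
    "Ram is one of the key ram",
    "Kumar is playing cricket",
    "Ravi is playing and ravi (ravi ravi) is a good player",
]})
my_list = ["Ram", "ravi"]

# \b restricts matches to whole words (an assumption, not in the answer);
# re.escape guards against regex metacharacters in list entries
pat = r"\b({})\b".format("|".join(map(re.escape, my_list)))

# Same calls as in the answer, plus flags=re.IGNORECASE
df["Match"] = df["Name"].str.extract(pat, flags=re.IGNORECASE, expand=False)
df["Count"] = df["Name"].str.count(pat, flags=re.IGNORECASE)
print(df)
```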