Pandas: match elements from a list

Date: 2019-06-25 10:53:15

Tags: python pandas

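I need to match the keywords that appear in a pandas column against the keywords in a list, and create a new column containing the matched words. Example: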

my_list = ['machine learning', 'artificial intelligence', 'lasso']

Data:

listing                                         keyword_column
I am looking for machine learning expert        machine learning
Machine learning expert that knows lasso        machine learning, lasso
Need a web designer                              
Artificial Intelligence application on...       artificial intelligence

3 answers:

Answer 0 (score: 2)

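Use Series.str.findall to get all values from the list, join them together with Series.str.join, and, if necessary, convert them to lowercase with Series.str.lower.

Word boundaries (\b) are also used here so that only whole words from my_list are matched: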

my_list = ['machine learning', 'artificial intelligence', 'lasso']

import re

pat = '|'.join(r"\b{}\b".format(x) for x in my_list)
df['new'] = df['listing'].str.findall(pat, flags=re.I).str.join(', ').str.lower()

Or:

df['new'] = df['listing'].str.lower().str.findall(pat).str.join(', ')

print(df)
                                    listing           keyword_column  \
0  I am looking for machine learning expert         machine learning   
1  Machine learning expert that knows lasso  machine learning, lasso   
2                      Need a web designer                       NaN   
3    Artificial Intelligence application on  artificial intelligence   

                       new  
0         machine learning  
1  machine learning, lasso  
2                           
3  artificial intelligence  
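The approach in this answer can be put together as one self-contained script. This is only a sketch: the DataFrame is rebuilt from the sample rows in the question, and the `re.escape` call is an addition that is not in the original answer, included on the assumption that a keyword might someday contain regex metacharacters.

```python
import re

import pandas as pd

my_list = ['machine learning', 'artificial intelligence', 'lasso']

# Sample rows copied from the question
df = pd.DataFrame({
    'listing': [
        'I am looking for machine learning expert',
        'Machine learning expert that knows lasso',
        'Need a web designer',
        'Artificial Intelligence application on...',
    ]
})

# One alternation of whole-word patterns; re.escape (an addition, not in
# the original answer) treats each keyword as literal text
pat = '|'.join(r'\b{}\b'.format(re.escape(x)) for x in my_list)

df['new'] = (
    df['listing']
    .str.findall(pat, flags=re.I)  # case-insensitive list of matches per row
    .str.join(', ')                # list -> comma-separated string
    .str.lower()                   # normalize to the casing used in my_list
)

print(df['new'].tolist())
```

Rows without any match end up as an empty string rather than NaN, because joining an empty list yields ''.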

Answer 1 (score: 1)

You can also solve this with str.lower + str.findall + str.join:

df['keyword_column'] = df['listing'].str.lower().str.findall('|'.join(my_list)).str.join(', ')

Now:

print(df)

gives:

                                     listing           keyword_column
0   I am looking for machine learning expert         machine learning
1   Machine learning expert that knows lasso  machine learning, lasso
2                        Need a web designer                         
3  Artificial Intelligence application on...  artificial intelligence
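Note that the pattern in this answer has no \b word boundaries, so a keyword can also match inside a longer word. The sketch below illustrates the difference; the listing text ("glasso") is a hypothetical example that does not come from the question.

```python
import pandas as pd

my_list = ['machine learning', 'artificial intelligence', 'lasso']

# Hypothetical listing: "glasso" contains "lasso" as a substring
s = pd.Series(['Experience with glasso models'])

# Plain alternation, as in this answer: matches substrings too
plain = s.str.lower().str.findall('|'.join(my_list)).str.join(', ')

# Alternation with word boundaries: matches whole words only
bounded_pat = '|'.join(r'\b{}\b'.format(x) for x in my_list)
bounded = s.str.lower().str.findall(bounded_pat).str.join(', ')

print(plain.tolist())    # ['lasso']
print(bounded.tolist())  # ['']
```

Which behavior is preferable depends on the data; for the sample rows in the question both variants give the same result.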

Answer 2 (score: 1)

flashtext can also be used to extract the keywords:

import pandas as pd
from flashtext import KeywordProcessor

data = ['I am looking for machine learning expert','Machine learning expert that knows lasso ','Need a web designer','Artificial Intelligence application on...' ]

df = pd.DataFrame(data, columns = ['listing'])
my_list = ['machine learning', 'artificial intelligence', 'lasso']

kp = KeywordProcessor()
kp.add_keywords_from_list(my_list)

df['keyword_columns'] = df['listing'].apply(kp.extract_keywords)

# Output:
df['keyword_columns']
Out[68]: 
0           [machine learning]
1    [machine learning, lasso]
2                           []
3    [artificial intelligence]
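This answer produces a column of lists, while the question shows comma-separated strings. If the string form is wanted, the lists can be joined afterwards. A minimal sketch, assuming the lists look like the output above (they are typed in by hand here so that the snippet runs without flashtext installed):

```python
import pandas as pd

# Lists copied from the output above
keyword_lists = pd.Series([
    ['machine learning'],
    ['machine learning', 'lasso'],
    [],
    ['artificial intelligence'],
])

# Series.str.join also works on list elements; an empty list becomes ''
keyword_strings = keyword_lists.str.join(', ')

print(keyword_strings.tolist())
```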