I need to match the text in a pandas column against the keywords in a list and create a new column containing the matched words. Example:
my_list = ['machine learning', 'artificial intelligence', 'lasso']
Data:
listing                                    keyword_column
I am looking for machine learning expert   machine learning
Machine learning expert that knows lasso   machine learning, lasso
Need a web designer
Artificial Intelligence application on...  artificial intelligence
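Answer 0 (score: 2)
Use Series.str.findall to get every value from the list that occurs in the text, join the matches with Series.str.join, and convert them to lowercase with Series.str.lower if necessary.
The pattern also wraps each keyword in word boundaries (\b) so that only whole words from my_list are matched.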
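import re

my_list = ['machine learning', 'artificial intelligence', 'lasso']
# re.escape keeps any regex metacharacters in a keyword literal
pat = '|'.join(r"\b{}\b".format(re.escape(x)) for x in my_list)
# match case-insensitively, then normalise the joined result to lowercase
df['new'] = df['listing'].str.findall(pat, flags=re.I).str.join(', ').str.lower()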
Or lowercase the text first, so that no flag is needed:
df['new'] = df['listing'].str.lower().str.findall(pat).str.join(', ')
print(df)
   listing                                   keyword_column           \
0  I am looking for machine learning expert  machine learning
1  Machine learning expert that knows lasso  machine learning, lasso
2  Need a web designer                       NaN
3  Artificial Intelligence application on    artificial intelligence

   new
0  machine learning
1  machine learning, lasso
2
3  artificial intelligence
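
To see what the \b word boundaries change, here is a minimal sketch that runs the same pipeline with and without them. The second listing ("glasso") and the column names loose and strict are made up for illustration and are not part of the question's data.

```python
import re
import pandas as pd

my_list = ['machine learning', 'artificial intelligence', 'lasso']

# Hypothetical rows: the second one contains 'lasso' only inside 'glasso'.
df = pd.DataFrame({'listing': [
    'Machine learning expert that knows lasso',
    'Experience with glasso estimators',
]})

# Without word boundaries a keyword may match inside a longer word.
loose = '|'.join(re.escape(x) for x in my_list)
# With \b on both sides only whole-word occurrences match.
strict = '|'.join(r'\b{}\b'.format(re.escape(x)) for x in my_list)

df['loose'] = df['listing'].str.findall(loose, flags=re.I).str.join(', ').str.lower()
df['strict'] = df['listing'].str.findall(strict, flags=re.I).str.join(', ').str.lower()

print(df[['loose', 'strict']])
```

In the second row the loose pattern reports lasso, while the strict pattern leaves the cell empty.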
Answer 1 (score: 1)
You can also solve this with str.lower + str.findall + str.join:
df['keyword_column'] = df['listing'].str.lower().str.findall('|'.join(my_list)).str.join(', ')
Now:
print(df)
gives:
   listing                                    keyword_column
0  I am looking for machine learning expert   machine learning
1  Machine learning expert that knows lasso   machine learning, lasso
2  Need a web designer
3  Artificial Intelligence application on...  artificial intelligence
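
Rows with no match end up as an empty string here. If you would rather have NaN in those cells, one possible follow-up is to mask the empty strings. This is only a sketch on a two-row sample; the intermediate name joined is made up for illustration.

```python
import pandas as pd

my_list = ['machine learning', 'artificial intelligence', 'lasso']

df = pd.DataFrame({'listing': [
    'I am looking for machine learning expert',
    'Need a web designer',
]})

# Same pipeline as above: lowercase, find every keyword, join the matches.
joined = df['listing'].str.lower().str.findall('|'.join(my_list)).str.join(', ')

# Series.where keeps values where the condition holds and puts NaN elsewhere,
# so rows without any match become missing instead of ''.
df['keyword_column'] = joined.where(joined != '')

print(df)
```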
Answer 2 (score: 1)
flashtext can also be used to extract the keywords:
import pandas as pd
from flashtext import KeywordProcessor

data = ['I am looking for machine learning expert',
        'Machine learning expert that knows lasso ',
        'Need a web designer',
        'Artificial Intelligence application on...']
df = pd.DataFrame(data, columns=['listing'])
my_list = ['machine learning', 'artificial intelligence', 'lasso']

# register every keyword, then extract the ones found in each listing
kp = KeywordProcessor()
kp.add_keywords_from_list(my_list)
df['keyword_columns'] = df['listing'].apply(kp.extract_keywords)
# Output
df['keyword_columns']
Out[68]:
0 [machine learning]
1 [machine learning, lasso]
2 []
3 [artificial intelligence]
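
Note that this gives a column of lists, not the comma-separated strings shown in the question. If you need the string form, Series.str.join also works on list elements. In the sketch below the lists are hard-coded stand-ins for the extract_keywords output above, so that it runs without flashtext installed; the keyword_column name is taken from the question.

```python
import pandas as pd

# Stand-in for the lists returned by kp.extract_keywords (copied from the output above).
df = pd.DataFrame({'keyword_columns': [
    ['machine learning'],
    ['machine learning', 'lasso'],
    [],
    ['artificial intelligence'],
]})

# str.join concatenates the items of each list; an empty list becomes ''.
df['keyword_column'] = df['keyword_columns'].str.join(', ')

print(df['keyword_column'])
```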