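The task is to match keywords in a paragraph. What I did was split the paragraph into words, put them in a list, and then match them against the search terms in another list.
Data: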
Automatic Product Title Tagging
Aim: To automate the process of product title tagging using manually tagged data.
ROUTE OPTIMIZATION – Spring Clean
Aim: Minimizing the overall travel time using optimization techniques.
CUSTOMER SEGMENTATION:
Aim: Develop an engine which segments and provides the score for
customers based on their behavior and analyze their purchasing pattern.
Code I tried:
s = ['tagged', 'product title', 'tagging', 'analyze']
skills = []
for word in data.split():
    print(word)
    word = word.lower()  # str.lower() returns a new string, so keep the result
    if word in s:
        skills.append(word)
skills1 = list(set(skills))
print(skills1)
['tagged', 'tagging', 'analyze']
When I use the split function, every word is split apart, so I cannot detect the phrase product title, which is present in the paragraph.
Any help would be appreciated.
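Answer 0: (score: 3)
What you are searching for are phrases, not "keywords". One solution is a regular-expression search (a simple substring in text test does not work properly, because given "product title" it could also match byproduct titles, which is not what you want).
This should do it: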
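import re
# iterate over the search terms in s; re.escape guards against regex metacharacters in a term
skills = [k for k in s if re.search(r'\b' + re.escape(k) + r'\b', data, flags=re.IGNORECASE)]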
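To illustrate the difference between a plain substring test and the word-boundary regex, here is a small sketch. The sample sentence and the helper name `find_phrases` are made up for illustration and are not part of the question; `re.escape` is added as a precaution in case a search term contains regex metacharacters.

```python
import re

def find_phrases(phrases, text):
    """Return the phrases that occur in text as whole words, ignoring case."""
    return [
        p for p in phrases
        if re.search(r'\b' + re.escape(p) + r'\b', text, flags=re.IGNORECASE)
    ]

# Made-up sample text: "product title" appears only inside "byproduct titles".
sample = "We sell byproduct titles and Tagged items."
terms = ['tagged', 'product title']

# Plain substring test: also fires inside "byproduct titles".
substring_hits = [t for t in terms if t in sample.lower()]
# Word-boundary regex: requires the phrase to start and end at a word edge.
regex_hits = find_phrases(terms, sample)

print(substring_hits)  # ['tagged', 'product title']
print(regex_hits)      # ['tagged']
```

The substring test reports a false positive for "product title", while the `\b` anchors reject it because "product" is preceded by a word character.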
Answer 1: (score: 2)
Iterate over the list s and check whether each element is in the string.
Demo:
data = """
Automatic Product Title Tagging
Aim: To automate the process of product title tagging using manually tagged data.
ROUTE OPTIMIZATION – Spring Clean
Aim: Minimizing the overall travel time using optimization techniques.
CUSTOMER SEGMENTATION:
Aim: Develop an engine which segments and provides the score for
customers based on their behavior and analyze their purchasing
pattern.
"""
s = ['tagged', 'product title', 'tagging', 'analyze']
data = data.lower()
skills = []
for i in s:
    if i.lower() in data:
        skills.append(i)
print(skills)
Or as a one-liner:
skills = [i for i in s if i.lower() in data]
Output:
['tagged', 'product title', 'tagging', 'analyze']
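Answer 2: (score: 0)
split() splits the string around the separator passed as its argument; by default it splits on whitespace. Since you are searching for "product title", which itself contains a space, you can do one of the following:
1) Search for the phrase directly in the paragraph
2) If you do split, look for a match across the tokens at indices i and i + 1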
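Option 2 can be sketched as a sliding window over the split tokens. This is only one possible reading of the suggestion: the function name `match_terms` and the punctuation-stripping step are assumptions added for illustration, and the window is generalized from two tokens to however many words the term has.

```python
def match_terms(terms, text):
    """Match single- and multi-word terms against whitespace-split tokens."""
    # Lowercase and strip surrounding punctuation so "data." matches "data".
    tokens = [w.strip('.,:;!?').lower() for w in text.split()]
    found = []
    for term in terms:
        parts = term.lower().split()
        n = len(parts)
        # Slide a window of n tokens over the text and compare it to the term.
        if any(tokens[i:i + n] == parts for i in range(len(tokens) - n + 1)):
            found.append(term)
    return found

text = "Aim: To automate the process of product title tagging using manually tagged data."
result = match_terms(['tagged', 'product title', 'tagging', 'analyze'], text)
print(result)  # ['tagged', 'product title', 'tagging']
```

Because whole tokens are compared, "product title" does not match inside "byproduct titles", at the cost of some extra bookkeeping compared with searching the paragraph directly.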
Answer 3: (score: 0)
"Aim:" has to be present in each line of "data" that you want to search, so I would find the index of that word ("Aim:"):
p = "Automatic Product Title Tagging Aim: To automate the process of product title tagging using manually tagged data."
index = p.find("Aim:")  # 32
print(p[index:])
output:
"Aim: To automate the process of product title tagging using manually tagged data."
w_length = len("Aim:")  # 4, used to skip past the word "Aim:"
print(p[index + w_length:])
output:
" To automate the process of product title tagging using manually tagged data."
Example:
s = ['tagged', 'product title', 'tagging', 'analyze']
skills = []
for line in data.split("\n"):
    index = line.find("Aim:")  # -1 when the line has no "Aim:"
    if index != -1:
        # look only at the text that follows "Aim:"
        for word in line[index + len("Aim:"):].split():
            if word.lower() in s:
                skills.append(word)
                print(word)
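Since the loop above splits each line into single words, a two-word term such as 'product title' still cannot match. One way to keep the "Aim:" restriction while matching phrases is sketched below. It borrows the word-boundary regex idea from Answer 0, uses a shortened copy of the question's data, and is an assumption about how the two ideas might be combined rather than part of the original answer.

```python
import re

# Shortened copy of the question's data, for illustration only.
data = """Automatic Product Title Tagging
Aim: To automate the process of product title tagging using manually tagged data.
ROUTE OPTIMIZATION - Spring Clean
Aim: Minimizing the overall travel time using optimization techniques.
"""
s = ['tagged', 'product title', 'tagging', 'analyze']

skills = []
for line in data.splitlines():
    index = line.find("Aim:")
    if index == -1:
        continue  # no "Aim:" on this line
    rest = line[index + len("Aim:"):]  # text that follows "Aim:"
    for term in s:
        pattern = r'\b' + re.escape(term) + r'\b'
        # Test each whole term against the rest of the line instead of splitting it.
        if term not in skills and re.search(pattern, rest, flags=re.IGNORECASE):
            skills.append(term)

print(skills)  # ['tagged', 'product title', 'tagging']
```

Note that an "Aim:" sentence that wraps onto the next line, as the CUSTOMER SEGMENTATION entry does in the question, would only be searched up to the line break with this per-line approach.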