Question

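The task is to match keywords in a paragraph. What I did was split the paragraph into words, put them in a list, and then match them against the search terms in another list.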

Data:

Automatic Product Title Tagging
Aim: To automate the process of product title tagging using manually tagged data. 

ROUTE OPTIMIZATION – Spring Clean
Aim:  Minimizing the overall travel time using optimization techniques. 

CUSTOMER SEGMENTATION:
Aim:  Develop an engine which segments and provides the score for
      customers based on their behavior and analyze their purchasing pattern.

Code I tried:

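# data holds the paragraph shown above as a single string
s = ['tagged', 'product title', 'tagging', 'analyze']

skills = []
for word in data.split():
    print(word)
    word = word.lower()  # str.lower() returns a new string, so it must be reassigned
    if word in s:
        skills.append(word)
skills1 = list(set(skills))

print(skills1)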

['tagged', 'tagging', 'analyze']

When I use the split function, every word is split apart, so I can't detect the phrase "product title" that appears in the paragraph.
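A quick check makes the problem concrete: `str.split()` with no arguments splits on runs of whitespace, so a two-word term can never equal a single token. This minimal sketch uses only the "Aim:" line from the data above as a stand-in for the full paragraph; the names `text` and `tokens` are illustrative.

```python
# Stand-in for the full paragraph: just one "Aim:" line from the data
text = "Aim: To automate the process of product title tagging using manually tagged data."

tokens = text.split()  # splits on any run of whitespace

print(tokens[6:8])                # the phrase ends up as two separate tokens
print("product title" in tokens)  # list membership compares whole tokens -> False
print("product title" in text)    # substring test on the original string -> True
```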

Thanks to anyone who can help.

Answer 1

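What you're searching for aren't "keywords" but phrases. One solution is a regular-expression search (a simple `substring in text` check won't work correctly, because given "product title" it could also match "byproduct titles", which isn't what you want).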

This should do it:

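import re
# s is the list of search terms from the question; data is the paragraph string.
# re.escape() guards against terms containing regex metacharacters.
[k for k in s if re.search(r'\b' + re.escape(k) + r'\b', data, flags=re.IGNORECASE)]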
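To illustrate the difference the word boundaries make, here is a small comparison of a plain substring test against the `\b...\b` regex. The sample sentence is made up for this sketch, chosen only because it contains "byproduct titles".

```python
import re

s = ['tagged', 'product title', 'tagging', 'analyze']
# Hypothetical sample text containing "byproduct titles"
text = "Byproduct titles were tagged by hand."

# Plain substring test: also fires inside "byproduct titles"
substring_hits = [k for k in s if k in text.lower()]

# Word-boundary regex: the term must start and end on a word boundary
regex_hits = [k for k in s
              if re.search(r'\b' + re.escape(k) + r'\b', text, flags=re.IGNORECASE)]

print(substring_hits)  # ['tagged', 'product title']
print(regex_hits)      # ['tagged']
```

Because "product" in "byproduct" is preceded by a word character, `\b` cannot match there, so only the genuine hit survives.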

Answer 2

Iterate over the list `s` and check whether each element is in the string.

Demo:

data = """
 Automatic Product Title Tagging  
 Aim: To automate the process of product title tagging using manually tagged data.
 ROUTE OPTIMIZATION – Spring Clean
 Aim:  Minimizing the overall travel time using optimization techniques.
 CUSTOMER SEGMENTATION:
 Aim:  Develop an engine which segments and provides the score for  
       customers based on their behavior and analyze their purchasing
       pattern. 
"""
s = ['tagged', 'product title',  'tagging', 'analyze']
data = data.lower()

skills = []
for i in s:
    if i.lower() in data:
        skills.append(i)
print(skills)

Or in one line:

skills = [i for i in s if i.lower() in data]

Output:

['tagged', 'product title', 'tagging', 'analyze']

Answer 3

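split() splits the string around the argument passed to it; with no argument, it splits on whitespace. Since "product title", which you are searching for, also contains a space, you can do one of the following:

1) Search for the phrase directly in the paragraph.

2) If you do split, look for matches across consecutive words, i.e. at indices i and i + 1.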
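Option 2 could be sketched as follows, assuming search terms are at most two words long and that punctuation stuck to a word (such as `data.` or `Aim:`) can be removed with `str.strip(string.punctuation)`. The helper name `find_terms` is hypothetical.

```python
import string

def find_terms(text, terms):
    """Match one- and two-word terms against whitespace-split tokens."""
    # Lowercase each token and strip surrounding ASCII punctuation
    words = [w.strip(string.punctuation).lower() for w in text.split()]
    wanted = {t.lower() for t in terms}
    found = []
    for i, word in enumerate(words):
        candidates = [word]
        if i + 1 < len(words):
            # Join token i with token i + 1 to catch two-word phrases
            candidates.append(word + " " + words[i + 1])
        for c in candidates:
            if c in wanted and c not in found:
                found.append(c)
    return found

data = """Automatic Product Title Tagging
Aim: To automate the process of product title tagging using manually tagged data."""

print(find_terms(data, ['tagged', 'product title', 'tagging', 'analyze']))
# ['product title', 'tagging', 'tagged']
```

Terms longer than two words would need a wider window (i through i + n - 1), which is where searching the original string directly (option 1) becomes simpler.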

Answer 4

"Aim:" has to be present in each line of `data` that you want to search, so I would find the index of that word ("Aim:"):

p = "Automatic Product Title Tagging  Aim: To automate the process of product title tagging using manually tagged data."
index = p.find("Aim:")  # 33
print(p[index:])
# output:
# Aim: To automate the process of product title tagging using manually tagged data.
w_length = len("Aim:")  # 4: used to exclude the word "Aim:" itself
print(p[index + w_length:])
# output (note the leading space):
#  To automate the process of product title tagging using manually tagged data.

Example:

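# data holds the paragraph as a single string
s = ['tagged', 'product title', 'tagging', 'analyze']
skills = []
for line in data.split("\n"):
    index = line.find("Aim:")
    if index != -1:  # check before adding the length; find() returns -1 when absent
        index += len("Aim:")  # 4
        for word in line[index:].split():
            # Compares single words only: "product title" is still not matched,
            # and wrapped lines without "Aim:" are skipped
            if word.lower() in s:
                skills.append(word)
                print(word)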
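If you want to keep the "Aim:" filtering but still catch multi-word terms, one variation (not the code above, but a combination of it with the word-boundary regex from the first answer) is to run the regex over the text that follows "Aim:" on each line. This sketch uses a trimmed copy of the data; the names `marker` and `aim_text` are illustrative.

```python
import re

data = """Automatic Product Title Tagging
Aim: To automate the process of product title tagging using manually tagged data.

ROUTE OPTIMIZATION – Spring Clean
Aim:  Minimizing the overall travel time using optimization techniques."""

s = ['tagged', 'product title', 'tagging', 'analyze']
marker = "Aim:"

skills = []
for line in data.split("\n"):
    pos = line.find(marker)
    if pos == -1:
        continue  # skip lines that have no "Aim:" marker
    aim_text = line[pos + len(marker):]
    for term in s:
        # \b...\b keeps multi-word terms intact and avoids partial-word hits
        if re.search(r"\b" + re.escape(term) + r"\b", aim_text, flags=re.IGNORECASE):
            if term not in skills:
                skills.append(term)

print(skills)  # ['tagged', 'product title', 'tagging']
```

Like the original approach, this still skips wrapped continuation lines that don't contain "Aim:" themselves.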
