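Searching for phrases in a document

Date: 2018-09-07 07:04:25

Tags: python

The task is to match keywords in a paragraph. What I did was split the paragraph into words, put them in a list, and then match them against the search terms in another list.

Data: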

Automatic Product Title Tagging
Aim: To automate the process of product title tagging using manually tagged data. 

ROUTE OPTIMIZATION – Spring Clean
Aim:  Minimizing the overall travel time using optimization techniques. 

CUSTOMER SEGMENTATION:
Aim:  Develop an engine which segments and provides the score for
      customers based on their behavior and analyze their purchasing pattern. 

Code I tried:

s = ['tagged', 'product title',  'tagging', 'analyze']

skills = []
for word in data.split():

    print(word)    
    word.lower()
    if word in s:

        skills.append(word)
skills1 = list(set(skills))

print(skills1)

['tagged', 'tagging', 'analyze'] 

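When I use the split function, every word is split off on its own, so I cannot detect the phrase "product title" even though it is present in the paragraph.

Any help would be appreciated.

4 answers:

Answer 0 (score: 3)

What you are searching for are not "keywords" but phrases. One solution is a regular-expression search. A plain "substring in text" test will not work well because, given "product title", it would also match "byproduct titles", which is not what you want.

This should do it: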

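import re
# Keep each search phrase from s that occurs in data as a whole word or phrase, ignoring case.
skills = [k for k in s if re.search(r'\b' + re.escape(k) + r'\b', data, flags=re.IGNORECASE)]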
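To see why the word boundaries matter, here is a minimal sketch comparing a plain substring test with a `\b`-anchored search. The helper name `find_phrases` and the sample sentence are assumptions made for illustration; `re.escape` is included so that phrases containing regex metacharacters are matched literally.

```python
import re

def find_phrases(phrases, text):
    """Return the phrases that occur in text as whole words, ignoring case."""
    found = []
    for phrase in phrases:
        # \b anchors the match at word boundaries; re.escape treats the phrase literally.
        pattern = r'\b' + re.escape(phrase) + r'\b'
        if re.search(pattern, text, flags=re.IGNORECASE):
            found.append(phrase)
    return found

sample = "Byproduct titles are tagged manually."  # hypothetical sample text
phrases = ['tagged', 'product title', 'tagging', 'analyze']

# Plain substring test: also matches "product title" inside "byproduct titles".
substring_hits = [p for p in phrases if p in sample.lower()]
# Word-boundary search: only whole-word matches are kept.
boundary_hits = find_phrases(phrases, sample)
print(substring_hits)
print(boundary_hits)
```

On this sample the substring test reports both 'tagged' and 'product title', while the boundary-anchored search reports only 'tagged'.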

Answer 1 (score: 2)

Iterate over the list s and check whether each element occurs in the string.

Demo:

data = """
 Automatic Product Title Tagging  
 Aim: To automate the process of product title tagging using manually tagged data.
 ROUTE OPTIMIZATION – Spring Clean
 Aim:  Minimizing the overall travel time using optimization techniques.
 CUSTOMER SEGMENTATION:
 Aim:  Develop an engine which segments and provides the score for  
       customers based on their behavior and analyze their purchasing
       pattern. 
"""
s = ['tagged', 'product title',  'tagging', 'analyze']
data = data.lower()

skills = []
for i in s:
    if i.lower() in data:
        skills.append(i)
print(skills)

Or as a one-liner:

skills = [i for i in s if i.lower() in data]

Output:

['tagged', 'product title', 'tagging', 'analyze']

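Answer 2 (score: 0)

split() splits a string around the separator passed as its argument; when called with no argument, it splits on whitespace. Since you are searching for "product title", which itself contains a space, you can do one of the following:

1) Search for the phrase directly in the paragraph

2) If you do split, search for a match across the words at indices i and i + 1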
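Option 2 could be sketched roughly as follows. The function name `match_with_bigrams` is hypothetical, and the sketch assumes the search phrases are at most two words long; longer phrases would need longer word windows.

```python
import string

def match_with_bigrams(phrases, text):
    """Match one- and two-word phrases against the words of text and its adjacent word pairs."""
    # Lowercase each word and strip surrounding punctuation such as "." or ":".
    words = [w.strip(string.punctuation).lower() for w in text.split()]
    words = [w for w in words if w]
    # Join each word at index i with its neighbour at index i + 1.
    pairs = [words[i] + ' ' + words[i + 1] for i in range(len(words) - 1)]
    candidates = set(words) | set(pairs)
    return [p for p in phrases if p.lower() in candidates]

text = "Aim: To automate the process of product title tagging using manually tagged data."
result = match_with_bigrams(['tagged', 'product title', 'tagging', 'analyze'], text)
print(result)
```

Because whole words are compared, "product title" is not matched inside "byproduct titles" with this approach.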

Answer 3 (score: 0)

"Aim:" must appear in every line of the data, so I would find the index of that word ("Aim:").

p = "Automatic Product Title Tagging  Aim: To automate the process of product title tagging using manually tagged data."
index = p.find("Aim:")  # 33
print(p[index:])
# output:
# Aim: To automate the process of product title tagging using manually tagged data.
w_length = len("Aim:")  # 4, used to skip past the word "Aim:"
print(p[index + w_length:])
# output (begins with a space):
#  To automate the process of product title tagging using manually tagged data.

Example:

s = ['tagged', 'product title',  'tagging', 'analyze']
skills = []
for line in data.split("\n"):
    index = line.find("Aim:")
    if index != -1:  # only process lines that contain "Aim:"
        # Look only at the text after "Aim:"
        for word in line[index + len("Aim:"):].split():
            if word.lower() in s:
                skills.append(word)
                print(word)
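Since the loop above still compares single words, it cannot find "product title" either. A possible variant, sketched here under the assumption that a substring test on the text after "Aim:" is acceptable, checks each phrase against the remainder of the line instead. The function name `skills_after_aim` and the shortened `data` are illustrative only.

```python
def skills_after_aim(phrases, data, marker="Aim:"):
    """Return the phrases found in the text following marker on each line of data."""
    found = []
    for line in data.split("\n"):
        index = line.find(marker)
        if index == -1:
            continue  # skip lines without the marker
        rest = line[index + len(marker):].lower()
        for phrase in phrases:
            # Plain substring test, so "product title" would also match "byproduct titles".
            if phrase.lower() in rest and phrase not in found:
                found.append(phrase)
    return found

data = """Automatic Product Title Tagging
Aim: To automate the process of product title tagging using manually tagged data.
CUSTOMER SEGMENTATION:
Aim:  Develop an engine which segments and provides the score for
      customers based on their behavior and analyze their purchasing pattern."""

found = skills_after_aim(['tagged', 'product title', 'tagging', 'analyze'], data)
print(found)
```

Note that 'analyze' is not found here: it sits on a continuation line that does not itself contain "Aim:", so any line-by-line approach keyed on that marker will skip it.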