How do I retrieve the whole sentence around a selected word?

Time: 2018-09-28 08:21:48

Tags: python python-2.7 python-requests full-text-search

I want to find a selected word and get everything from the first period (.) before it to the first period (.) after it.

Example:

Inside a file called "text.php":

'The price of blueberries has gone way up. In the year 2038 blueberries have 
 almost tripled in price from what they were ten years ago. Economists have 
 said that berries may going up 300% what they are worth today.'

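Code example (I know that with code like this I can grab up to 5 words before the word 'that' and up to 5 words after it, but I want everything before and after the word, up to the sentence boundaries):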

import re

text = ('The price of blueberries has gone way up, that might cause trouble for farmers. '
        'In the year 2038 blueberries have almost tripled in price from what they were ten years '
        'ago. Economists have said that berries may going up 300% what they are worth today.')

# Match up to 5 words before and up to 5 words after the first 'that'.
find = re.search(r"(?:[a-zA-Z'-]+[^a-zA-Z'-]+){0,5}that(?:[^a-zA-Z'-]+[a-zA-Z'-]+){0,5}", text)
done = find.group()
print(done)

Returns:

'blueberries has gone way up, that might cause trouble for farmers'

I want it to return every sentence that contains 'that'.

Expected output (what I want to get):

'The price of blueberries has gone way up, that might cause trouble for farmers',
'Economists have said that berries may going up 300% what they are worth today'

2 answers:

Answer 0 (score: 1)

I would do it like this:

text = 'The price of blueberries has gone way up, that might cause trouble for farmers. In the year 2038 blueberries have almost tripled in price from what they were ten years ago. Economists have said that berries may going up 300% what they are worth today.'
for sentence in text.split('.'):
    if 'that' in sentence:
        print(sentence.strip())

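The .strip() is only there to trim the leftover whitespace, since I'm splitting on '.'.

If you really want to use the re module, I would use something like this: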

import re

text = 'The price of blueberries has gone way up, that might cause trouble for farmers. In the year 2038 blueberries have almost tripled in price from what they were ten years ago. Economists have said that berries may going up 300% what they are worth today.'
# Each match is a run of non-period characters containing 'that'.
results = re.findall(r"[^.]+that[^.]+", text)
# A list comprehension prints as a list on both Python 2 and Python 3.
results = [s.strip() for s in results]
print(results)

This finds the same sentences.


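Caveats:

  • If a sentence contains a word such as thatcher, that sentence is printed too. In the first solution you can use if 'that' in sentence.split(): to split the string into words; in the second you can use re.findall(r"[^.]+\bthat\b[^.]+", text) (note the \b tokens; they stand for word boundaries).

  • The script relies on the period (.) to delimit sentences. If a sentence itself contains words that use a period, the result may not be what you expect (for example, for the sentence Dr. Tom is sick yet again today, so I'm substituting for him., the script treats Dr as one sentence and Tom is sick yet again today, so I'm substituting for him as another).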
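Both caveats can be handled in one place. The following is only a rough sketch, not a full sentence tokenizer: the helper names split_sentences and sentences_containing and the tiny ABBREVIATIONS set are made up for illustration, and real text will need a longer abbreviation list or a proper NLP library.

```python
import re

# Hypothetical, deliberately tiny abbreviation list; extend it for real text.
ABBREVIATIONS = {"Dr.", "Mr.", "Mrs.", "Ms.", "Prof."}


def split_sentences(text):
    """Split after '.', '!' or '?' followed by whitespace, then re-join
    pieces that end in a known abbreviation (so 'Dr. Tom' stays together)."""
    pieces = re.split(r"(?<=[.!?])\s+", text.strip())
    sentences, buffer = [], ""
    for piece in pieces:
        if not piece:
            continue
        buffer = buffer + " " + piece if buffer else piece
        if buffer.split()[-1] not in ABBREVIATIONS:
            sentences.append(buffer)
            buffer = ""
    if buffer:  # text ended on an abbreviation
        sentences.append(buffer)
    return sentences


def sentences_containing(text, word):
    """Return the sentences that contain `word` as a whole word."""
    pattern = re.compile(r"\b" + re.escape(word) + r"\b")
    return [s for s in split_sentences(text) if pattern.search(s)]


sample = ("Dr. Tom is sick yet again today, so I'm substituting for him. "
          "The thatcher said that it was fine. The thatcher left.")
print(sentences_containing(sample, "that"))
print(sentences_containing(sample, "Tom"))
```

Unlike text.split('.'), this keeps the closing punctuation on each sentence, and "thatcher" on its own no longer counts as a hit for "that".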


Edit: to answer your question in the comments, I would make the following changes:

Solution 1:

text = 'The price of blueberries has gone way up, that might cause trouble for farmers. In the year 2038 blueberries have almost tripled in price from what they were ten years ago. Economists have said that berries may going up 300% what they are worth today.'
# Drop empty pieces: the text ends with '.', so split() yields a trailing ''.
sentences = [s.strip() for s in text.split('.') if s.strip()]
for i, sentence in enumerate(sentences):
    if 'almost' in sentence:
        # Previous, current and next sentence, where they exist.
        context = sentences[max(i - 1, 0):i + 2]
        print(". ".join(context))

Solution 2:

import re

text = 'The price of blueberries has gone way up, that might cause trouble for farmers. In the year 2038 blueberries have almost tripled in price from what they were ten years ago. Economists have said that berries may going up 300% what they are worth today.'
# Optional previous sentence, the sentence containing 'almost', optional next sentence.
results = re.findall(r"(?:[^.]+\.)?[^.]*almost[^.]*(?:\.[^.]+)?", text)
results = [s.strip() for s in results]
print(results)

Note that these can produce overlapping results. For example, if the text is a. b. b. c. and you look for sentences containing b, you get a. b. b and b. b. c.
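If the overlap is a problem, one option is to merge windows that share a sentence before printing them. This is a rough sketch under the same naive split-on-period assumption as above; the function name context_windows and its radius parameter are made up for illustration.

```python
def context_windows(text, word, radius=1):
    """Return each hit with `radius` sentences of context on either side,
    merging windows that overlap so no sentence is repeated."""
    # Naive split on '.', dropping empty pieces.
    sentences = [s.strip() for s in text.split('.') if s.strip()]
    merged = []  # [start, end] index pairs, end inclusive
    for i, sentence in enumerate(sentences):
        if word not in sentence:
            continue
        start = max(i - radius, 0)
        end = min(i + radius, len(sentences) - 1)
        if merged and start <= merged[-1][1]:
            merged[-1][1] = end  # overlaps the previous window: extend it
        else:
            merged.append([start, end])
    return ['. '.join(sentences[a:b + 1]) for a, b in merged]


print(context_windows('a. b. b. c.', 'b'))  # ['a. b. b. c']
```

With the a. b. b. c. example this returns a single window instead of two overlapping ones.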

Answer 1 (score: 0)

This function does the job:

old_text = 'test 1: test friendly, test 2: not friendly, test 3: test friendly, test 4: not friendly, test 5: not friendly'

replace_dict={'test 1':'tested 1','not':'very'}

Function:

def replace_me(text, replace_dict):
    # Apply every replacement in the dictionary to the text in turn.
    for key in replace_dict.keys():
        text = text.replace(str(key), str(replace_dict[key]))
    return text

Result:

 print(replace_me(old_text,replace_dict))
 Out: 'tested 1: test friendly, test 2: very friendly, test 3: test friendly, test 4: very friendly, test 5: very friendly'