Question

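I'm trying to write a Python script that searches a document for a keyword and retrieves the entire sentence the keyword appears in. From my research I saw that acora could be used, but I still haven't had any success with it.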

Answer 1

>>> text = """Hello, this is the first sentence. This is the second. 
And this may or may not be the third. Am I right? No? lol..."""

>>> import re
>>> s = re.split(r'[.?!:]+', text)
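>>> def search(word, sentences):
...     # re.escape keeps any regex metacharacters in the keyword literal
...     pattern = r'\b%s\b' % re.escape(word)
...     return [sent for sent in sentences if re.search(pattern, sent)]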

>>> search('is', s)
['Hello, this is the first sentence', ' This is the second']

Answer 2

Here is how you can do it simply in the shell. You should write the script itself on your own.

>>> text = '''this is sentence 1. and that is sentence
              2. and sometimes sentences are good.
              when that's sentence 4, there's a good reason. and that's 
              sentence 5.'''
>>> for line in text.split('.'):
...     if 'and' in line:
...         # collapse the line breaks and indentation left over from the literal
...         print(' '.join(line.split()))
... 
and that is sentence 2
and sometimes sentences are good
and that's sentence 5

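Here I split text with .split('.') and iterate over the pieces, then check each piece for the word and and print it if the word is there.

Keep in mind that this check is case-sensitive. There are many things to consider in your solution, for example that text ending in ! or ? is also a sentence (but sometimes it isn't):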

This is a sentence (ha?) or do you think (!) so?

will be split into

This is a sentence (ha
) or do you think (
) so
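
One way to soften both caveats, as a minimal sketch rather than a complete sentence splitter: break the text only where ., ! or ? is followed by whitespace, and match the keyword case-insensitively. The names find_sentences and SENTENCE_BOUNDARY are made up for this illustration, and abbreviations such as "e.g. this" would still be split incorrectly.

```python
import re

# Split only where ., ! or ? is followed by whitespace, so punctuation
# inside parentheses such as "(ha?)" does not end the sentence.
SENTENCE_BOUNDARY = re.compile(r'(?<=[.!?])\s+')


def find_sentences(keyword, text):
    """Return the sentences of text containing keyword as a whole word, ignoring case."""
    pattern = re.compile(r'\b%s\b' % re.escape(keyword), re.IGNORECASE)
    sentences = SENTENCE_BOUNDARY.split(text.strip())
    return [s for s in sentences if pattern.search(s)]


sample = "This is a sentence (ha?) or do you think (!) so? And this is another one."
matches = find_sentences("THINK", sample)
print(matches)
```

Because the parenthesised ? and ! are followed by ) rather than whitespace, the first sentence stays in one piece.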

Answer 3

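I don't have much experience with this, but you may be looking for nltk.

Try this: use span_tokenize, find the span that the word's index falls in, then look up that sentence.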
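
The span lookup described here can be sketched without installing anything. In the sketch below, regex_sentence_spans is a crude regex stand-in for a real tokenizer, not nltk's algorithm, and both function names are hypothetical. Any iterable of (start, end) offsets works as the spans argument, so the spans produced by nltk's span_tokenize could presumably be passed in instead (not tested here).

```python
import re
from bisect import bisect_right


def regex_sentence_spans(text):
    """Yield (start, end) offsets of sentences; a crude stand-in for a real tokenizer."""
    # A sentence runs from a non-space character to punctuation followed
    # by whitespace, or to the end of the text.
    for match in re.finditer(r'\S.*?(?:[.!?]+(?=\s|$)|$)', text, re.DOTALL):
        yield match.span()


def sentences_with_word(text, word, spans):
    """Return each sentence whose span contains a whole-word occurrence of word."""
    spans = sorted(spans)
    starts = [start for start, _ in spans]
    hit_indices = []
    for match in re.finditer(r'\b%s\b' % re.escape(word), text):
        # Last span that starts at or before the word's index.
        i = bisect_right(starts, match.start()) - 1
        if i >= 0 and match.start() < spans[i][1] and i not in hit_indices:
            hit_indices.append(i)
    return [text[spans[i][0]:spans[i][1]] for i in hit_indices]


text = ("Hello, this is the first sentence. This is the second. "
        "And this may or may not be the third. Am I right? No? lol...")
result = sentences_with_word(text, "is", regex_sentence_spans(text))
print(result)
```

Working with offsets rather than split strings keeps the original punctuation and lets the same sentence be reported only once even when the word occurs in it several times.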

Answer 4

Use the grep or egrep command together with Python's subprocess module; that may help you.

e.g.:

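from subprocess import Popen, PIPE

# grep prints every line of document.txt that contains 'word1';
# universal_newlines=True makes stdout return text instead of bytes
stdout = Popen("grep 'word1' document.txt", shell=True, stdout=PIPE,
               universal_newlines=True).stdout
# to search for 2 different words:
# stdout = Popen("egrep 'word1|word2' document.txt", shell=True, stdout=PIPE,
#                universal_newlines=True).stdout
data = stdout.read()
lines = data.split('\n')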
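
If the keywords come from user input, it may be safer to skip the shell and pass grep its arguments as a list. The following is only a sketch under stated assumptions: grep must be available on PATH, grep_lines is a hypothetical helper name, and the temporary file exists just to make the demo self-contained.

```python
import os
import subprocess
import tempfile


def grep_lines(words, path):
    """Return the lines of path that contain any of the given words, using grep."""
    # -F treats each word as a fixed string; every -e adds one alternative.
    command = ["grep", "-F"]
    for word in words:
        command += ["-e", word]
    command += ["--", path]
    result = subprocess.run(command, stdout=subprocess.PIPE,
                            universal_newlines=True)
    # grep exits with 1 when nothing matched and with 2 or more on a real error.
    if result.returncode > 1:
        raise RuntimeError("grep failed with exit code %d" % result.returncode)
    return result.stdout.splitlines()


# Demo on a throwaway file so the example does not depend on document.txt.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as handle:
    handle.write("word1 is here\nnothing to see\nword2 is here too\n")
    demo_path = handle.name
try:
    demo_lines = grep_lines(["word1", "word2"], demo_path)
finally:
    os.remove(demo_path)
print(demo_lines)
```

Note that grep works on lines, not sentences, so a sentence that spans several lines comes back only in part.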

Searching for a keyword in a document in Python
