Searching for a keyword in a document in Python

Date: 2011-06-30 06:22:44

Tags: python keyword

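I am trying to write a Python script that searches a document for a keyword and retrieves the entire sentence in which the keyword occurs. From my research I saw that acora could be used, but I still have not managed to get it to work.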

4 answers:

Answer 0 (score: 2)

>>> text = """Hello, this is the first sentence. This is the second. 
And this may or may not be the third. Am I right? No? lol..."""

>>> import re
>>> s = re.split(r'[.?!:]+', text)
>>> def search(word, sentences):
       # re.escape keeps any regex metacharacters in the keyword literal
       return [i for i in sentences if re.search(r'\b%s\b' % re.escape(word), i)]

>>> search('is', s)
['Hello, this is the first sentence', ' This is the second']

Answer 1 (score: 0)

Here is how you can do it simply in the interactive shell. You should write the script yourself.

>>> text = '''this is sentence 1. and that is sentence
              2. and sometimes sentences are good.
              when that's sentence 4, there's a good reason. and that's 
              sentence 5.'''
>>> for line in text.split('.'):
...     if 'and' in line:
...         print(line)
... 
 and that is sentence 2
 and sometimes sentences are good
 and that's sentence 5

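Here I split the text with text.split('.') and iterate over the pieces, checking each one for the word and and printing it if the word is present.

Keep in mind that this check is case-sensitive. There are many other things to consider in your solution, for example that text ending in ! or ? is also a sentence (although sometimes it is not).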

  

This is a sentence (ha?) or do you think (!) so?

will be split into

  • This is a sentence (ha
  • ) or do you think (
  • ) so
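One way to avoid the split shown above is to break only where the punctuation is followed by whitespace, and to match the keyword case-insensitively. The following is a minimal sketch under those assumptions, not a complete sentence splitter: the names SENTENCE_END and find_sentences are made up for illustration, and abbreviations such as "e.g. " would still be split incorrectly.

```python
import re

# Split only where sentence-ending punctuation is followed by whitespace,
# so "(ha?)" and "(!)" do not end a sentence.
SENTENCE_END = re.compile(r'(?<=[.!?])\s+')

def find_sentences(keyword, text):
    """Return the sentences of `text` that contain `keyword` as a whole word."""
    # re.escape treats the keyword literally; IGNORECASE removes case sensitivity.
    pattern = re.compile(r'\b%s\b' % re.escape(keyword), re.IGNORECASE)
    sentences = SENTENCE_END.split(text.strip())
    return [s for s in sentences if pattern.search(s)]

sample = "This is a sentence (ha?) or do you think (!) so? And here is another one."
print(find_sentences("THINK", sample))
# ['This is a sentence (ha?) or do you think (!) so?']
```

The lookbehind keeps the punctuation attached to the sentence it ends, so each returned sentence reads as it did in the document.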

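Answer 2 (score: 0)

I don't have much experience with this, but you may be looking for nltk.

Try this; use span_tokenize, find the span that contains the index of your word, then look up that sentence.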
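The span idea can be illustrated without nltk. The sketch below is a standard-library stand-in, not nltk's tokenizer: sentence_spans and sentence_containing are hypothetical helpers, and the sentence rule (punctuation followed by whitespace or the end of the text) is a simplifying assumption that a trained tokenizer would handle better.

```python
import re

def sentence_spans(text):
    """Yield (start, end) offsets of each sentence in `text`.

    Simplified rule: a sentence ends at '.', '!' or '?' when that character
    is followed by whitespace or by the end of the text.
    """
    for match in re.finditer(r'\S.*?(?:[.!?](?=\s|\Z)|\Z)', text, re.DOTALL):
        yield match.span()

def sentence_containing(keyword, text):
    """Return the sentence around the first occurrence of `keyword`, or None."""
    index = text.find(keyword)  # case-sensitive substring search
    if index == -1:
        return None
    for start, end in sentence_spans(text):
        if start <= index < end:
            return text[start:end]
    return None

sample = "Hello there. Is this it? Yes (really!) it is."
print(sentence_containing("really", sample))
# Yes (really!) it is.
```

Working with offsets rather than split strings means the original text is never modified, so the returned sentence keeps its punctuation and spacing.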

Answer 3 (score: 0)

Use the grep or egrep command together with Python's subprocess module; it may help you.

For example:

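from subprocess import Popen, PIPE

# Run grep through the shell and capture its output as text.
stdout = Popen("grep 'word1' document.txt", shell=True, stdout=PIPE,
               universal_newlines=True).stdout
# To search for two different words:
# stdout = Popen("egrep 'word1|word2' document.txt", shell=True, stdout=PIPE,
#                universal_newlines=True).stdout
data = stdout.read()
lines = data.split('\n')  # one matching line per element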
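If starting an external process is not desirable, the same line filtering can be approximated in Python itself. This is a rough sketch, not a replacement for grep: grep_lines is a hypothetical helper, it treats each word as literal text rather than as a regular expression, and like grep it returns whole lines, not sentences.

```python
import re

def grep_lines(words, lines):
    """Return the lines that contain any of `words`, similar to egrep 'w1|w2'.

    `lines` can be any iterable of strings, including an open text file.
    """
    # Escape each word so it is matched literally, then join as alternatives.
    pattern = re.compile('|'.join(re.escape(w) for w in words))
    return [line.rstrip('\n') for line in lines if pattern.search(line)]

# Hypothetical usage with the file name from the example above:
# with open('document.txt') as handle:
#     matches = grep_lines(['word1', 'word2'], handle)

print(grep_lines(['word1', 'word2'],
                 ['has word1 here\n', 'nothing\n', 'word2 too\n']))
# ['has word1 here', 'word2 too']
```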