I am trying to write a Python script that searches a document for a keyword and retrieves the entire sentence the keyword appears in. From my research I saw that acora could be used, but I still have not had any success with it.
Answer 0: (score: 2)
>>> text = """Hello, this is the first sentence. This is the second.
And this may or may not be the third. Am I right? No? lol..."""
>>> import re
>>> s = re.split(r'[.?!:]+', text)
>>> def search(word, sentences):
...     return [i for i in sentences if re.search(r'\b%s\b' % word, i)]
...
>>> search('is', s)
['Hello, this is the first sentence', ' This is the second']
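Instead of splitting first and filtering second, the two steps can be folded together: grab the sentence-like runs directly and keep only those containing the word. This is a sketch of that variant; the sentence pattern is the same simplification as above (anything up to a `.`, `?`, `!` or `:` counts as a sentence):

```python
import re

text = """Hello, this is the first sentence. This is the second.
And this may or may not be the third. Am I right? No? lol..."""

def sentences_with(word, text):
    # Runs of characters that contain no terminator are treated as sentences.
    sentences = re.findall(r'[^.?!:]+', text)
    # re.escape guards against words containing regex metacharacters.
    pattern = re.compile(r'\b%s\b' % re.escape(word))
    return [s.strip() for s in sentences if pattern.search(s)]
```

Calling `sentences_with('is', text)` returns the same two sentences as the `search` function above, with surrounding whitespace stripped.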
Answer 1: (score: 0)
This is how you can do it simply in the shell; you should write the actual script yourself.
>>> text = '''this is sentence 1. and that is sentence
2. and sometimes sentences are good.
when that's sentence 4, there's a good reason. and that's
sentence 5.'''
>>> for line in text.split('.'):
...     if 'and' in line:
...         print(line)
...
and that is sentence 2
and sometimes sentences are good
and that's sentence 5
Here I split text with .split('.') and iterate over the pieces, checking each one for the word and, and printing it if it is contained.
You should also consider that this is case-sensitive. There are many things to consider in a solution like this, for example things ending with ! and ? are also sentences (but sometimes they are not):
Is this a sentence (huh?) or do you think (!) so?
will be split at each of those marks.
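The case-sensitivity caveat can be handled with re.IGNORECASE. A minimal sketch combining that with a multi-terminator split (note how the parenthesized ? and ! in the example sentence still cause awkward splits, which is exactly the pitfall described above):

```python
import re

text = "Is this a sentence (huh?) or do you think (!) so? It IS."

def search_insensitive(word, text):
    # Split on runs of sentence terminators, drop empty pieces.
    sentences = [s.strip() for s in re.split(r'[.?!]+', text) if s.strip()]
    # Whole-word match, ignoring case.
    pattern = re.compile(r'\b%s\b' % re.escape(word), re.IGNORECASE)
    return [s for s in sentences if pattern.search(s)]
```

Here `search_insensitive('is', text)` matches both "Is" and "IS", but the first returned fragment is truncated at the "(huh?)" because the naive split treats the embedded ? as a sentence boundary.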
Answer 2: (score: 0)
Answer 3: (score: 0)
Using the pyp or egrep command together with Python's subprocess module can help you.
e.g:
from subprocess import Popen, PIPE

stdout = Popen("grep 'word1' document.txt", shell=True, stdout=PIPE).stdout
# To search for 2 different words:
# stdout = Popen("egrep 'word1|word2' document.txt", shell=True, stdout=PIPE).stdout
data = stdout.read().decode()  # read() returns bytes on Python 3
lines = data.split('\n')
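If shelling out to grep is undesirable (for example on a system without it), the same line filtering can be done in pure Python. This is a sketch, not the answer's method; grep_lines is a hypothetical helper name, and document.txt is just the placeholder filename from the example above:

```python
def grep_lines(words, path):
    # Return the lines of the file at `path` that contain any of `words`,
    # mirroring the behaviour of egrep 'word1|word2'.
    with open(path) as f:
        return [line.rstrip('\n') for line in f
                if any(w in line for w in words)]

# matches = grep_lines(['word1', 'word2'], 'document.txt')
```

Unlike the grep version, this does plain substring matching, so 'and' would also match inside 'sandwich'; wrap the check in a regex with \b word boundaries if whole-word matching is needed.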