Select the sentence that contains a chosen word

Time: 2013-09-25 10:50:34

Tags: python nltk

Suppose I have a paragraph:

text = '''Darwin published his theory of evolution with compelling evidence in his 1859 book On the Origin of Species, overcoming scientific rejection of earlier concepts of transmutation of species.[4][5] By the 1870s the scientific community and much of the general public had accepted evolution as a fact. However, many favoured competing explanations and it was not until the emergence of the modern evolutionary synthesis from the 1930s to the 1950s that a broad consensus developed in which natural selection was the basic mechanism of evolution.[6][7] In modified form, Darwin's scientific discovery is the unifying theory of the life sciences, explaining the diversity of life.[8][9]'''

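If I input a word (favoured), how can I remove the entire sentence that contains it? The method I used before is tedious: I use sent_tokenize to break up the paragraph (more than 13,000 words), and because I have to check more than 1,000 words, I run a loop that tests every search word against every word of every sentence. This takes a long time, since there are more than 400 sentences.

Instead, I would like to search the paragraph for those 1,000 words directly and, when a word is found, select everything back to the preceding full stop and everything forward to the next full stop.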

3 answers:

Answer 0 (score: 0)

I'm not sure I understand your question, but you could do something like this:

text = 'whatever....'
sentences = text.split('.')
good_sentences = [e for e in sentences if 'my_word' not in e]

Is this what you are looking for?

Answer 1 (score: 0)

This removes every sentence (anything delimited by a .) that contains a given word.

def remove_sentence(text, word):
    # Keep only the '.'-delimited chunks that do not contain `word`.
    return ".".join(sentence for sentence in text.split(".")
                    if word not in sentence)

>>> remove_sentence(text, "published")
"[4][5] By the 1870s the scientific community and much of the general public had accepted evolution as a fact. However, many favoured competing explanations and it was not until the emergence of the modern evolutionary synthesis from the 1930s to the 1950s that a broad consensus developed in which natural selection was the basic mechanism of evolution.[6][7] In modified form, Darwin's scientific discovery is the unifying theory of the life sciences, explaining the diversity of life.[8][9]"
>>>
>>> remove_sentence(text, "favoured")
"Darwin published his theory of evolution with compelling evidence in his 1859 book On the Origin of Species, overcoming scientific rejection of earlier concepts of transmutation of species.[4][5] By the 1870s the scientific community and much of the general public had accepted evolution as a fact.[6][7] In modified form, Darwin's scientific discovery is the unifying theory of the life sciences, explaining the diversity of life.[8][9]"

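One caveat with the function above: `word not in sentence` is a substring test, so searching for "form" would also drop a sentence that only contains "formed". A minimal sketch of a whole-word variant follows, assuming that matching on word boundaries is what is wanted; the name `remove_sentence_whole_word` and the sample string are made up for illustration.

```python
import re

def remove_sentence_whole_word(text, word):
    """Drop every '.'-delimited chunk that contains `word` as a whole word."""
    # \b anchors the match at word boundaries; re.escape guards against
    # regex metacharacters in the search word.
    pattern = re.compile(r'\b' + re.escape(word) + r'\b')
    return ".".join(chunk for chunk in text.split(".")
                    if not pattern.search(chunk))

sample = "Evolution is a fact. He was not evolved. Life goes on."
# "evolve" only occurs inside "evolved", so nothing is removed here.
print(remove_sentence_whole_word(sample, "evolve"))
# "fact" occurs as a whole word, so the first chunk is removed.
print(remove_sentence_whole_word(sample, "fact"))
```

Like the original, this treats every "." as a sentence boundary, so abbreviations and decimal numbers would still be split incorrectly.
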
Answer 2 (score: 0)

You might be interested in trying something like the following program:

import re

SENTENCES = ('This is a sentence.',
             'Hello, world!',
             'Where do you want to go today?',
             'The apple does not fall far from the tree.',
             'Sally sells sea shells by the sea shore.',
             'The Jungle Book has several stories in it.',
             'Have you ever been up to the moon?',
             'Thank you for helping with my problem!')

BAD_WORDS = frozenset(map(str.lower, ('to', 'sea')))

def main():
    for sentence in SENTENCES:
        # A non-empty intersection means the sentence contains a bad word.
        if frozenset(words(sentence.lower())) & BAD_WORDS:
            print('Delete:', repr(sentence))

# Yield each run of word characters in the sentence.
words = lambda sentence: (m.group() for m in re.finditer(r'\w+', sentence))

if __name__ == '__main__':
    main()

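Reasoning

  1. You start with the sentences you want to filter and the words you are looking for.
  2. You compare the set of words in each sentence with the set of words you are looking for.
  3. If the two sets intersect, the sentence you are looking at is one you want to remove.

Output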

    Delete: 'Where do you want to go today?'
    Delete: 'Sally sells sea shells by the sea shore.'
    Delete: 'Have you ever been up to the moon?'
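
The same set-intersection idea can be applied to a paragraph rather than a tuple of ready-made sentences. The following is only a sketch under a few assumptions: the regex splitter is a naive stand-in for a real sentence tokenizer such as nltk's sent_tokenize, and the function name `drop_sentences` and the short sample paragraph are invented for the example. Each sentence is tokenized once and compared against the whole word set in a single intersection, so a long word list does not require a nested loop over words.

```python
import re

def drop_sentences(text, bad_words):
    """Return `text` without the sentences that contain any of `bad_words`."""
    bad = frozenset(w.lower() for w in bad_words)
    # Naive splitter (assumption): break after ., ! or ? followed by whitespace.
    sentences = re.split(r'(?<=[.!?])\s+', text.strip())
    # Keep a sentence only if its word set shares nothing with `bad`.
    kept = [s for s in sentences
            if not bad & set(re.findall(r'\w+', s.lower()))]
    return ' '.join(kept)

paragraph = ("Darwin published his theory in 1859. "
             "Many favoured competing explanations. "
             "A broad consensus developed later.")
print(drop_sentences(paragraph, ['favoured', 'moon']))
```

Note that this splitter would not separate sentences in the question's paragraph where a citation marker such as [4][5] follows the full stop directly, so real text may need a proper tokenizer first.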