Question

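I am using negex to find negation terms and their scope in my text. This is negex.py: https://github.com/chapmanbe/negex/blob/master/negex.python/negex.py Below are a sample input text and the wrapper function I use to call negex on each sentence of the input: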

input_text = """ Today the weather is not great for playing baseball.
 it is snowing and the wind is strong.
 I wish it was sunny but it is not what I want.
 Today is Sunday and I have to go to school tomorrow.
 Tomorrow is not going to be snowing though."""

Wrapper function:

for report in data_samples:
    # Split the report into paragraphs, and each paragraph into sentences.
    this_txt, this_sentences = sentences_for_text(report)

    for i in range(len(this_txt)):
        this_string = this_txt[i]
        my_sentences = this_sentences[i]

        for sntc in my_sentences:
            # All unigrams through four-grams of the sentence.
            my_ngrams = find_ngrams(sntc)

            for grm in my_ngrams:
                tagger = negTagger(sentence=sntc, phrases=grm, rules=irules, negP=False)

                if 'negated' in tagger.getNegationFlag():
                    print("tagger.getScopes():", tagger.getScopes())
                    output.append([this_string, grm])
                    output.append(tagger.getScopes())

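Each report can contain multiple paragraphs. I take each paragraph in the report and break it into sentences, then for each sntc I extract all unigrams, bigrams, trigrams and four-grams, and I run negex over every n-gram of each sentence to find the negations in that sentence. The code runs, but it uses a huge amount of memory: I get a MemoryError before it even finishes a single report. I need to run this over thousands of reports. Any ideas on how to solve this, given that I only care about the negated tags?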
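The n-gram step described above can be sketched as follows. This is only a minimal illustration, not the asker's actual find_ngrams (which is not shown in the question): the name iter_ngrams, the max_n parameter, and the whitespace tokenization are all assumptions. It yields n-grams one at a time instead of building a full list, so only one n-gram per sentence is held at a time; whether the n-gram list is actually the source of the MemoryError is not known from the question.

```python
from typing import Iterator, List


def iter_ngrams(sentence: str, max_n: int = 4) -> Iterator[str]:
    """Lazily yield every 1- to max_n-gram of a whitespace-tokenized sentence.

    Hypothetical stand-in for find_ngrams; the real helper may tokenize
    differently or return a different type.
    """
    tokens: List[str] = sentence.split()
    for n in range(1, max_n + 1):
        # A sentence with fewer than n tokens has no n-grams of that size.
        for start in range(len(tokens) - n + 1):
            yield " ".join(tokens[start:start + n])


if __name__ == "__main__":
    for gram in iter_ngrams("the weather is not great"):
        print(gram)
```

Because iter_ngrams is a generator, it can be used directly in a loop such as `for grm in iter_ngrams(sntc):` without materializing all n-grams first.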

MemoryError when running negex.py

0 answers: