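I am using negex to find negations and their scopes in my text. Here is negex.py: https://github.com/chapmanbe/negex/blob/master/negex.python/negex.py Below are a sample input text and the wrapper function I use to call negex on each sentence of the input: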
input_text = ''' Today the weather is not great for playing baseball.
it is snowing and the wind is strong.
I wish it was sunny but it is not what I want.
Today is Sunday and I have to go to school tomorrow.
Tommorrow is not going to be snowing though.'''
Wrapper function:
for report in data_samples:
    # Split the report into paragraphs and each paragraph into sentences.
    this_txt, this_sentences = sentences_for_text(report)
    for i in range(len(this_txt)):
        this_string = this_txt[i]
        my_sentences = this_sentences[i]
        for sntc in my_sentences:
            # All unigrams, bigrams, trigrams, and four-grams of the sentence.
            my_ngrams = find_ngrams(sntc)
            for grm in my_ngrams:
                # A new tagger is built for every n-gram of every sentence.
                tagger = negTagger(sentence=sntc, phrases=grm, rules=irules, negP=False)
                if 'negated' in tagger.getNegationFlag():
                    print("tagger.getScopes():", tagger.getScopes())
                    output.append([this_string, grm])
                    output.append(tagger.getScopes())
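Each report can have multiple paragraphs. I take each paragraph in the report and break it into sentences. For each sntc I extract all unigrams, bigrams, trigrams, and four-grams, and I run every n-gram of each sentence through negex to find the negations in that sentence. This code runs, but it uses a huge amount of memory, and I get a MemoryError before it finishes even one report. I need to run this on thousands of reports. Any ideas on how to fix this, given that I only care about the negated tag?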
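
For reference, the n-gram extraction step described above can be sketched roughly as follows. This is only an illustration, not the actual helper: `iter_ngrams` is a hypothetical stand-in for `find_ngrams` (whose code is not shown here), and splitting on whitespace is an assumption about the tokenization. The sketch yields n-grams one at a time rather than building the full list up front.

```python
from typing import Iterator, List


def iter_ngrams(sentence: str, max_n: int = 4) -> Iterator[str]:
    """Yield every 1- to max_n-gram of a sentence, one at a time.

    Hypothetical stand-in for find_ngrams; the name and signature are
    assumptions made for illustration only.
    """
    tokens: List[str] = sentence.split()  # assumption: whitespace tokenization
    for n in range(1, max_n + 1):
        # Slide a window of n tokens across the sentence.
        for start in range(len(tokens) - n + 1):
            yield " ".join(tokens[start:start + n])


if __name__ == "__main__":
    for gram in iter_ngrams("it is not what I want", max_n=2):
        print(gram)
```

A sentence with k tokens produces k + (k-1) + (k-2) + (k-3) n-grams for k >= 4, so the inner loop of the wrapper runs roughly 4k times per sentence.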