Question

I managed to install spaCy, but when I try to use nlp I get a MemoryError for some strange reason.

The code I wrote is as follows:

import spacy
import re
from nltk.corpus import gutenberg

def clean_text(astring):
    # replace newlines with a space
    newstring = re.sub(r"\n", " ", astring)
    # remove title and chapter headings
    newstring = re.sub(r"\[[^\]]*\]", " ", newstring)
    newstring = re.sub(r"VOLUME \S+", " ", newstring)
    newstring = re.sub(r"CHAPTER \S+", " ", newstring)
    # collapse runs of whitespace into a single space
    newstring = re.sub(r"\s\s+", " ", newstring)
    return newstring.strip()

nlp=spacy.load('en')
alice=clean_text(gutenberg.raw('carroll-alice.txt'))
nlp_alice=list(nlp(alice).sents)

The error I get is as follows:

The error message

However, when my code looks like this, it works:

import spacy

nlp=spacy.load('en')
alice=nlp("hello Hello")

I would be very grateful if someone could point out what I am doing wrong.

Answer 1

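My guess is that you really are running out of memory. I could not find an exact number, but I am sure Carroll's Alice's Adventures in Wonderland has tens of thousands of sentences. That means tens of thousands of Span elements in spaCy. Unless you change its configuration, nlp() computes everything for the string passed to it, from POS tags to dependencies. In addition, the sents property returns an iterator, which you should take advantage of rather than expanding it into a list right away.

Basically, the computation you are attempting is very likely hitting a memory limit. How much memory does your machine have? In the comments, Joe suggested watching your machine's memory usage, and I second that. My recommendations: check whether you are actually running out of memory, or limit what nlp() does, or consider doing your work with the iterator: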

for sentence in nlp(alice).sents:
    pass  # handle one sentence at a time here
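To illustrate the iterator suggestion, here is a minimal sketch of consuming sentences one at a time instead of building a list first. The names summarize_lazily and fake_sents are hypothetical and not part of spaCy; fake_sents only stands in for doc.sents so the example runs without a model installed. Note that this avoids only the extra list of sentence objects, so whether it is enough depends on where the memory is actually going.

```python
from typing import Callable, Iterable, Iterator, Tuple, TypeVar

T = TypeVar("T")


def summarize_lazily(
    sentences: Iterable[T], length: Callable[[T], int] = len
) -> Tuple[int, int]:
    """Return (sentence count, longest sentence length) in a single pass.

    No sentence is stored: each item can be released as soon as the loop
    moves on, unlike list(sentences), which keeps every item alive.
    """
    count = 0
    longest = 0
    for sentence in sentences:
        count += 1
        longest = max(longest, length(sentence))
    return count, longest


def fake_sents(text: str) -> Iterator[str]:
    """Hypothetical stand-in for doc.sents: lazily yields '.'-separated pieces."""
    for piece in text.split("."):
        piece = piece.strip()
        if piece:
            yield piece


if __name__ == "__main__":
    # Lengths here are character counts because the items are plain strings.
    print(summarize_lazily(fake_sents("Alice was tired. She saw a rabbit. It ran.")))
```

With a real spaCy document, the same function could presumably be called as summarize_lazily(nlp(alice).sents), in which case len would count tokens per sentence rather than characters.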
