spaCy 2.0.12 / Thinc 6.10.3 crashes Django on Heroku

Time: 2018-08-04 14:16:11

Tags: python django pip nlp spacy

I'm running into a problem that seems to be related to thinc after upgrading to spaCy v2.0.12. pip list tells me:

msgpack (0.5.6)
msgpack-numpy (0.4.3.1)
murmurhash (0.28.0)
regex (2017.4.5)
scikit-learn (0.19.2)
scipy (1.1.0)
spacy (2.0.12)
thinc (6.10.3)

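My code works fine on my Mac but fails in production. The stack trace goes into spacy, then into thinc, and then Django actually crashes. Everything worked with an earlier version of spacy; the problem only appeared when I tried upgrading to v2.0.12.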

My requirements.txt file contains the following lines:

regex==2017.4.5
spacy==2.0.12
scikit-learn==0.19.2
scipy==1.1.0
https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-2.0.0/en_core_web_sm-2.0.0.tar.gz

The last line pulls down en_core_web_sm during deployment. I do this so that the model gets installed on Heroku at deploy time.

I then load the parser like this:

import en_core_web_sm
en_core_web_sm.load()

The stack trace then shows the problem in thinc:

File "spacy/language.py", line 352, in __call__
  doc = proc(doc)
File "pipeline.pyx", line 426, in spacy.pipeline.Tagger.__call__
File "pipeline.pyx", line 438, in spacy.pipeline.Tagger.predict
File "thinc/neural/_classes/model.py", line 161, in __call__
  return self.predict(x)
File "thinc/api.py", line 55, in predict
  X = layer(X)
File "thinc/neural/_classes/model.py", line 161, in __call__
  return self.predict(x)
File "thinc/api.py", line 293, in predict
  X = layer(layer.ops.flatten(seqs_in, pad=pad))
File "thinc/neural/_classes/model.py", line 161, in __call__
  return self.predict(x)
File "thinc/api.py", line 55, in predict
  X = layer(X)
File "thinc/neural/_classes/model.py", line 161, in __call__
  return self.predict(x)
File "thinc/neural/_classes/model.py", line 125, in predict
  y, _ = self.begin_update(X)
File "thinc/api.py", line 374, in uniqued_fwd
  Y_uniq, bp_Y_uniq = layer.begin_update(X_uniq, drop=drop)
File "thinc/api.py", line 61, in begin_update
  X, inc_layer_grad = layer.begin_update(X, drop=drop)
File "thinc/neural/_classes/layernorm.py", line 51, in begin_update
  X, backprop_child = self.child.begin_update(X, drop=0.)
File "thinc/neural/_classes/maxout.py", line 69, in begin_update
  output__boc = self.ops.batch_dot(X__bi, W)
File "gunicorn/workers/base.py", line 192, in handle_abort
  sys.exit(1)

Again, all of this works on my laptop.

Is there something wrong with how I load the model? Or is my version of thinc out of date? If so, what should my requirements.txt file look like?

1 Answer:

Answer 0 (score: 1)

I've since solved this, but I'll leave an answer in case anyone else needs it.

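The problem was that, because of how and when I built and trained my sklearn model, my thread took too long to respond. As a result, Heroku aborted the thread, which is why the stack trace shows abort.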
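
For context, the last frame of the trace is gunicorn's handle_abort, which a worker runs when gunicorn's master process aborts it for staying silent longer than the timeout setting (30 seconds by default). The following is a minimal sketch of a gunicorn.conf.py, assuming gunicorn serves the app as the trace suggests. The values are illustrative and were not part of the original post, and a longer worker timeout does not lift any request limit the hosting platform enforces on its own.

```python
# gunicorn.conf.py (illustrative values, not taken from the original post)

# Seconds a worker may stay silent before the master aborts it.
# gunicorn's default is 30.
timeout = 60

# Import the application in the master process before forking workers,
# so slow import-time work happens once instead of in every worker.
preload_app = True
```

Raising the timeout only hides the symptom; moving the slow work out of the request path is the more durable fix.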

The fix was to change how and when the ML model is loaded so that this particular operation no longer times out.
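
The answer does not show the actual change, so what follows is only a sketch of one common approach: build the model once when the process starts and reuse it in every request. The names _build_model, get_model and handle_request are hypothetical, and the sleep stands in for the slow training or loading step.

```python
import functools
import time


def _build_model():
    """Hypothetical stand-in for the slow step (training or unpickling a model)."""
    time.sleep(0.2)  # simulate expensive work
    return {"ready": True}


@functools.lru_cache(maxsize=None)
def get_model():
    """Build the model on the first call and return the same object afterwards."""
    return _build_model()


# Warm the cache once at process start (for example at module import),
# so no request has to pay the build cost.
get_model()


def handle_request(text):
    """Hypothetical request handler that reuses the cached model."""
    model = get_model()
    return model["ready"], len(text)
```

With this layout the slow work happens before any request is served, so an individual request only pays for the prediction itself.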