Tokenizer not found error when running zero-shot model

Posted: 2020-06-05 02:11:20

Tags: deep-learning nlp

This example is taken from Hugging Face, but it produces a model loading error. I thought I had downloaded this model already, so does anyone know how to avoid this error? Thanks, Will

from ktrain import text 
zsl = text.ZeroShotClassifier()
topic_strings=['politics', 'elections', 'sports', 'films', 'television']
doc = 'I am extremely dissatisfied with the President and will definitely vote in 2020.'
zsl.predict(doc, topic_strings=topic_strings, include_labels=True)

Results in:

OSError: Model name 'facebook/bart-large-mnli' was not found in tokenizers model name list (bart-large, bart-large-mnli, bart-large-cnn, bart-large-xsum). We assumed 'facebook/bart-large-mnli' was a path, a model identifier, or url to a directory containing vocabulary files named ['vocab.json', 'merges.txt'] but couldn't find such vocabulary files at this path or url.

2 Answers:

Answer 0 (score: 1):

Derp, I didn't have the default model available, so I changed line 2 to point to a model that loads correctly:

zsl = text.ZeroShotClassifier(model_name='bart-large-mnli')

Answer 1 (score: 1):

You are probably using a version of transformers older than 2.11. As of transformers 2.11, BART (and a few other models) must be specified by their full model id, as noted in the transformers 2.11 CHANGELOG (a sketch of the corresponding ktrain fix follows the quote below):

URLs to model weights are not hardcoded anymore (@julien-c)
Archive maps were dictionaries linking pre-trained models to their S3 URLs. Since the arrival of the model hub, these have become obsolete.

⚠️ This PR is breaking for the following models: BART, Flaubert, bert-japanese, bert-base-finnish, bert-base-dutch. ⚠️
Those models now have to be instantiated with their full model id:

"cl-tohoku/bert-base-japanese"
"cl-tohoku/bert-base-japanese-whole-word-masking"
"cl-tohoku/bert-base-japanese-char"
"cl-tohoku/bert-base-japanese-char-whole-word-masking"
"TurkuNLP/bert-base-finnish-cased-v1"
"TurkuNLP/bert-base-finnish-uncased-v1"
"wietsedv/bert-base-dutch-cased"
"flaubert/flaubert_small_cased"
"flaubert/flaubert_base_uncased"
"flaubert/flaubert_base_cased"
"flaubert/flaubert_large_cased"

all variants of "facebook/bart"
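So if you are on transformers 2.11 or newer (with a correspondingly updated ktrain), the fix is to pass the full model id rather than the short name. A minimal sketch, assuming ZeroShotClassifier accepts the full hub id through the same model_name parameter shown in the other answer:

from ktrain import text

# Assumes transformers >= 2.11, where BART must be referenced by its full hub id
# ('facebook/bart-large-mnli'), per the CHANGELOG excerpt above.
zsl = text.ZeroShotClassifier(model_name='facebook/bart-large-mnli')

topic_strings = ['politics', 'elections', 'sports', 'films', 'television']
doc = 'I am extremely dissatisfied with the President and will definitely vote in 2020.'
zsl.predict(doc, topic_strings=topic_strings, include_labels=True)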