I am very new to Hugging Face Transformers. When I try to load the xlm-roberta-base model from a given path, I run into the following error:
>> tokenizer = AutoTokenizer.from_pretrained(model_path)
>> Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/user/anaconda3/lib/python3.7/site-packages/transformers/tokenization_auto.py", line 182, in from_pretrained
return tokenizer_class.from_pretrained(pretrained_model_name_or_path, *inputs, **kwargs)
File "/home/user/anaconda3/lib/python3.7/site-packages/transformers/tokenization_utils.py", line 309, in from_pretrained
return cls._from_pretrained(*inputs, **kwargs)
File "/home/user/anaconda3/lib/python3.7/site-packages/transformers/tokenization_utils.py", line 458, in _from_pretrained
tokenizer = cls(*init_inputs, **init_kwargs)
File "/home/user/anaconda3/lib/python3.7/site-packages/transformers/tokenization_roberta.py", line 98, in __init__
**kwargs,
File "/home/user/anaconda3/lib/python3.7/site-packages/transformers/tokenization_gpt2.py", line 133, in __init__
with open(vocab_file, encoding="utf-8") as vocab_handle:
TypeError: expected str, bytes or os.PathLike object, not NoneType
However, if I load it by name, there is no problem:
>> tokenizer = AutoTokenizer.from_pretrained('xlm-roberta-base')
Any help would be appreciated.
Answer 0 (score: 1)
I assume you have already created that directory as described in the documentation, with:
tokenizer.save_pretrained('YOURPATH')
There is currently an issue under investigation that only affects the AutoTokenizer, but not the underlying tokenizers such as XLMRobertaTokenizer. For example, the following should work:
from transformers import XLMRobertaTokenizer
tokenizer = XLMRobertaTokenizer.from_pretrained('YOURPATH')
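The TypeError in the traceback means the tokenizer's vocab_file resolved to None, i.e. an expected vocabulary file was missing from the directory. Before retrying, it can help to check what was actually saved there. This is a minimal sketch, assuming the file names typically written by save_pretrained() for xlm-roberta-base (the names in EXPECTED are assumptions, not guaranteed by the library):

```python
import os

# File names assumed to be written by tokenizer.save_pretrained() for
# xlm-roberta-base; adjust to whatever your directory should contain.
EXPECTED = ["sentencepiece.bpe.model", "tokenizer_config.json", "special_tokens_map.json"]

def missing_tokenizer_files(model_path, expected=EXPECTED):
    """Return the expected tokenizer files that are absent from model_path."""
    return [name for name in expected
            if not os.path.exists(os.path.join(model_path, name))]
```

If missing_tokenizer_files('YOURPATH') is non-empty, the directory was not fully populated and loading will fail as above.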
To work with the AutoTokenizer, you also need to save the config in order to load it offline:
from transformers import AutoTokenizer, AutoConfig
tokenizer = AutoTokenizer.from_pretrained('xlm-roberta-base')
config = AutoConfig.from_pretrained('xlm-roberta-base')
tokenizer.save_pretrained('YOURPATH')
config.save_pretrained('YOURPATH')
tokenizer = AutoTokenizer.from_pretrained('YOURPATH')
I recommend either using different paths for the tokenizer and the model, or keeping the model's config.json, because some modifications applied to the model are stored in the config.json created during model.save_pretrained() and will be overwritten when you save the tokenizer after the model as described above (i.e. you will not be able to load the modified model with the tokenizer's config.json).
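A minimal sketch of the layout recommended above: give the model and the tokenizer their own subdirectories so that each save_pretrained() call writes its own config.json without clobbering the other's (the "model"/"tokenizer" subdirectory names are placeholders of my choosing):

```python
import os

def save_dirs(root):
    """Return (model_dir, tokenizer_dir) under root, creating them if needed."""
    model_dir = os.path.join(root, "model")
    tokenizer_dir = os.path.join(root, "tokenizer")
    os.makedirs(model_dir, exist_ok=True)
    os.makedirs(tokenizer_dir, exist_ok=True)
    return model_dir, tokenizer_dir

# Usage with the objects from the answer above (assumes transformers is
# installed and the model/tokenizer are already loaded):
# model_dir, tok_dir = save_dirs('YOURPATH')
# model.save_pretrained(model_dir)        # writes the model's config.json
# tokenizer.save_pretrained(tok_dir)
# config.save_pretrained(tok_dir)         # tokenizer-side config, kept separate
```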
Answer 1 (score: 0)
I ran into the same error message. To fix it, you can add use_fast=True to the arguments:
generator = AutoTokenizer.from_pretrained(generator_path, config=config.generator, use_fast=True)