I have the following code (only the relevant parts are shown):
from gensim.models.doc2vec import Doc2Vec
from gensim.utils import simple_preprocess

def load_model(model_file):
    return Doc2Vec.load(model_file)

# infer
def infer_docs(input_string, model_file, inferred_docs=5):
    model = load_model(model_file)
    # tokenize the input, keeping tokens of 2 to 35 characters
    processed_str = simple_preprocess(input_string, min_len=2, max_len=35)
    inferred_vector = model.infer_vector(processed_str)
    return model.docvecs.most_similar([inferred_vector], topn=inferred_docs)
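The code runs on AWS as a Lambda function. When my model is small it works fine (I think the size is the reason), but with a reasonably large model (~200 MB) I get the following error: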
[INFO] 2018-01-21T20:44:59.613Z f2689816-feeb-11e7-b397-b7ff2947dcec testing keys in event dict
[INFO] 2018-01-21T20:44:59.614Z f2689816-feeb-11e7-b397-b7ff2947dcec loading model from s3://data-d2v/trained_models/model_law
[INFO] 2018-01-21T20:44:59.614Z f2689816-feeb-11e7-b397-b7ff2947dcec loading Doc2Vec object from s3://data-d2v/trained_models/model_law
[INFO] 2018-01-21T20:44:59.650Z f2689816-feeb-11e7-b397-b7ff2947dcec Found credentials in environment variables.
[INFO] 2018-01-21T20:44:59.707Z f2689816-feeb-11e7-b397-b7ff2947dcec Starting new HTTPS connection (1): s3.eu-west-1.amazonaws.com
[INFO] 2018-01-21T20:44:59.801Z f2689816-feeb-11e7-b397-b7ff2947dcec Starting new HTTPS connection (2): s3.eu-west-1.amazonaws.com
[INFO] 2018-01-21T20:45:35.830Z f2689816-feeb-11e7-b397-b7ff2947dcec loading wv recursively from s3://data-d2v/trained_models/model_law.wv.* with mmap=None
[INFO] 2018-01-21T20:45:35.830Z f2689816-feeb-11e7-b397-b7ff2947dcec loading syn0 from s3://data-d2v/trained_models/model_law.wv.syn0.npy with mmap=None
[Errno 2] No such file or directory: 's3://data-d2v/trained_models/model_law.wv.syn0.npy': FileNotFoundError
Traceback (most recent call last):
  File "/var/task/handler.py", line 20, in infer_handler
    event['input_text'], event['model_file'], inferred_docs=10)
  File "/var/task/infer_doc.py", line 26, in infer_docs
    model = load_model(model_file)
  File "/var/task/infer_doc.py", line 21, in load_model
    return Doc2Vec.load(model_file)
  File "/var/task/gensim/models/word2vec.py", line 1569, in load
    model = super(Word2Vec, cls).load(*args, **kwargs)
  File "/var/task/gensim/utils.py", line 282, in load
    obj._load_specials(fname, mmap, compress, subname)
  File "/var/task/gensim/models/word2vec.py", line 1593, in _load_specials
    super(Word2Vec, self)._load_specials(*args, **kwargs)
  File "/var/task/gensim/utils.py", line 301, in _load_specials
    getattr(self, attrib)._load_specials(cfname, mmap, compress, subname)
  File "/var/task/gensim/utils.py", line 312, in _load_specials
    val = np.load(subname(fname, attrib), mmap_mode=mmap)
  File "/var/task/numpy/lib/npyio.py", line 372, in load
    fid = open(file, "rb")
FileNotFoundError: [Errno 2] No such file or directory: 's3://data-d2v/trained_models/model_law.wv.syn0.npy'
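First, the file s3://data-d2v/trained_models/model_law.wv.syn0.npy does exist. Second, it looks to me as though the main model file s3://data-d2v/trained_models/model_law was loaded successfully.
To verify that the file exists and is accessible, I added: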
import smart_open
with smart_open.smart_open('s3://data-d2v/trained_models/model_law.wv.syn0.npy') as prut:
    for line in prut:
        print(line)
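This prints the contents without any problem.
Can you help?
Answer 0 (score: 1)
When a model is split across multiple files, it currently cannot be loaded directly from an S3 bucket. I have posted a feature request on GitHub.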
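The traceback shows why: the main file is read through the S3-aware path, but the .npy side files are handed to numpy, which calls the built-in open() on the s3:// URL. Until that changes, one possible workaround is to copy all of the model's files to local disk first and load from there. The sketch below is only an illustration of that idea, not something gensim provides: download_model_files is a hypothetical helper, and it assumes the side files share the main file's key plus a "." suffix (as model_law.wv.syn0.npy does) and that s3_client behaves like boto3.client("s3").

```python
import os


def download_model_files(s3_client, bucket, prefix, local_dir="/tmp"):
    """Copy the main model file and its side files to local_dir.

    Hypothetical helper: `s3_client` is assumed to behave like
    boto3.client("s3"). Returns the local path of the main model file.
    """
    os.makedirs(local_dir, exist_ok=True)
    # Note: a single list_objects_v2 call returns at most 1000 keys,
    # which is assumed to be enough for one model.
    response = s3_client.list_objects_v2(Bucket=bucket, Prefix=prefix)
    main_path = None
    for obj in response.get("Contents", []):
        key = obj["Key"]
        # Keep only the main file and files named "<prefix>.<something>",
        # so unrelated keys that merely share the prefix are skipped.
        if key != prefix and not key.startswith(prefix + "."):
            continue
        local_path = os.path.join(local_dir, os.path.basename(key))
        s3_client.download_file(bucket, key, local_path)
        if key == prefix:
            main_path = local_path
    if main_path is None:
        raise FileNotFoundError("main model file not found: " + prefix)
    return main_path
```

In the Lambda handler this could be used as Doc2Vec.load(download_model_files(boto3.client("s3"), "data-d2v", "trained_models/model_law")). Keep in mind that Lambda's /tmp space is limited, so the whole model has to fit there as well as in memory.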