What I want to do is load the machine learning model used for summary generation from a pickle object, so that when I deploy the code to my web application it doesn't have to reload the model manually over and over again. Loading takes a lot of time, and I can't afford to make users wait 10-15 minutes for the model to load before a summary is generated.
import cPickle as pickle

from skip_thoughts import configuration
from skip_thoughts import encoder_manager
import en_coref_md


def load_models():
    VOCAB_FILE = "skip_thoughts_uni/vocab.txt"
    EMBEDDING_MATRIX_FILE = "skip_thoughts_uni/embeddings.npy"
    CHECKPOINT_PATH = "skip_thoughts_uni/model.ckpt-501424"

    encoder = encoder_manager.EncoderManager()
    print "loading skip model"
    encoder.load_model(configuration.model_config(),
                       vocabulary_file=VOCAB_FILE,
                       embedding_matrix_file=EMBEDDING_MATRIX_FILE,
                       checkpoint_path=CHECKPOINT_PATH)
    print "loaded"
    return encoder


encoder = load_models()

print "Starting cPickle dumping"
with open('/path_to_loaded_model/loaded_model.pkl', 'wb') as f:
    pickle.dump(encoder, f)
print "pickle.dump executed"

print "starting cpickle loading"
# pickle files must be opened in binary mode ('rb', not 'r')
with open('loaded_model.pkl', 'rb') as f:
    loaded_model = pickle.load(f)
print "pickle load done"
I started with pickle and then switched to cPickle, but neither of them finished in a reasonable amount of time. The first time I tried this, the pickle file being created reached 11.2 GB, which I think is far too large. The whole process took over an hour and made my computer unusable, and the code still hadn't finished executing; I had to force-restart my machine because it was taking so long.
I would really appreciate it if someone could help me.
Answer 0 (score: 0)
I suggest checking whether storing the model weights in HDF5 improves performance:
Writing to HDF5:
import h5py
import numpy as np
import tensorflow as tf

# assumes `session` is a tf.Session in which the model's graph has
# already been built and its checkpoint restored
with h5py.File('model.hdf5', 'w') as f:
    for var in tf.trainable_variables():
        # '/' is the HDF5 group separator, so substitute spaces in names
        key = var.name.replace('/', ' ')
        value = session.run(var)
        f.create_dataset(key, data=value)
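The snippet above assumes a `session` in which the model's weights have already been restored. As a minimal sketch of that setup, assuming a plain TF 1.x checkpoint and reusing the `model.ckpt-501424` path from the question:

import tensorflow as tf

# Assumption: the model's graph has already been constructed in the
# default graph (e.g. by the skip-thoughts model-building code), so
# tf.train.Saver() can find its variables.
CHECKPOINT_PATH = "skip_thoughts_uni/model.ckpt-501424"

session = tf.Session()
saver = tf.train.Saver()
saver.restore(session, CHECKPOINT_PATH)  # load weights into `session`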
Reading from HDF5:
with h5py.File('model.hdf5', 'r') as f:
    for (name, val) in f.items():
        # undo the '/' -> ' ' substitution used when writing
        name = name.replace(' ', '/')
        val = np.array(val)
        # feed the stored value into the variable's assign op
        session.run(param_setters[name][0], {param_setters[name][1]: val})
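Note that `param_setters` is not defined in the snippet above; in the pattern from the linked blog post, it maps each variable name to an `(assign_op, placeholder)` pair. A minimal sketch of how it could be built (the dict name and structure follow that pattern and are otherwise an assumption):

import tensorflow as tf

# Map each trainable variable's name to (assign_op, placeholder) so
# that stored values can be fed back into the graph at restore time.
param_setters = {}
for var in tf.trainable_variables():
    placeholder = tf.placeholder(var.dtype, shape=var.get_shape())
    assign_op = tf.assign(var, placeholder)
    param_setters[var.name] = (assign_op, placeholder)

The keys here are `var.name` (including the `:0` suffix), which is consistent with the write loop above: it stores names with `'/'` replaced by `' '`, and the read loop reverses that substitution before looking the name up.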
Sources:
https://www.tensorflow.org/tutorials/keras/save_and_restore_models
https://geekyisawesome.blogspot.com/2018/06/savingloading-tensorflow-model-using.html