How do I serialize Keras models for use with Joblib?

Time: 2017-12-07 20:09:02

Tags: python serialization keras pickle joblib

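I am trying to combine Keras and Joblib to train several simple models and store them in a list, so that I can run probe samples through them during the validation phase.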

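I implemented bootstrap aggregation (bagging) with several simple binary neural network models, using Joblib to train them. However, when I try to predict, I get the following error: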

Traceback (most recent call last):
  File "../HFCN_openset_load.py", line 264, in <module>
    main()
  File "../HFCN_openset_load.py", line 107, in main
    pr, roc = fcnhface(args, parallel_pool)
  File "../HFCN_openset_load.py", line 194, in fcnhface
    pred = models[k][0].predict(feature_vector.reshape(1, feature_vector.shape[0]))
  File "/usr/local/lib/python2.7/dist-packages/keras/models.py", line 1004, in predict
    if not self.built:
  File "/usr/local/lib/python2.7/dist-packages/keras/engine/topology.py", line 339, in built
    return self._built
AttributeError: 'Sequential' object has no attribute '_built'

Below are the parts of my code where I think the error may originate:

def getModel(input_shape,nclasses=2):
    make_keras_picklable()
    model = Sequential()
    model.add(Dense(64, activation='relu', input_shape=input_shape))
    model.add(Dropout(0.2))
    model.add(Dense(nclasses, activation='softmax'))
    model.compile(loss='categorical_crossentropy', optimizer='adadelta', metrics=['accuracy'])#RMSprop()
    return model

def learn_fc_model(X, Y, split):
    boolean_label = [(split[key]+1)/2 for key in Y]
    y_train = np_utils.to_categorical(boolean_label, 2)
    model = getModel(input_shape=X[0].shape)
    model.fit(X, y_train, batch_size=40, nb_epoch=100, verbose=0)
    return (model, split)

#Training using Joblib, models is a list of tuples (ANN models, any variable)
with Parallel(n_jobs=4, verbose=15, backend='multiprocessing') as parallel_pool:
    models = parallel_pool(
        delayed(learn_fc_model) (numpy_x, numpy_y, split) for split in numpy_s
    )

#Testing
for k in range (0, len(models)):
    pred = models[k][0].predict(feature_vector.reshape(1, feature_vector.shape[0]))

A link to the full file is here

1 answer:

Answer 0: (score: 0)

Here is a simple way to fit multiple Keras models in parallel using Joblib.

Define the basic parameters:

  • n_jobs: the number of parallel jobs

  • n_estimators: the number of models to fit

    n_jobs, n_estimators = 4, 20
    

Generate dummy data:

import numpy as np

n_class = 2
X = np.random.uniform(0,1, (100,10))   # 100 samples, 10 features
y = np.random.randint(0,n_class, 100)  # integer class labels

Utility function that defines the empty (unfitted) model structure:

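from keras.models import Sequential
from keras.layers import Dense

def get_model(input_shape):
    # Single softmax layer; compiled but not yet fitted
    m = Sequential([Dense(n_class, input_shape=input_shape,
                          activation='softmax')])
    m.compile(loss='sparse_categorical_crossentropy', optimizer='adam')
    return m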

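Utility function for fitting multiple models (it must return a list of the fitted weights rather than the models themselves; the weights are plain NumPy arrays, so they can be sent back from the worker processes):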

def fit_models(n_estimators, x, y):
    
    weights = []
    for _ in range(n_estimators):
        m = get_model(input_shape=(10,))
        m.fit(x, y)
        weights.append(m.get_weights())
    
    return weights
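
The idea of returning weights instead of models can be illustrated without Keras. The following is only a sketch: `ToyModel` is a hypothetical stand-in (not a Keras class) that holds a lock to imitate runtime state that cannot be pickled, and `fit_and_export` is likewise an invented name. It assumes Python 3.

```python
import pickle
import threading


class ToyModel:
    """Hypothetical stand-in for a compiled model; not a Keras class."""

    def __init__(self):
        self._lock = threading.Lock()        # runtime state that cannot be pickled
        self._weights = [[0.0, 0.0], [0.0]]

    def fit(self):
        # Pretend training: overwrite the weights with "fitted" values.
        self._weights = [[0.5, -1.25], [2.0]]

    def get_weights(self):
        # Return a copy made of plain lists only.
        return [list(w) for w in self._weights]

    def set_weights(self, weights):
        self._weights = [list(w) for w in weights]


def fit_and_export():
    # What a worker would do: fit, then hand back only the weights.
    m = ToyModel()
    m.fit()
    return m.get_weights()


# Pickling the whole object fails because of the lock it holds.
try:
    pickle.dumps(ToyModel())
    model_is_picklable = True
except (TypeError, pickle.PicklingError):
    model_is_picklable = False

# The weights alone survive a pickle round trip and can be loaded
# into a freshly built model in the parent process.
payload = pickle.dumps(fit_and_export())
restored = ToyModel()
restored.set_weights(pickle.loads(payload))
```

The same division of labour appears in the answer: the worker fits and exports, and the parent rebuilds an empty structure and calls `set_weights`.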

Utility function for partitioning the estimators among the jobs:

from joblib import Parallel, delayed, effective_n_jobs

def _partition_estimators(n_estimators, n_jobs):

    # Compute the number of jobs
    n_jobs = min(effective_n_jobs(n_jobs), n_estimators)

    # Partition estimators between jobs
    n_estimators_per_job = np.full(n_jobs, n_estimators // n_jobs,
                                   dtype=int)
    n_estimators_per_job[:n_estimators % n_jobs] += 1

    return n_jobs, n_estimators_per_job.tolist()
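
To see what the partition produces, here is a pure-Python mirror of the same arithmetic. It is a sketch only: `partition_counts` is a hypothetical name, and it assumes `n_jobs` is already a positive integer (the original resolves values such as -1 through `effective_n_jobs`).

```python
def partition_counts(n_estimators, n_jobs):
    """Split n_estimators as evenly as possible among at most n_jobs jobs."""
    # Never use more jobs than there are estimators.
    n_jobs = min(n_jobs, n_estimators)
    # Every job gets `base` estimators; the first `extra` jobs get one more.
    base, extra = divmod(n_estimators, n_jobs)
    return [base + 1 if i < extra else base for i in range(n_jobs)]


even = partition_counts(20, 4)
uneven = partition_counts(10, 4)
few = partition_counts(3, 8)

print(even)    # [5, 5, 5, 5]
print(uneven)  # [3, 3, 2, 2]
print(few)     # [1, 1, 1]
```

With the values used in this answer (20 estimators, 4 jobs), each job fits 5 models.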

Run the jobs in parallel:

import itertools

# After this call n_estimators is a list with the number of models per job
n_jobs, n_estimators = _partition_estimators(n_estimators, n_jobs)

res = Parallel(n_jobs=n_jobs, verbose=1)(
    delayed(fit_models)(
        n_estimators=n_estimators[i],
        x=X,
        y=y
    )
    for i in range(n_jobs))

all_weights = list(itertools.chain.from_iterable(res)) # get all fitted weights in a list
all_models = [get_model((10,)) for _ in all_weights] # get empty models in a list
# put fitted weights into empty model structures
for w,m in zip(all_weights, all_models):
    m.set_weights(w)
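
The flattening step is worth a small illustration: `Parallel` returns one result per job, and each result here is itself a list with one entry per fitted model. The sketch below uses made-up string placeholders in place of real weight lists.

```python
import itertools

# Hypothetical per-job results: each inner list has one entry per fitted model.
res = [["w0", "w1", "w2"], ["w3", "w4"], ["w5"]]

# Flatten the list of lists into a single list, preserving job order.
all_weights = list(itertools.chain.from_iterable(res))

print(all_weights)       # ['w0', 'w1', 'w2', 'w3', 'w4', 'w5']
print(len(all_weights))  # 6
```

The flattened length equals the total number of estimators, so one empty model is built per entry before the weights are assigned.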

A running notebook with the full example is here