Training a model in batches with fit_generator

Asked: 2020-03-05 09:30:10

Tags: tensorflow machine-learning keras

My model has 100,000 image training samples. How do I modify the code below so it trains in batches? With model.fit_generator I have to specify this inside the generator function:

from numpy import array

def data_generator(descriptions, features, n_step, max_sequence):
    # loop forever so the generator never runs out during training
    while 1:
        # walk over the dataset in chunks of n_step descriptions
        for i in range(0, len(descriptions), n_step):
            Ximages, XSeq, y = list(), list(), list()
            for j in range(i, min(len(descriptions), i + n_step)):
                image = features[j]
                # retrieve text input
                desc = descriptions[j]
                # generate input-output pairs for every token in the description
                in_img, in_seq, out_word = preprocess_data([desc], [image], max_sequence)
                for k in range(len(in_img)):
                    Ximages.append(in_img[k])
                    XSeq.append(in_seq[k])
                    y.append(out_word[k])
            # yield this batch of samples to the model
            yield [[array(Ximages), array(XSeq)], array(y)]

My model.fit_generator code:

model.fit_generator(data_generator(texts, train_features, 1, 150), 
                    steps_per_epoch=1500, epochs=50, callbacks=callbacks_list, verbose=1)

Any help would be great; I'm training on a 16GB Tesla V100 cloud instance.

Edit: my image-captioning model creates one training sample for every token in the DSL (250 tokens). With a dataset of 50 images (equivalent to 12,500 training samples) and a batch size of 1, I get an OOM. With around 32 images (equivalent to 8,000 samples) and a batch size of 1, it trains fine. My question is: can I optimize the code further, or is using multiple GPUs my only option?
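For reference, a back-of-the-envelope sketch of the effective batch size this implies (the 250 token-level samples per image is the figure from the edit above; names are illustrative):

tokens_per_image = 250   # samples produced by preprocess_data for one description
n_step = 1               # descriptions packed into a single yield
effective_batch = n_step * tokens_per_image
print(effective_batch)   # roughly 250 samples pushed through the model per step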

Fix:

steps_per_epoch must equal ceil(num_samples / batch_size), so with a dataset of 1,500 samples and a batch size of 1, steps_per_epoch should be 1,500. I also reduced the LSTM sliding window from 48 to 24.

steps_per_epoch: Integer. Total number of steps (batches of samples) to yield from the generator before declaring one epoch finished and starting the next epoch. It should typically be equal to ceil(num_samples / batch_size). Optional for Sequence: if unspecified, will use len(generator) as the number of steps.
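As a hedged sketch of how the corrected call could look (num_samples and batch_size are illustrative names standing in for your dataset size and the n_step passed to the generator):

import math

num_samples = 12500   # e.g. 50 images x 250 token-level samples
batch_size = 32       # the n_step passed to data_generator

model.fit_generator(data_generator(texts, train_features, batch_size, 150),
                    steps_per_epoch=math.ceil(num_samples / batch_size),
                    epochs=50, callbacks=callbacks_list, verbose=1)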

2 Answers:

Answer 0 (score: 0)

The generator already returns batches.

Each yield is one batch. You are completely free to design the batch generator however your problem requires.

In your code, the batch size is n_step.
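A quick way to see this, assuming descriptions, features, and max_sequence are in scope as in the question (names reused purely for illustration):

# peek at one batch: each yield packs n_step descriptions' worth of samples
gen = data_generator(descriptions, features, 32, max_sequence)
(imgs, seqs), targets = next(gen)
print(imgs.shape, seqs.shape, targets.shape)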

Answer 1 (score: 0)

This is the right way to use a generator: make a generator that yields individual data points, build a Dataset from it, and then use the batch method on that object. Tune the parameter until you find the largest batch size that does not cause an OOM.

import tensorflow as tf

def data_generator(descriptions, features, max_sequence):
    def _gen():
        # yield one (inputs, target) example at a time
        for img, seq, word in zip(*preprocess_data(descriptions, features, max_sequence)):
            yield {'image': img, 'seq': seq}, word
    return _gen


ds = tf.data.Dataset.from_generator(
    data_generator(descriptions, features, max_sequence),
    output_types=({'image': tf.float32, 'seq': tf.float32}, tf.int32),
    output_shapes=({
            'image': tf.TensorShape([blah, blah]),
            'seq': tf.TensorShape([blah, blah]),
        },
        tf.TensorShape([blah])
    )
)

ds = ds.batch(n_step)
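From here the dataset can be passed straight to fit; a minimal sketch, assuming the model's input layers are named 'image' and 'seq' so they match the dict keys used above:

ds = ds.prefetch(tf.data.experimental.AUTOTUNE)  # overlap data preparation with training
model.fit(ds, epochs=50, callbacks=callbacks_list)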