Running tensorflow-gpu

Time: 2019-09-03 12:15:12

Tags: python tensorflow

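As part of a university project, I built a CNN model for the Galaxy Zoo project. Because the dataset is very large, I used batch generators to produce the training and validation data. Training on the CPU worked without any problems. I recently switched to tensorflow-gpu and trained the model on the GPU, which was far faster. However, when the model reaches epoch 33, it stops unexpectedly. I have included the code below. Please help.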

I have tried 10 epochs, 20 epochs, and so on. Training works fine through epoch 32, but it stops during epoch 33.

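#Code for Image processor and Batch generator:
# `df` (the labels table indexed by image id) is assumed to be defined earlier.
import os

import cv2
import numpy as np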

def img_processor(path):
    img = cv2.imread(path)
    img = img[106:106*3,106:106*3,:]
    img = cv2.resize(img,(106,106), interpolation = cv2.INTER_CUBIC)
    return img

def Batch_generator(DIR):
    for img in os.listdir(DIR):
        im = img_processor(os.path.join(DIR,img))
        ind = int(img[:-4])
        y_train = df.loc[ind].values
        X_train = np.array(im)
        X_train = X_train.reshape(1,106,106,3)
        y_train = y_train.reshape(1,37)
        yield(X_train,y_train)

def Validation_generator(DIR):
    for img in os.listdir(DIR):
        im = img_processor(os.path.join(DIR,img))
        ind = int(img[:-4])
        y_valid = df.loc[ind].values
        X_valid = np.array(im)
        X_valid = X_valid.reshape(1,106,106,3)
        y_valid = y_valid.reshape(1,37)
        yield(X_valid,y_valid)


#Fitting the model:
batch_size = 32

steps_per_batch = int(len(os.listdir(train_path))/batch_size)
val_steps_per_batch = int(len(os.listdir(validate_path))/batch_size)

history = model.fit_generator(
                    Batch_generator(train_path),
                    samples_per_epoch=steps_per_batch,
                    epochs = 70,
                    validation_data=Validation_generator(validate_path),
                    nb_val_samples=val_steps_per_batch,
                    verbose = 1,
                    callbacks = [tensorboard]
                   )

Problem:

Epoch 31/70
1924/1924 [==============================] - 173s 90ms/step - loss: 0.0220 - acc: 0.5541 - val_loss: 0.0196 - val_acc: 0.4969

Epoch 32/70
1924/1924 [==============================] - 173s 90ms/step - loss: 0.0217 - acc: 0.5660 - val_loss: 0.0187 - val_acc: 0.5825

Epoch 33/70
  10/1924 [..............................] - ETA: 2:20 - loss: 0.0270 - acc: 0.3000
---------------------------------------------------------------------------
StopIteration                             Traceback (most recent call last)
<ipython-input-9-fccdb670230f> in <module>
      6                     nb_val_samples=val_steps_per_batch,
      7                     verbose = 1,
----> 8                     callbacks = [tensorboard]
      9                    )

c:\users\shinigami shrek\appdata\local\programs\python\python36\lib\site-packages\keras\legacy\interfaces.py in wrapper(*args, **kwargs)
     89                 warnings.warn('Update your `' + object_name + '` call to the ' +
     90                               'Keras 2 API: ' + signature, stacklevel=2)
---> 91             return func(*args, **kwargs)
     92         wrapper._original_function = func
     93         return wrapper

c:\users\shinigami shrek\appdata\local\programs\python\python36\lib\site-packages\keras\engine\training.py in fit_generator(self, generator, steps_per_epoch, epochs, verbose, callbacks, validation_data, validation_steps, class_weight, max_queue_size, workers, use_multiprocessing, shuffle, initial_epoch)
   1416             use_multiprocessing=use_multiprocessing,
   1417             shuffle=shuffle,
-> 1418             initial_epoch=initial_epoch)
   1419 
   1420     @interfaces.legacy_generator_methods_support

c:\users\shinigami shrek\appdata\local\programs\python\python36\lib\site-packages\keras\engine\training_generator.py in fit_generator(model, generator, steps_per_epoch, epochs, verbose, callbacks, validation_data, validation_steps, class_weight, max_queue_size, workers, use_multiprocessing, shuffle, initial_epoch)
    179             batch_index = 0
    180             while steps_done < steps_per_epoch:
--> 181                 generator_output = next(output_generator)
    182 
    183                 if not hasattr(generator_output, '__len__'):

StopIteration:
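
One reading of this traceback, offered as a sketch and not a confirmed diagnosis: Keras's `fit_generator` expects a generator that loops over its data indefinitely, whereas `Batch_generator` and `Validation_generator` make a single pass over `os.listdir(DIR)` and yield one image per step. With the step count set to the file count divided by 32, a single pass would run out after roughly 32 epochs, which would match the `StopIteration` in epoch 33. The stdlib-only sketch below reproduces that arithmetic with hypothetical file names and then shows a looping variant. `finite_samples`, `looping_batches`, and the 64-name list are illustrative assumptions, not part of the original code.

```python
import itertools


def finite_samples(names):
    """Mimic the generators in the question: one pass, one sample per step."""
    for name in names:
        yield name


def looping_batches(names, batch_size):
    """Yield lists of `batch_size` names forever, restarting after each pass."""
    while True:  # never exhausts, so next() never raises StopIteration
        # Drop the final partial batch so every batch has the same size.
        for start in range(0, len(names) - batch_size + 1, batch_size):
            yield names[start:start + batch_size]


# Hypothetical stand-ins for the image files in the training directory.
names = ["img%d.jpg" % i for i in range(64)]
batch_size = 32
steps_per_epoch = len(names) // batch_size

# Single-pass generator: count how many full epochs finish before it runs dry.
gen = finite_samples(names)
epochs_completed = 0
try:
    while True:
        for _ in range(steps_per_epoch):
            next(gen)
        epochs_completed += 1
except StopIteration:
    pass

# Looping generator: drawing 40 epochs' worth of steps succeeds.
batches = list(itertools.islice(looping_batches(names, batch_size),
                                steps_per_epoch * 40))

print(epochs_completed, len(batches), len(batches[0]))
```

In the real generators, each batch of names would presumably be loaded with `img_processor` and stacked into arrays of shape `(batch_size, 106, 106, 3)` and `(batch_size, 37)` before being yielded.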

0 answers:

No answers