How do I correctly implement Keras's fit_generator over multiple datasets?

Time: 2017-11-21 06:34:07

Tags: python tensorflow keras batch-processing conv-neural-network

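I'm having trouble implementing Keras's fit_generator function. I have followed the Keras documentation and many other write-ups online, but I can't seem to get it to work.

When I run fit_generator, it doesn't throw an error. I can tell that something is running in the background, because GPU usage in my Task Manager spikes to 70%. However, no text or verbose output indicates that batches are being processed for my convolutional neural network.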

Here is my model and my batch generator.

I have six HDF5 files that I want to loop over; each contains 40,000 images, already stored as NumPy arrays. I yield batches of 20 at a time.

import keras
from keras.layers import Conv2D, MaxPooling2D, GlobalAveragePooling2D
from keras.layers import Dropout, Flatten, Dense
from keras.models import Sequential

model = Sequential()

# First convolutional block; the input is a 180x180 RGB image.
model.add(Conv2D(filters=80, kernel_size=4, strides=1, activation='relu', input_shape=(180, 180, 3)))
model.add(Dropout(rate=0.2))
model.add(MaxPooling2D(pool_size=2, strides=2))

# Second convolutional block.
model.add(Conv2D(filters=60, kernel_size=2, strides=1, activation='relu'))
model.add(Dropout(rate=0.2))
model.add(MaxPooling2D(pool_size=2, strides=2))

# These Dense layers come before Flatten, so they act on the last (channel) axis at each spatial position.
model.add(Dense(units=40, activation='relu'))
model.add(Dense(units=20, activation='relu'))
model.add(Flatten())

# 5270-way softmax output layer.
model.add(Dense(units=5270, activation='softmax'))

model.compile(loss="categorical_crossentropy", optimizer="rmsprop", metrics=['accuracy'])
model.summary()

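Fitting my model

When I fit my model with my generator, I know my GPU is processing the data (I have an NVIDIA GTX 1070), but no verbose output or text is displayed when I run the code below. I also tried running without the GPU, still with no luck. Am I doing something wrong here?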

import h5py

def train_generator():
    batch_size = 20
    num_files = 6
    samples_per_file = 40000

    while True:
        # There are six of these file pairs in total, so 40000*6 = 240,000 images for each epoch.
        # Cycle through them and start over at file 1 after file 6.
        for counter in range(1, num_files + 1):
            # Load one file pair into memory; each `with` block closes its file handle.
            with h5py.File('x_train' + str(counter) + 'catID.h5', 'r') as h5f:
                pic_arr = h5f['dataset'][0:samples_per_file]
            with h5py.File('y_train' + str(counter) + 'catID.h5', 'r') as h5f:
                cat_arr = h5f['dataset'][0:samples_per_file]

            # Each file holds 40,000 samples and batch_size is 20, so this yields 40000/20 = 2000 batches.
            for index in range(0, samples_per_file, batch_size):
                x_train = pic_arr[index:index + batch_size]
                y_train = cat_arr[index:index + batch_size]
                yield (x_train, y_train)

            # Release the arrays before loading the next file.
            del pic_arr
            del cat_arr
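
The fit_generator call itself is not shown here. A minimal sketch of what it might look like follows, assuming the Keras 2 method `model.fit_generator(generator, steps_per_epoch=..., epochs=..., verbose=...)`. The helper names `steps_per_epoch` and `fit_with_generator` and the choice of `epochs=1` are illustrative assumptions, not part of the original post. Because the generator loops forever, Keras uses `steps_per_epoch` to decide where an epoch ends, and `verbose=1` asks for the per-batch progress bar.

```python
def steps_per_epoch(num_files=6, samples_per_file=40000, batch_size=20):
    """Number of batches in one full pass over every file."""
    return (num_files * samples_per_file) // batch_size


def fit_with_generator(model, generator, epochs=1):
    """Train `model` from an endless batch generator.

    Assumptions: `model` is a compiled Keras 2 model such as the one in the
    question, and `generator` is the object returned by train_generator().
    """
    return model.fit_generator(
        generator,
        steps_per_epoch=steps_per_epoch(),  # 6 * 40000 // 20 = 12000
        epochs=epochs,
        verbose=1,  # 1 = progress bar, 2 = one line per epoch, 0 = silent
    )
```

It would be called as `fit_with_generator(model, train_generator())`; note that the generator function is called, so Keras receives the generator object rather than the function. In later TensorFlow releases fit_generator is deprecated in favor of `fit`, which accepts generators directly.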

1 Answer:

Answer 0: (score: 1)

Never mind. I ran the same code again and it worked... If anyone needs a reference for how to implement Keras's fit_generator, the code above works.
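
For anyone who hits the same silent run, one way to rule out the generator is to pull a few batches from it by hand before involving Keras. The sketch below assumes a generator that yields `(x, y)` pairs like train_generator above. `peek_batches` is a hypothetical helper name, and `fake_generator` is a stand-in that exists only so the example runs without the HDF5 files.

```python
from itertools import islice


def peek_batches(generator, n=3):
    """Return the (len(x), len(y)) sizes of the first n batches."""
    return [(len(x), len(y)) for x, y in islice(generator, n)]


def fake_generator(batch_size=20):
    """Stand-in for train_generator(): yields list batches forever."""
    while True:
        yield ([0] * batch_size, [1] * batch_size)


print(peek_batches(fake_generator()))  # prints [(20, 20), (20, 20), (20, 20)]
```

With the real data, `peek_batches(train_generator())` should report 20 for both entries of every pair; `len` on a NumPy array returns the size of its first axis, so the same helper applies.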