Question

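I'm using a data generator in Keras to train a model on a large dataset. However, at the last batch of the first epoch I get this error every time: Error when checking input: expected input_8 to have 4 dimensions, but got array with shape (). I checked my dataset file and it contains no empty arrays, so where does the empty array come from? I even tried printing the arrays as they are generated, but they rarely show up as empty. Here is my data generator code: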

class data_generator(Sequence):
    def __init__(self,data_file,type_data,batch_size,shuffle=True):
        self.data_file = data_file
        self.type_data = type_data

        self.batch_size = batch_size
        self.shuffle = shuffle
        self.on_epoch_end()

    def on_epoch_end(self):
      if self.type_data == "train":
        self.indices = np.arange(3450000)
      else:
        self.indices = np.arange(345000)
        if self.shuffle:
            np.random.shuffle(self.indices)

    def __data__generation(self,indices):

        return X,Y        

    def __len__(self):
      if self.type_data == "train":
        return int(np.ceil(10000 / float(self.batch_size)))
      else:
        return int(np.ceil(1000 / float(self.batch_size)))

    def __getitem__(self,index):
        #print(self.indices[(index)*self.batch_size], self.indices[(index+1)*self.batch_size])
        X = np.array(HDF5Matrix(self.data_file, self.type_data + "_X", start = self.indices[index*self.batch_size], end = self.indices[(index+1)*self.batch_size]))
        Y = np.array(HDF5Matrix(self.data_file, self.type_data + "_Y", start = self.indices[index*self.batch_size], end = self.indices[(index+1)*self.batch_size]))
        #print(X.shape, Y.shape)
        return X,Y

Here is the code where I call fit_generator:

train_generator = data_generator("drive/My Drive/Dataset/dataset.h5", "train", 20)
eval_generator = data_generator("drive/My Drive/Dataset/dataset.h5", "eval", 20)
model = create_model()
history = model.fit_generator(generator = train_generator,epochs = 100,validation_data=eval_generator,use_multiprocessing=False)

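How can I fix this? Are there alternatives to data generators for training on large datasets? Data generators are very error-prone and produce a lot of bugs.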

Answer 1

The code had a few mistakes. I changed it and it works now, although I still don't know exactly why that error occurred. Here is the new code:

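# Imports assumed for the Keras 2.x API used in this post
import numpy as np
from keras.utils import Sequence, HDF5Matrix

class data_generator(Sequence):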
    def __init__(self,data_file,type_data,batch_size,shuffle=True):
        self.data_file = data_file
        self.type_data = type_data

        self.batch_size = batch_size
        self.shuffle = shuffle
        self.on_epoch_end()

    def on_epoch_end(self):
      if self.type_data == "train":
        self.indices = np.arange(3450000)
      else:
        self.indices = np.arange(345000)
      if self.shuffle:
        np.random.shuffle(self.indices)

    def __data__generation(self,indices):
      X = []
      Y = []
      for index in indices:
        X.append(np.array(HDF5Matrix(self.data_file, self.type_data + "_X", start = index, end = index + 1)[0]))
        Y.append(np.array(HDF5Matrix(self.data_file, self.type_data + "_Y", start = index, end = index + 1)[0]))
      X = np.array(X)
      Y = np.array(Y)
      return X,Y        

    def __len__(self):
      if self.type_data == "train":
        return int(np.ceil(3450000 / float(self.batch_size)))
      else:
        return int(np.ceil(345000 / float(self.batch_size)))

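    def __getitem__(self,index):
        # Slice the shuffled index array itself; the slice is clamped at the
        # end of the array, so the last batch may simply be shorter.
        indices = self.indices[index*self.batch_size:(index+1)*self.batch_size]
        X, Y = self.__data__generation(indices)
        #print(X.shape, Y.shape, index)
        return X,Y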
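
One plausible explanation for the original error, offered as a guess rather than something confirmed in this thread: the question's __getitem__ took two values out of the shuffled self.indices array and used them as the start and end of a slice. After shuffling, those two values are unrelated, so start is often larger than end and the slice comes back empty. In the question's code the shuffle sat inside the else branch, so it only ran for the "eval" generator, which would fit the error showing up when validation starts at the end of the first epoch. The sketch below reproduces the effect with a plain NumPy array standing in for the HDF5 dataset; data, bad_batch and good_batch are hypothetical names used only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
data = np.arange(100).reshape(100, 1)   # stand-in for the HDF5 dataset
indices = rng.permutation(100)          # shuffled sample indices
batch_size = 20

def bad_batch(i):
    # Question's approach: shuffled *values* are used as slice bounds.
    start = indices[i * batch_size]
    end = indices[(i + 1) * batch_size]
    return data[start:end]

def good_batch(i):
    # Answer 1's approach: slice the index array, then look up those rows.
    batch_indices = indices[i * batch_size:(i + 1) * batch_size]
    return data[batch_indices]

bad_sizes = [len(bad_batch(i)) for i in range(4)]
good_sizes = [len(good_batch(i)) for i in range(5)]
print("bad: ", bad_sizes)
print("good:", good_sizes)
```

With shuffled bounds, the batch length is whatever end - start happens to be, and zero whenever start >= end. Indexing with the sliced index array always gives batch_size rows, or fewer only for a final partial batch.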

Answer 2

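Keras needs the generator to loop forever (while True) to avoid a StopIteration. In an ordinary generator that does not loop, once the correct number of steps_per_epoch (sample_size // batch_size) has been consumed, the shape of the next batch will be zero.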
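
For the plain-generator case, a minimal sketch of such an endless loop might look like this. batch_generator and the toy arrays are hypothetical, and the step count uses ceil (an assumption, so that the final partial batch is counted) rather than the sample_size // batch_size mentioned above.

```python
import numpy as np

def batch_generator(X, Y, batch_size):
    """Yield (X, Y) batches forever, restarting after each full pass."""
    n = len(X)
    while True:  # never exhausts, so the consumer never sees StopIteration
        for start in range(0, n, batch_size):
            yield X[start:start + batch_size], Y[start:start + batch_size]

# Toy 4-dimensional input, shaped like a small batch of images
X = np.zeros((50, 8, 8, 1))
Y = np.zeros((50, 1))
batch_size = 20
steps_per_epoch = int(np.ceil(len(X) / batch_size))

gen = batch_generator(X, Y, batch_size)
# Draw two epochs' worth of batches; none of them is empty
shapes = [next(gen)[0].shape for _ in range(2 * steps_per_epoch)]
print(shapes)
```

Because the outer loop restarts the pass over the data, drawing more batches than one epoch contains still returns full 4-dimensional arrays instead of running off the end.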

Keras data generator: "Error when checking input: expected input_8 to have 4 dimensions, but got array with shape ()"