Keras data generator: "Error when checking input: expected input_8 to have 4 dimensions, but got array with shape ()"

Time: 2019-07-05 01:22:11

Tags: python tensorflow keras

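I am using a data generator in Keras to train a model on a large dataset. Every time, on the last batch of the first epoch, I get the error: Error when checking input: expected input_8 to have 4 dimensions, but got array with shape (). I checked my dataset file and it contains no empty arrays, so where does the empty array come from? I even tried printing each array as it is generated, but an empty one almost never shows up. Here is my data generator code: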

class data_generator(Sequence):
    def __init__(self,data_file,type_data,batch_size,shuffle=True):
        self.data_file = data_file
        self.type_data = type_data

        self.batch_size = batch_size
        self.shuffle = shuffle
        self.on_epoch_end()

    def on_epoch_end(self):
      if self.type_data == "train":
        self.indices = np.arange(3450000)
      else:
        self.indices = np.arange(345000)
        if self.shuffle:
            np.random.shuffle(self.indices)

    def __data__generation(self,indices):

        return X,Y        

    def __len__(self):
      if self.type_data == "train":
        return int(np.ceil(10000 / float(self.batch_size)))
      else:
        return int(np.ceil(1000 / float(self.batch_size)))

    def __getitem__(self,index):
        #print(self.indices[(index)*self.batch_size], self.indices[(index+1)*self.batch_size])
        X = np.array(HDF5Matrix(self.data_file, self.type_data + "_X", start = self.indices[index*self.batch_size], end = self.indices[(index+1)*self.batch_size]))
        Y = np.array(HDF5Matrix(self.data_file, self.type_data + "_Y", start = self.indices[index*self.batch_size], end = self.indices[(index+1)*self.batch_size]))
        #print(X.shape, Y.shape)
        return X,Y 

Here is the code where I call fit_generator:

train_generator = data_generator("drive/My Drive/Dataset/dataset.h5", "train", 20)
eval_generator = data_generator("drive/My Drive/Dataset/dataset.h5", "eval", 20)
model = create_model()
history = model.fit_generator(generator = train_generator,epochs = 100,validation_data=eval_generator,use_multiprocessing=False)

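How can I fix this? Are there alternatives to data generators for training on large datasets? Data generators are very error-prone and have caused me a lot of errors.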

2 answers:

Answer 0 (score: 0)

The code had a few mistakes. I changed it and it now works, although I still don't know why that error occurred. Here is the new code:

# Imports assumed for this snippet; they are not shown in the original post.
# They match the standalone Keras 2.x API that the post uses.
import numpy as np
from keras.utils import Sequence, HDF5Matrix

class data_generator(Sequence):
    def __init__(self, data_file, type_data, batch_size, shuffle=True):
        self.data_file = data_file
        self.type_data = type_data

        self.batch_size = batch_size
        self.shuffle = shuffle
        self.on_epoch_end()

    def on_epoch_end(self):
        # Rebuild the sample indices, then reshuffle them for either split.
        if self.type_data == "train":
            self.indices = np.arange(3450000)
        else:
            self.indices = np.arange(345000)
        if self.shuffle:
            np.random.shuffle(self.indices)

    def __data__generation(self, indices):
        # Read each sample by its own index, one row at a time.
        X = []
        Y = []
        for index in indices:
            X.append(np.array(HDF5Matrix(self.data_file, self.type_data + "_X", start=index, end=index + 1)[0]))
            Y.append(np.array(HDF5Matrix(self.data_file, self.type_data + "_Y", start=index, end=index + 1)[0]))
        X = np.array(X)
        Y = np.array(Y)
        return X, Y

    def __len__(self):
        # Number of batches per epoch, counting a final partial batch.
        if self.type_data == "train":
            return int(np.ceil(3450000 / float(self.batch_size)))
        else:
            return int(np.ceil(345000 / float(self.batch_size)))

    def __getitem__(self, index):
        # Slice the index array itself so the batch holds batch_size sample indices.
        indices = self.indices[index*self.batch_size:(index+1)*self.batch_size]
        X, Y = self.__data__generation(indices)
        #print(X.shape, Y.shape, index)
        return X, Y
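
A possible explanation for the original error, offered as an assumption because the thread never confirms it: the first version passed self.indices[index*self.batch_size] and self.indices[(index+1)*self.batch_size] as start and end. Those are two individual sample indices, not the bounds of a batch. As indented in the question, shuffling only ran in the else branch, so the training indices stayed in order and end - start always equalled the batch size, while the shuffled validation indices gave arbitrary, often negative, ranges. Validation runs at the end of the first epoch, which is where the error appeared. The NumPy-only sketch below contrasts the two ways of indexing; the helper names bounds_from_elements and batch_from_slice are hypothetical and the sizes are scaled down.

```python
import numpy as np

batch_size = 20
n_samples = 1000
n_batches = 40  # kept small so that (i + 1) * batch_size is always a valid index

def bounds_from_elements(indices, i):
    """Original approach: use two elements of the index array as start/end."""
    return int(indices[i * batch_size]), int(indices[(i + 1) * batch_size])

def batch_from_slice(indices, i):
    """Fixed approach: take a slice of the index array itself."""
    return indices[i * batch_size:(i + 1) * batch_size]

ordered = np.arange(n_samples)                              # like the unshuffled train indices
shuffled = np.random.default_rng(0).permutation(n_samples)  # like the shuffled eval indices

# end - start is the number of rows a start/end read would request.
ordered_spans = [e - s for s, e in (bounds_from_elements(ordered, i) for i in range(n_batches))]
shuffled_spans = [e - s for s, e in (bounds_from_elements(shuffled, i) for i in range(n_batches))]
slice_sizes = [len(batch_from_slice(shuffled, i)) for i in range(n_batches)]

print("ordered spans all equal batch_size:", all(s == batch_size for s in ordered_spans))
print("first shuffled spans:", shuffled_spans[:5])
print("slice sizes all equal batch_size:", all(n == batch_size for n in slice_sizes))
```

With ordered indices both approaches agree, which would explain why training batches looked fine when printed. With shuffled indices only the slice-based version yields a batch of the expected size.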

Answer 1 (score: 0)

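Keras needs the generator to loop with while True (an infinite loop) to avoid StopIteration. With a plain generator, however, the shape will be zero once the correct steps_per_epoch (sample_size // batch_size) has been passed.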
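
As a minimal sketch of that point, assuming in-memory NumPy arrays and a hypothetical helper named infinite_batches (neither comes from the thread): the while True loop restarts the pass over the data, so the generator never raises StopIteration, and steps_per_epoch tells Keras how many batches make up one epoch. The sketch rounds up rather than using sample_size // batch_size so that the final partial batch is counted.

```python
import itertools
import math

import numpy as np

def infinite_batches(X, Y, batch_size, shuffle=True, seed=0):
    """Yield (X_batch, Y_batch) forever, reshuffling at the start of each pass."""
    rng = np.random.default_rng(seed)
    n = len(X)
    while True:  # never exhausts, so the consumer never sees StopIteration
        order = rng.permutation(n) if shuffle else np.arange(n)
        for start in range(0, n, batch_size):
            idx = order[start:start + batch_size]
            yield X[idx], Y[idx]

# Dummy data standing in for the real dataset (shapes are illustrative only).
X = np.zeros((50, 8, 8, 1), dtype=np.float32)
Y = np.zeros((50, 1), dtype=np.float32)
batch_size = 20
steps_per_epoch = math.ceil(len(X) / batch_size)

gen = infinite_batches(X, Y, batch_size)
# Draw more than two epochs' worth of batches to show that it keeps producing data.
batches = list(itertools.islice(gen, 2 * steps_per_epoch + 1))
shapes = [xb.shape for xb, _ in batches]
print(shapes)
```

With the Keras version used in the question, such a generator would be passed as model.fit_generator(gen, steps_per_epoch=steps_per_epoch, epochs=100). A Sequence subclass like the one in the question does not need this loop, because Keras indexes it through __len__ and __getitem__.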