Why does a custom image generator in Keras give the error "object cannot be interpreted as an integer"?

Asked: 2019-08-29 16:20:07

Tags: python tensorflow keras image-generation

I am using a custom image generator based on a template from Keras so that I can use hdf5 files as input. Initially the code gave a "shape" error, so, following this post, I only included:

from tensorflow.python.keras.utils.data_utils import Sequence

Now I use the generator in this form, as you can also see in my colab notebook:

from numpy.random import uniform, randint
from tensorflow.python.keras.preprocessing.image import ImageDataGenerator
import numpy as np
from tensorflow.python.keras.utils.data_utils import Sequence

class CustomImagesGenerator(Sequence):
    def __init__(self, x, zoom_range, shear_range, rescale, horizontal_flip, batch_size):
        self.x = x
        self.zoom_range = zoom_range
        self.shear_range = shear_range
        self.rescale = rescale
        self.horizontal_flip = horizontal_flip
        self.batch_size = batch_size
        self.__img_gen = ImageDataGenerator()
        self.__batch_index = 0

    def __len__(self):
        # steps_per_epoch, if unspecified, will use the len(generator) as a number of steps.
        # hence this
        return np.floor(self.x.shape[0]/self.batch_size)

    # @property
    # def shape(self):
    #     return self.x.shape

    def next(self):
        return self.__next__()

    def __next__(self):
        start = self.__batch_index*self.batch_size
        stop = start + self.batch_size
        self.__batch_index += 1
        if stop > len(self.x):
            raise StopIteration
        transformed = np.array(self.x[start:stop])  # loads from hdf5
        for i in range(len(transformed)):
            zoom = uniform(self.zoom_range[0], self.zoom_range[1])
            transformations = {
                'zx': zoom,
                'zy': zoom,
                'shear': uniform(-self.shear_range, self.shear_range),
                'flip_horizontal': self.horizontal_flip and bool(randint(0,2))
            }
            transformed[i] = self.__img_gen.apply_transform(transformed[i], transformations)
        import pdb;pdb.set_trace()
        return transformed * self.rescale

And I call the generator with:

import h5py
import tables

in_hdf5_file = tables.open_file("gdrive/My Drive/Colab Notebooks/dataset.hdf5", mode='r')
images = in_hdf5_file.root.train_img
my_gen = CustomImagesGenerator(
    images,
    zoom_range=[0.8, 1],
    batch_size=32,
    shear_range=6,
    rescale=1./255,
    horizontal_flip=False
)
classifier.fit_generator(my_gen, steps_per_epoch=100, epochs=1, verbose=1)

Importing Sequence this way solved the "shape" error, but now I get this error:


Exception in thread Thread-5:
Traceback (most recent call last):
  File "/usr/lib/python3.6/threading.py", line 916, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3.6/threading.py", line 864, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/utils/data_utils.py", line 742, in _run
    sequence = list(range(len(self.sequence)))
TypeError: 'numpy.float64' object cannot be interpreted as an integer

How do I fix this? I suspect there may again be a conflict somewhere in the Keras packages, and I don't know how to resolve it.
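The line in the traceback, sequence = list(range(len(self.sequence))), fails because Python's len() requires __len__ to return a plain int, and np.floor() in the __len__ above returns a numpy.float64. A minimal sketch reproducing the same TypeError (the class name MyGen is purely illustrative):

import numpy as np

class MyGen:
    def __len__(self):
        # np.floor() returns a numpy.float64, which len() refuses to accept
        return np.floor(3200 / 32)

len(MyGen())
# TypeError: 'numpy.float64' object cannot be interpreted as an integer
# Returning int(np.floor(...)) (or using 3200 // 32) avoids this.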

1 Answer:

Answer 0 (score: 1)

An example of using model.fit() in your case:

from tensorflow.keras.utils import to_categorical
import tensorflow as tf
import tables

#define your model

...

#load your data from an hdf5 file
in_hdf5_file = tables.open_file("path/to/your/dataset.hdf5", mode='r')
x = in_hdf5_file.root.train_img[:]
y = in_hdf5_file.root.train_labels[:]

yourModel.fit(x, to_categorical(y, 3), epochs=2, batch_size=5)

For more information, read my comments under the original post, or feel free to ask.

Edit: I fixed your generator; now it only needs the path to your hdf5 file.

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import *
from tensorflow.keras.utils import to_categorical

import numpy as np
from tensorflow.python.keras.utils.data_utils import Sequence
import tensorflow as tf

import tables

#define your model

...

#training
def h5data_generator(path, batch_size=1):
    # Plain Python generator: loops over the hdf5 file indefinitely and yields
    # (inputs, targets) batches, which is what fit_generator() expects.
    sample_index = 0
    while 1:
        with tables.open_file(path, mode='r') as f:
            n_samples = f.root.train_img.shape[0]
            if sample_index + batch_size > n_samples:
                sample_index = 0  # wrap around once the dataset is exhausted

            x = f.root.train_img[sample_index:sample_index + batch_size]
            y = f.root.train_labels[sample_index:sample_index + batch_size]
            sample_index += batch_size

            yield (x, to_categorical(y, 3))

            del x
            del y


my_gen = h5data_generator("path/to/your/dataset.hdf5")

yourModel.fit_generator(my_gen, steps_per_epoch=100, epochs=20, verbose=1)

The problem with your generator is that it yields the wrong thing at each step: it doesn't yield an (x, y) tuple, it only yields x (the images in your case). On top of that, because it subclasses Sequence, Keras tries to interpret it as an object implementing that API, which your generator doesn't fully do. It also doesn't have to be a class; a plain Python generator is enough, as shown in the example from Keras itself in the docstring of fit_generator() below (a sketch of a Sequence subclass that does satisfy the API follows after the docstring):

fit_generator.__doc__

Fits the model on data yielded batch-by-batch by a Python generator.

    The generator is run in parallel to the model, for efficiency.
    For instance, this allows you to do real-time data augmentation
    on images on CPU in parallel to training your model on GPU.

    The use of `keras.utils.Sequence` guarantees the ordering
    and guarantees the single use of every input per epoch when
    using `use_multiprocessing=True`.

    Arguments:
        generator: A generator or an instance of `Sequence`
          (`keras.utils.Sequence`)
            object in order to avoid duplicate data
            when using multiprocessing.
            The output of the generator must be either
            - a tuple `(inputs, targets)`
            - a tuple `(inputs, targets, sample_weights)`.
            This tuple (a single output of the generator) makes a single batch.
            Therefore, all arrays in this tuple must have the same length (equal
            to the size of this batch). Different batches may have different
              sizes.
            For example, the last batch of the epoch is commonly smaller than
              the
            others, if the size of the dataset is not divisible by the batch
              size.
            The generator is expected to loop over its data
            indefinitely. An epoch finishes when `steps_per_epoch`
            batches have been seen by the model.
        steps_per_epoch: Total number of steps (batches of samples)
            to yield from `generator` before declaring one epoch
            finished and starting the next epoch. It should typically
            be equal to the number of samples of your dataset
            divided by the batch size.
            Optional for `Sequence`: if unspecified, will use
            the `len(generator)` as a number of steps.
        epochs: Integer, total number of iterations on the data.
        verbose: Verbosity mode, 0, 1, or 2.
        callbacks: List of callbacks to be called during training.
        validation_data: This can be either
            - a generator for the validation data
            - a tuple (inputs, targets)
            - a tuple (inputs, targets, sample_weights).
        validation_steps: Only relevant if `validation_data`
            is a generator. Total number of steps (batches of samples)
            to yield from `generator` before stopping.
            Optional for `Sequence`: if unspecified, will use
            the `len(validation_data)` as a number of steps.
        validation_freq: Only relevant if validation data is provided. Integer
            or `collections.Container` instance (e.g. list, tuple, etc.). If an
            integer, specifies how many training epochs to run before a new
            validation run is performed, e.g. `validation_freq=2` runs
            validation every 2 epochs. If a Container, specifies the epochs on
            which to run validation, e.g. `validation_freq=[1, 2, 10]` runs
            validation at the end of the 1st, 2nd, and 10th epochs.
        class_weight: Dictionary mapping class indices to a weight
            for the class.
        max_queue_size: Integer. Maximum size for the generator queue.
            If unspecified, `max_queue_size` will default to 10.
        workers: Integer. Maximum number of processes to spin up
            when using process-based threading.
            If unspecified, `workers` will default to 1. If 0, will
            execute the generator on the main thread.
        use_multiprocessing: Boolean.
            If `True`, use process-based threading.
            If unspecified, `use_multiprocessing` will default to `False`.
            Note that because this implementation relies on multiprocessing,
            you should not pass non-picklable arguments to the generator
            as they can't be passed easily to children processes.
        shuffle: Boolean. Whether to shuffle the order of the batches at
            the beginning of each epoch. Only used with instances
            of `Sequence` (`keras.utils.Sequence`).
            Has no effect when `steps_per_epoch` is not `None`.
        initial_epoch: Epoch at which to start training
            (useful for resuming a previous training run)

    Returns:
        A `History` object.

    Example:

    ```python
        def generate_arrays_from_file(path):
            while 1:
                f = open(path)
                for line in f:
                    # create numpy arrays of input data
                    # and labels, from each line in the file
                    x1, x2, y = process_line(line)
                    yield ({'input_1': x1, 'input_2': x2}, {'output': y})
                f.close()

        model.fit_generator(generate_arrays_from_file('/my_file.txt'),
                            steps_per_epoch=10000, epochs=10)
    ```
    Raises:
        ValueError: In case the generator yields data in an invalid format.
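For completeness, if you would rather keep the Sequence-based approach from the question: a Sequence subclass has to implement __len__ (returning a plain Python int) and __getitem__ (returning one (inputs, targets) batch). A minimal sketch of such a subclass, where the class name H5Sequence is illustrative and the hdf5 node names and the 3 classes are assumed to match the code above; treat it as a sketch rather than a drop-in replacement:

import numpy as np
import tables
from tensorflow.python.keras.utils.data_utils import Sequence
from tensorflow.keras.utils import to_categorical

class H5Sequence(Sequence):
    def __init__(self, path, batch_size=32, n_classes=3):
        self.path = path
        self.batch_size = batch_size
        self.n_classes = n_classes
        with tables.open_file(path, mode='r') as f:
            self.n_samples = f.root.train_img.shape[0]

    def __len__(self):
        # must return a plain Python int, otherwise len() raises the TypeError above
        return int(np.ceil(self.n_samples / self.batch_size))

    def __getitem__(self, index):
        start = index * self.batch_size
        stop = min(start + self.batch_size, self.n_samples)
        with tables.open_file(self.path, mode='r') as f:
            x = f.root.train_img[start:stop]
            y = f.root.train_labels[start:stop]
        # each item must be an (inputs, targets) tuple, i.e. one batch
        return x, to_categorical(y, self.n_classes)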

For more information, check the Keras GitHub page (fit_generator() to be exact), or again, feel free to ask.

Edit 2: You can also pass batch_size to h5data_generator(); it sets the size of the batches pulled from the dataset.
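For example (reusing the placeholder path from above), with batch_size=32 you would typically set steps_per_epoch to the number of samples divided by the batch size, as the docstring recommends:

import tables

batch_size = 32
with tables.open_file("path/to/your/dataset.hdf5", mode='r') as f:
    n_samples = f.root.train_img.shape[0]

my_gen = h5data_generator("path/to/your/dataset.hdf5", batch_size=batch_size)

yourModel.fit_generator(my_gen,
                        steps_per_epoch=n_samples // batch_size,
                        epochs=20,
                        verbose=1)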