Question

I am using Talos and the Google Colab TPU to run hyperparameter tuning of a Keras model. Note that I am using TensorFlow 2.0.0 and Keras 2.2.4-tf.

import os
import tensorflow as tf
import talos as ta
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

def iris_model(x_train, y_train, x_val, y_val, params):

    # Specify a distributed strategy to use TPU
    resolver = tf.distribute.cluster_resolver.TPUClusterResolver(tpu='grpc://' + os.environ['COLAB_TPU_ADDR'])
    tf.config.experimental_connect_to_host(resolver.master())
    tf.tpu.experimental.initialize_tpu_system(resolver)
    strategy = tf.distribute.experimental.TPUStrategy(resolver)

    with strategy.scope():
      model = Sequential()
      model.add(Dense(32, input_dim=4, activation=params['activation']))
      model.add(Dense(3, activation='softmax'))
      model.compile(optimizer=params['optimizer'], loss=params['losses'])

    # Convert the train set to a Dataset to use TPU
    dataset = tf.data.Dataset.from_tensor_slices((x_train, y_train))
    dataset = dataset.cache().shuffle(1000, reshuffle_each_iteration=True).repeat().batch(params['batch_size'], drop_remainder=True)

    out = model.fit(dataset, batch_size=params['batch_size'], epochs=params['epochs'], validation_data=[x_val, y_val], verbose=0)

    return out, model

x, y = ta.templates.datasets.iris()

p = {'activation': ['relu', 'elu'],
       'optimizer': ['Nadam', 'Adam'],
       'losses': ['logcosh'],
       'batch_size': (20, 50, 5),
       'epochs': [10, 20]}

scan_object = ta.Scan(x, y, model=iris_model, params=p, fraction_limit=0.1, experiment_name='first_test')

After converting the train set to a Dataset with tf.data.Dataset, I get the following error when fitting the model with out = model.fit:

/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/engine/training_distributed.py in fit(self, model, x, y, batch_size, epochs, verbose, callbacks, validation_split, validation_data, shuffle, class_weight, sample_weight, initial_epoch, steps_per_epoch, validation_steps, validation_freq, **kwargs)
    609         validation_split=validation_split)
    610     batch_size = model._validate_or_infer_batch_size(
--> 611         batch_size, steps_per_epoch, x)
    612     dataset = model._distribution_standardize_user_data(
    613         x, y,

/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/engine/training.py in _validate_or_infer_batch_size(self, batch_size, steps, x)
   1815             'The `batch_size` argument must not be specified for the given '
   1816             'input type. Received input: {}, batch_size: {}'.format(
-> 1817                 x, batch_size))
   1818       return
   1819 

ValueError: The `batch_size` argument must not be specified for the given input type. Received input: <BatchDataset shapes: ((38, 4), ((38, 3)), types: (tf.float64, tf.float32)>, batch_size: 38

Answer 1

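As far as I can tell, the problem with your code is that the training and validation data are not in the same format. You are batching the training data but not the validation examples.

You can make sure they are in the same format by replacing the bottom half of your iris_model function with this: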

    # Goes inside iris_model, after the strategy.scope() block
    def fix_data(x, y):
        # Cast the features to float32 and build a batched, repeating dataset
        x = x.astype('float32')
        ds = tf.data.Dataset.from_tensor_slices((x, y))
        ds = ds.cache()
        ds = ds.shuffle(1000, reshuffle_each_iteration=True)
        ds = ds.repeat()
        ds = ds.batch(params['batch_size'], drop_remainder=True)
        return ds

    train = fix_data(x_train, y_train)
    val = fix_data(x_val, y_val)

    # Fit the Keras model on the dataset; batch_size is not passed because
    # the datasets are already batched
    out = model.fit(x=train, epochs=params['epochs'],
                    steps_per_epoch=2,
                    validation_data=val,
                    validation_steps=2)

    return out, model

At least this works for me, and your code runs without errors.
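
The hard-coded steps_per_epoch = 2 and validation_steps = 2 are needed because repeat() makes both datasets infinite, so Keras cannot tell where an epoch ends. As a rough sketch, the step counts could instead be derived from the number of rows and the batch size. The helper name steps_for and the row counts below are illustrative assumptions, not part of the original answer.

```python
import math


def steps_for(num_samples, batch_size, drop_remainder=True):
    """Number of batches that one pass over num_samples rows yields."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    if drop_remainder:
        # The incomplete final batch is discarded, as with
        # batch(..., drop_remainder=True).
        return num_samples // batch_size
    return math.ceil(num_samples / batch_size)


# Hypothetical sizes: 105 training rows, 45 validation rows, batch size 38.
train_steps = steps_for(105, 38)
val_steps = steps_for(45, 38)
print(train_steps, val_steps)
```

These values could then be passed to model.fit as steps_per_epoch and validation_steps. A result of 0 would mean the batch size is larger than the number of rows, so no full batch exists.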

Answer 2

From the GitHub code:

A ValueError is raised if x is a generator or Sequence instance and batch_size is specified, because we expect users to provide batched datasets.

Try using batch_size = None.
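
To make the rule concrete, here is a simplified pure-Python stand-in for that check. It is not TensorFlow's implementation: check_batch_size and its already_batched flag are hypothetical names used only for illustration, and the error message is adapted from the traceback in the question.

```python
def check_batch_size(batch_size, already_batched):
    """Illustrative stand-in for the validation described above."""
    if already_batched and batch_size is not None:
        # Batched inputs (datasets, generators, Sequences) already carry
        # their own batch size, so a second one is rejected.
        raise ValueError(
            "The `batch_size` argument must not be specified for the given "
            "input type. Received batch_size: {}".format(batch_size)
        )
    return batch_size


# Passing batch_size together with an already-batched input is rejected ...
try:
    check_batch_size(38, already_batched=True)
    rejected = False
except ValueError:
    rejected = True

# ... while batch_size=None is accepted.
accepted = check_batch_size(None, already_batched=True) is None
print(rejected, accepted)
```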

Answer 3

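I am not sure whether the following fits your needs, but it is worth a try. All I did was remove repeat() from the dataset and remove batch_size=params['batch_size'] from model.fit.

If that is more than you are willing to give up, please ignore this post.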

import os
import tensorflow as tf
import talos as ta
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

def iris_model(x_train, y_train, x_val, y_val, params):

    # Specify a distributed strategy to use TPU
    resolver = tf.distribute.cluster_resolver.TPUClusterResolver(tpu='grpc://' + os.environ['COLAB_TPU_ADDR'])
    tf.config.experimental_connect_to_host(resolver.master())
    tf.tpu.experimental.initialize_tpu_system(resolver)
    strategy = tf.distribute.experimental.TPUStrategy(resolver)

    with strategy.scope():
        model = Sequential()
        model.add(Dense(32, input_dim=4, activation=params['activation']))
        model.add(Dense(3, activation='softmax'))
        model.compile(optimizer=params['optimizer'], loss=params['losses'])

    # Convert the train set to a Dataset to use TPU
    dataset = tf.data.Dataset.from_tensor_slices((x_train, y_train))
    dataset = dataset.cache().shuffle(1000, reshuffle_each_iteration=True).batch(params['batch_size'], drop_remainder=True)

    out = model.fit(dataset, epochs=params['epochs'], validation_data=[x_val, y_val], verbose=0)

    return out, model

x, y = ta.templates.datasets.iris()

p = {'activation': ['relu', 'elu'],
       'optimizer': ['Nadam', 'Adam'],
       'losses': ['logcosh'],
       'batch_size': (20, 50, 5),
       'epochs': [10, 20]}

scan_object = ta.Scan(x, y, model=iris_model, params=p, fraction_limit=0.1, experiment_name='first_test')

Answer 4

It seems that when you do not pass batch_size to fit, you hit a second error in _distribution_standardize_user_data.

The code you are running for that function is here:

https://github.com/tensorflow/tensorflow/blob/r1.15/tensorflow/python/keras/engine/training.py#L2192

You did not post a traceback, but I bet it fails on line 2294, since that is the only place where batch_size is multiplied by something.

if shuffle:
    # We want a buffer size that is larger than the batch size provided by
    # the user and provides sufficient randomness. Note that larger
    # numbers introduce more memory usage based on the size of each
    # sample.
    ds = ds.shuffle(max(1024, batch_size * 8))

It looks like you can turn this off by setting shuffle=False.

fit(ds, shuffle=False,...)

Does that work?
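
The buffer-size expression max(1024, batch_size * 8) can be checked in plain Python. The sketch below uses a hypothetical helper name, shuffle_buffer_size, to show the values it produces and what happens if batch_size is None at that point. That outcome would be consistent with the guess that the multiplication is where it fails, but it remains an assumption, since no traceback for the second error was posted.

```python
def shuffle_buffer_size(batch_size):
    # Same expression as in the snippet from training.py.
    return max(1024, batch_size * 8)


# For small batch sizes the 1024 floor wins; larger ones scale by 8.
for bs in (20, 38, 256):
    print(bs, shuffle_buffer_size(bs))

# With batch_size=None the multiplication itself raises a TypeError.
try:
    shuffle_buffer_size(None)
    failed_on_none = False
except TypeError:
    failed_on_none = True
print(failed_on_none)
```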

Answer 5

You can try replacing these lines in your code:

    dataset = dataset.cache()
    dataset = dataset.shuffle(1000, reshuffle_each_iteration=True).repeat()
    dataset = dataset.batch(params['batch_size'], drop_remainder=True)

with these:

    dataset = dataset.repeat()
    dataset = dataset.batch(128, drop_remainder=True)
    dataset = dataset.prefetch(1)

Otherwise, the error is related to what you wrote in tf.data.Dataset.from_tensor_slices.

tf.data.Dataset: The `batch_size` argument must not be specified for the given input type
