TensorFlow 2.0 dataset: tensorflow.python.framework.errors_impl.InternalError: Unable to parse tensor proto

Posted: 2019-10-24 06:17:54

Tags: python tensorflow tensorflow-datasets tensorflow2.0 tpu

I am working on a TensorFlow 2.0 implementation of ESPCN (https://arxiv.org/abs/1609.05158), and I run into this error when I run the code on Google Colab with the hardware accelerator set to TPU:

2019-10-24 06:18:29.040953: E tensorflow/core/framework/dataset.cc:76] The Encode() method is not implemented for DatasetVariantWrapper objects.
Traceback (most recent call last):
  File "train.py", line 64, in <module>
    train_dataset = tpu_strategy.experimental_distribute_dataset(train_dataset)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/distribute/distribute_lib.py", line 674, in experimental_distribute_dataset
    return self._extended._experimental_distribute_dataset(dataset)  # pylint: disable=protected-access
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/distribute/tpu_strategy.py", line 256, in _experimental_distribute_dataset
    split_batch_by=self._num_replicas_in_sync)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/distribute/input_lib.py", line 81, in get_distributed_dataset
    input_context=input_context)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/distribute/input_lib.py", line 558, in __init__
    input_context=input_context)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/distribute/input_lib.py", line 520, in __init__
    cloned_dataset, len(input_workers.worker_devices), i)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/distribute/input_ops.py", line 49, in auto_shard_dataset
    return distribute._AutoShardDataset(dataset, num_shards, index)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/data/experimental/ops/distribute.py", line 56, in __init__
    **self._flat_structure)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/ops/gen_experimental_dataset_ops.py", line 171, in auto_shard_dataset
    _six.raise_from(_core._status_to_exception(e.code, message), None)
  File "<string>", line 3, in raise_from
tensorflow.python.framework.errors_impl.InternalError: Failed copying input tensor from /job:localhost/replica:0/task:0/device:CPU:0 to /job:worker/replica:0/task:0/device:CPU:0 in order to run AutoShardDataset: Unable to parse tensor proto
Additional GRPC error information:
{"created":"@1571891825.075392283","description":"Error received from peer","file":"external/grpc/src/core/lib/surface/call.cc","file_line":1039,"grpc_message":"Unable to parse tensor proto","grpc_status":3} [Op:AutoShardDataset]
2019-10-24 04:37:05.421592: E tensorflow/core/distributed_runtime/rpc/eager/grpc_eager_client.cc:72] Remote EagerContext with id 7626715715211053942 does not seem to exist.
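
For context, train.py sets up a TPUStrategy before calling experimental_distribute_dataset. Here is a minimal sketch of that setup, assuming the standard TF 2.0 Colab TPU pattern; only the last line is copied verbatim from the traceback (train.py line 64), and the batch size is just a placeholder:

import os
import tensorflow as tf

# Connect to the Colab TPU and build a distribution strategy.
resolver = tf.distribute.cluster_resolver.TPUClusterResolver(
    tpu='grpc://' + os.environ['COLAB_TPU_ADDR'])
tf.config.experimental_connect_to_host(resolver.master())
tf.tpu.experimental.initialize_tpu_system(resolver)
tpu_strategy = tf.distribute.experimental.TPUStrategy(resolver)

# Build the input pipeline and hand it to the strategy.
# The last line is the one that raises the InternalError above.
train_dataset = get_training_set(upscale_factor=2).batch(16)
train_dataset = tpu_strategy.experimental_distribute_dataset(train_dataset)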

The dataset is constructed with this function:

import tensorflow as tf
from os.path import join

def get_training_set(upscale_factor):
    # download_bsd300() is a helper defined elsewhere in my repository; it
    # downloads the BSDS300 dataset and returns its root directory.
    root_dir = download_bsd300()
    train_dir = join(root_dir, "train/*.jpg")
    names = tf.data.Dataset.list_files(train_dir)
    # Map each file name to a (downsampled_image, original_image) pair.
    images = names.map(get_image_from_file)
    return images
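
Just as a sanity check of what the pipeline produces (the upscale_factor value here is a placeholder), I can inspect the element structure like this:

ds = get_training_set(upscale_factor=2)
# Each element is a (downsampled_image, original_image) pair of float32
# tensors in CHW layout, as produced by get_image_from_file below.
print(ds.element_spec)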

The get_image_from_file function:

def get_image_from_file(filename, crop_size=256):
    image = tf.io.read_file(filename)
    image = tf.image.decode_jpeg(image)
    image = tf.cast(image, tf.float32)
    # Center-crop a crop_size x crop_size patch from the image.
    image_height = tf.shape(image)[0]
    image_width = tf.shape(image)[1]
    offset_height = (image_height - crop_size) // 2
    offset_width = (image_width - crop_size) // 2
    original_image = tf.image.crop_to_bounding_box(image, offset_height, offset_width, crop_size, crop_size)
    # Downsample by a factor of 2 to create the low-resolution input.
    downsampled_image = tf.image.resize(original_image, [crop_size // 2, crop_size // 2])
    # Scale to [0, 1] and change HWC to CHW.
    # (The network accepts a single channel; it reshapes NCHW input to (N*C, H, W, 1).)
    original_image = tf.transpose(original_image / 255.0, [2, 0, 1])
    downsampled_image = tf.transpose(downsampled_image / 255.0, [2, 0, 1])
    return downsampled_image, original_image

The dataset runs perfectly without the experimental_distribute_dataset call, so I guess something goes wrong during that conversion. However, since I am new to TPUs, I am having a hard time finding the cause... Can anyone give me some help?
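
To be concrete, the non-distributed path I am referring to looks roughly like this (batch size and upscale_factor are placeholders), and it iterates without any error:

train_dataset = get_training_set(upscale_factor=2).batch(16)
for downsampled_batch, original_batch in train_dataset.take(1):
    # With crop_size=256 this prints (16, 3, 128, 128) and (16, 3, 256, 256)
    # for the low-resolution and original crops respectively.
    print(downsampled_batch.shape, original_batch.shape)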

I created a GitHub repository and clone it onto Colab to run the code.

Thanks in advance, and please excuse my poor English.

0 Answers:

There are no answers yet.