I am trying to implement ESPCN (https://arxiv.org/abs/1609.05158) in TensorFlow 2.0. When I run the code on Google Colab with the hardware accelerator set to TPU, I get this error:
2019-10-24 06:18:29.040953: E tensorflow/core/framework/dataset.cc:76] The Encode() method is not implemented for DatasetVariantWrapper objects.
Traceback (most recent call last):
  File "train.py", line 64, in <module>
    train_dataset = tpu_strategy.experimental_distribute_dataset(train_dataset)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/distribute/distribute_lib.py", line 674, in experimental_distribute_dataset
    return self._extended._experimental_distribute_dataset(dataset)  # pylint: disable=protected-access
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/distribute/tpu_strategy.py", line 256, in _experimental_distribute_dataset
    split_batch_by=self._num_replicas_in_sync)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/distribute/input_lib.py", line 81, in get_distributed_dataset
    input_context=input_context)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/distribute/input_lib.py", line 558, in __init__
    input_context=input_context)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/distribute/input_lib.py", line 520, in __init__
    cloned_dataset, len(input_workers.worker_devices), i)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/distribute/input_ops.py", line 49, in auto_shard_dataset
    return distribute._AutoShardDataset(dataset, num_shards, index)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/data/experimental/ops/distribute.py", line 56, in __init__
    **self._flat_structure)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/ops/gen_experimental_dataset_ops.py", line 171, in auto_shard_dataset
    _six.raise_from(_core._status_to_exception(e.code, message), None)
  File "<string>", line 3, in raise_from
tensorflow.python.framework.errors_impl.InternalError: Failed copying input tensor from /job:localhost/replica:0/task:0/device:CPU:0 to /job:worker/replica:0/task:0/device:CPU:0 in order to run AutoShardDataset: Unable to parse tensor proto
Additional GRPC error information:
{"created":"@1571891825.075392283","description":"Error received from peer","file":"external/grpc/src/core/lib/surface/call.cc","file_line":1039,"grpc_message":"Unable to parse tensor proto","grpc_status":3} [Op:AutoShardDataset]
2019-10-24 04:37:05.421592: E tensorflow/core/distributed_runtime/rpc/eager/grpc_eager_client.cc:72] Remote EagerContext with id 7626715715211053942 does not seem to exist.
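For context, this is roughly what the TPU setup and the failing call in train.py look like (a simplified sketch of my script: the resolver/initialization part is the standard Colab TPU pattern, BATCH_SIZE and UPSCALE_FACTOR are just placeholder values here, and get_training_set is the function shown below):

import os
import tensorflow as tf

BATCH_SIZE = 4       # placeholder value
UPSCALE_FACTOR = 2   # placeholder value

# Standard Colab TPU setup
resolver = tf.distribute.cluster_resolver.TPUClusterResolver(
    tpu='grpc://' + os.environ['COLAB_TPU_ADDR'])
tf.config.experimental_connect_to_cluster(resolver)
tf.tpu.experimental.initialize_tpu_system(resolver)
tpu_strategy = tf.distribute.experimental.TPUStrategy(resolver)

# Build the dataset on the host, batch it, then hand it to the strategy;
# the experimental_distribute_dataset call is the "line 64" that raises the error.
train_dataset = get_training_set(UPSCALE_FACTOR).batch(BATCH_SIZE)
train_dataset = tpu_strategy.experimental_distribute_dataset(train_dataset)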
The dataset is constructed by this function:
def get_training_set(upscale_factor):
    root_dir = download_bsd300()
    train_dir = join(root_dir, "train/*.jpg")
    names = tf.data.Dataset.list_files(train_dir)
    images = names.map(get_image_from_file)
    return images
and the get_image_from_file function:
def get_image_from_file(filename, crop_size=256):
    image = tf.io.read_file(filename)
    image = tf.image.decode_jpeg(image)
    image = tf.cast(image, tf.float32)
    image_height = tf.shape(image)[0]
    image_width = tf.shape(image)[1]
    offset_height = (image_height - crop_size) // 2
    offset_width = (image_width - crop_size) // 2
    original_image = tf.image.crop_to_bounding_box(image, offset_height, offset_width, crop_size, crop_size)
    downsampled_image = tf.image.resize(original_image, [crop_size // 2, crop_size // 2])
    # convert to 0~1 and change HWC to CHW
    # (Because the network accepts single channel.
    # The network will reshape NCHW input to (NC)*H*W*1.)
    original_image = tf.transpose(original_image / 255.0, [2, 0, 1])
    downsampled_image = tf.transpose(downsampled_image / 255.0, [2, 0, 1])
    return downsampled_image, original_image
Without the experimental_distribute_dataset call, the dataset works perfectly (a quick local check like the one below runs without errors), so I guess something goes wrong when the code is converted for the TPU. However, since I am new to TPUs, I am having a hard time finding the cause. Can anyone give me some help?
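This is roughly the local check I mean (same placeholder values as above, running eagerly on the Colab host without any distribution strategy):

# Iterating the batched dataset eagerly on the host works and prints the
# expected shapes (CHW layout, crop_size=256, downsampled to 128x128).
train_dataset = get_training_set(UPSCALE_FACTOR).batch(BATCH_SIZE)
for lr_batch, hr_batch in train_dataset.take(1):
    print(lr_batch.shape)  # (BATCH_SIZE, 3, 128, 128)
    print(hr_batch.shape)  # (BATCH_SIZE, 3, 256, 256)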
I created a GitHub repository and clone it on Colab to run the code.
Thanks in advance, and sorry for my poor English.