I am trying to implement ESPCN (https://arxiv.org/abs/1609.05158) in TensorFlow 2.0. When I run the code on Google Colab with the hardware accelerator set to TPU, I get this error:
2019-10-24 06:18:29.040953: E tensorflow/core/framework/dataset.cc:76] The Encode() method is not implemented for DatasetVariantWrapper objects.
Traceback (most recent call last):
  File "train.py", line 64, in <module>
    train_dataset = tpu_strategy.experimental_distribute_dataset(train_dataset)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/distribute/distribute_lib.py", line 674, in experimental_distribute_dataset
    return self._extended._experimental_distribute_dataset(dataset)  # pylint: disable=protected-access
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/distribute/tpu_strategy.py", line 256, in _experimental_distribute_dataset
    split_batch_by=self._num_replicas_in_sync)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/distribute/input_lib.py", line 81, in get_distributed_dataset
    input_context=input_context)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/distribute/input_lib.py", line 558, in __init__
    input_context=input_context)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/distribute/input_lib.py", line 520, in __init__
    cloned_dataset, len(input_workers.worker_devices), i)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/distribute/input_ops.py", line 49, in auto_shard_dataset
    return distribute._AutoShardDataset(dataset, num_shards, index)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/data/experimental/ops/distribute.py", line 56, in __init__
    **self._flat_structure)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/ops/gen_experimental_dataset_ops.py", line 171, in auto_shard_dataset
    _six.raise_from(_core._status_to_exception(e.code, message), None)
  File "<string>", line 3, in raise_from
tensorflow.python.framework.errors_impl.InternalError: Failed copying input tensor from /job:localhost/replica:0/task:0/device:CPU:0 to /job:worker/replica:0/task:0/device:CPU:0 in order to run AutoShardDataset: Unable to parse tensor proto
Additional GRPC error information:
{"created":"@1571891825.075392283","description":"Error received from peer","file":"external/grpc/src/core/lib/surface/call.cc","file_line":1039,"grpc_message":"Unable to parse tensor proto","grpc_status":3} [Op:AutoShardDataset]
2019-10-24 04:37:05.421592: E tensorflow/core/distributed_runtime/rpc/eager/grpc_eager_client.cc:72] Remote EagerContext with id 7626715715211053942 does not seem to exist.
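For context, this is roughly what the TPU setup and the failing call in train.py look like (a simplified sketch of my script: the resolver/initialization part is the standard Colab TPU pattern, BATCH_SIZE and UPSCALE_FACTOR are just placeholder values here, and get_training_set is the function shown below):

import os
import tensorflow as tf

BATCH_SIZE = 4       # placeholder value
UPSCALE_FACTOR = 2   # placeholder value

# Standard Colab TPU setup
resolver = tf.distribute.cluster_resolver.TPUClusterResolver(
    tpu='grpc://' + os.environ['COLAB_TPU_ADDR'])
tf.config.experimental_connect_to_cluster(resolver)
tf.tpu.experimental.initialize_tpu_system(resolver)
tpu_strategy = tf.distribute.experimental.TPUStrategy(resolver)

# Build the dataset on the host, batch it, then hand it to the strategy;
# the experimental_distribute_dataset call is the "line 64" that raises the error.
train_dataset = get_training_set(UPSCALE_FACTOR).batch(BATCH_SIZE)
train_dataset = tpu_strategy.experimental_distribute_dataset(train_dataset)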
The dataset is constructed by this function:
def get_training_set(upscale_factor):
    root_dir = download_bsd300()
    train_dir = join(root_dir, "train/*.jpg")
    names = tf.data.Dataset.list_files(train_dir)
    images = names.map(get_image_from_file)
    return images
and the get_image_from_file function:
def get_image_from_file(filename, crop_size=256):
    image = tf.io.read_file(filename)
    image = tf.image.decode_jpeg(image)
    image = tf.cast(image, tf.float32)
    image_height = tf.shape(image)[0]
    image_width = tf.shape(image)[1]
    offset_height = (image_height - crop_size) // 2
    offset_width = (image_width - crop_size) // 2
    original_image = tf.image.crop_to_bounding_box(image, offset_height, offset_width, crop_size, crop_size)
    downsampled_image = tf.image.resize(original_image, [crop_size // 2, crop_size // 2])
    # convert to 0~1 and change HWC to CHW
    # (Because the network accepts single channel.
    # The network will reshape NCHW input to (NC)*H*W*1.)
    original_image = tf.transpose(original_image / 255.0, [2, 0, 1])
    downsampled_image = tf.transpose(downsampled_image / 255.0, [2, 0, 1])
    return downsampled_image, original_image
Without the experimental_distribute_dataset call, the dataset works perfectly (a quick local check like the one below runs without errors), so I guess something goes wrong when the code is converted for the TPU. However, since I am new to TPUs, I am having a hard time finding the cause. Can anyone give me some help?
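This is roughly the local check I mean (same placeholder values as above, running eagerly on the Colab host without any distribution strategy):

# Iterating the batched dataset eagerly on the host works and prints the
# expected shapes (CHW layout, crop_size=256, downsampled to 128x128).
train_dataset = get_training_set(UPSCALE_FACTOR).batch(BATCH_SIZE)
for lr_batch, hr_batch in train_dataset.take(1):
    print(lr_batch.shape)  # (BATCH_SIZE, 3, 128, 128)
    print(hr_batch.shape)  # (BATCH_SIZE, 3, 256, 256)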
I created a GitHub repository and clone it on Colab to run the code.
Thanks in advance, and sorry for my poor English.