I am trying to concatenate two datasets in TensorFlow (I am using the tf.data.Dataset API). Here is my code:
with tf.name_scope('TFRecords_1'):
    # filenames_1 contains 34 file names
    for file_name in filenames_1:
        dataset_1 = tf.data.TFRecordDataset(file_name).map(_parse_function)
        datasets_1_all.append(dataset_1)
    # zip produces one tuple component per input dataset (34 here)
    dataset_1_all = tf.data.Dataset.zip(tuple(datasets_1_all))\
        .shuffle(10000, seed=mseed)\
        .batch(batch_size)
with tf.name_scope('TFRecords_2'):
    # filenames_2 contains 14 file names
    for file_name in filenames_2:
        dataset_2 = tf.data.TFRecordDataset(file_name).map(_parse_function)
        datasets_2_all.append(dataset_2)
    # zip produces one tuple component per input dataset (14 here)
    dataset_2_all = tf.data.Dataset.zip(tuple(datasets_2_all)).batch(batch_size)
with tf.name_scope('TFRecordsAll'):
    dataset = dataset_1_all.concatenate(dataset_2_all)
    iterator = dataset.make_initializable_iterator()
I get the following error:
File "C:\Users\iayou005\AppData\Local\Continuum\anaconda3\lib\site-packages\tensorflow\python\data\util\nest.py", line 188, in assert_same_structure
% (nest1, nest2))
ValueError: The two structures don't have the same number of elements. First structure: ((tf.string, tf.int64, tf.float32, tf.float32, tf.float32, tf.float32, tf.int64, tf.uint8), (tf.string, tf.int64, tf.float32, tf.float32, tf.float32, tf.float32, tf.int64, tf.uint8), (tf.string, tf.int64, tf.float32, tf.float32, tf.float32, tf.float32, tf.int64, tf.uint8), (tf.string, tf.int64, tf.float32, tf.float32, tf.float32, tf.float32, tf.int64, tf.uint8), (tf.string, tf.int64, tf.float32, tf.float32, tf.float32, tf.float32, tf.int64, tf.uint8), (tf.string, tf.int64, tf.float32, tf.float32, tf.float32, tf.float32, tf.int64, tf.uint8), (tf.string, tf.int64, tf.float32, tf.float32, tf.float32, tf.float32, tf.int64, tf.uint8), (tf.string, tf.int64, tf.float32, tf.float32, tf.float32, tf.float32, tf.int64, tf.uint8), (tf.string, tf.int64, tf.float32, tf.float32, tf.float32, tf.float32, tf.int64, tf.uint8), (tf.string, tf.int64, tf.float32, tf.float32, tf.float32, tf.float32, tf.int64, tf.uint8), (tf.string, tf.int64, tf.float32, tf.float32, tf.float32, tf.float32, tf.int64, tf.uint8), (tf.string, tf.int64, tf.float32, tf.float32, tf.float32, tf.float32, tf.int64, tf.uint8), (tf.string, tf.int64, tf.float32, tf.float32, tf.float32, tf.float32, tf.int64, tf.uint8), (tf.string, tf.int64, tf.float32, tf.float32, tf.float32, tf.float32, tf.int64, tf.uint8), (tf.string, tf.int64, tf.float32, tf.float32, tf.float32, tf.float32, tf.int64, tf.uint8), (tf.string, tf.int64, tf.float32, tf.float32, tf.float32, tf.float32, tf.int64, tf.uint8), (tf.string, tf.int64, tf.float32, tf.float32, tf.float32, tf.float32, tf.int64, tf.uint8), (tf.string, tf.int64, tf.float32, tf.float32, tf.float32, tf.float32, tf.int64, tf.uint8), (tf.string, tf.int64, tf.float32, tf.float32, tf.float32, tf.float32, tf.int64, tf.uint8), (tf.string, tf.int64, tf.float32, tf.float32, tf.float32, tf.float32, tf.int64, tf.uint8), (tf.string, tf.int64, tf.float32, tf.float32, tf.float32, tf.float32, tf.int64, tf.uint8), 
(tf.string, tf.int64, tf.float32, tf.float32, tf.float32, tf.float32, tf.int64, tf.uint8), (tf.string, tf.int64, tf.float32, tf.float32, tf.float32, tf.float32, tf.int64, tf.uint8), (tf.string, tf.int64, tf.float32, tf.float32, tf.float32, tf.float32, tf.int64, tf.uint8), (tf.string, tf.int64, tf.float32, tf.float32, tf.float32, tf.float32, tf.int64, tf.uint8), (tf.string, tf.int64, tf.float32, tf.float32, tf.float32, tf.float32, tf.int64, tf.uint8), (tf.string, tf.int64, tf.float32, tf.float32, tf.float32, tf.float32, tf.int64, tf.uint8), (tf.string, tf.int64, tf.float32, tf.float32, tf.float32, tf.float32, tf.int64, tf.uint8), (tf.string, tf.int64, tf.float32, tf.float32, tf.float32, tf.float32, tf.int64, tf.uint8), (tf.string, tf.int64, tf.float32, tf.float32, tf.float32, tf.float32, tf.int64, tf.uint8), (tf.string, tf.int64, tf.float32, tf.float32, tf.float32, tf.float32, tf.int64, tf.uint8), (tf.string, tf.int64, tf.float32, tf.float32, tf.float32, tf.float32, tf.int64, tf.uint8), (tf.string, tf.int64, tf.float32, tf.float32, tf.float32, tf.float32, tf.int64, tf.uint8), (tf.string, tf.int64, tf.float32, tf.float32, tf.float32, tf.float32, tf.int64, tf.uint8)), second structure: ((tf.string, tf.int64, tf.float32, tf.float32, tf.float32, tf.float32, tf.int64, tf.uint8), (tf.string, tf.int64, tf.float32, tf.float32, tf.float32, tf.float32, tf.int64, tf.uint8), (tf.string, tf.int64, tf.float32, tf.float32, tf.float32, tf.float32, tf.int64, tf.uint8), (tf.string, tf.int64, tf.float32, tf.float32, tf.float32, tf.float32, tf.int64, tf.uint8), (tf.string, tf.int64, tf.float32, tf.float32, tf.float32, tf.float32, tf.int64, tf.uint8), (tf.string, tf.int64, tf.float32, tf.float32, tf.float32, tf.float32, tf.int64, tf.uint8), (tf.string, tf.int64, tf.float32, tf.float32, tf.float32, tf.float32, tf.int64, tf.uint8), (tf.string, tf.int64, tf.float32, tf.float32, tf.float32, tf.float32, tf.int64, tf.uint8), (tf.string, tf.int64, tf.float32, tf.float32, tf.float32, 
tf.float32, tf.int64, tf.uint8), (tf.string, tf.int64, tf.float32, tf.float32, tf.float32, tf.float32, tf.int64, tf.uint8), (tf.string, tf.int64, tf.float32, tf.float32, tf.float32, tf.float32, tf.int64, tf.uint8), (tf.string, tf.int64, tf.float32, tf.float32, tf.float32, tf.float32, tf.int64, tf.uint8), (tf.string, tf.int64, tf.float32, tf.float32, tf.float32, tf.float32, tf.int64, tf.uint8), (tf.string, tf.int64, tf.float32, tf.float32, tf.float32, tf.float32, tf.int64, tf.uint8)).
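I believe the answer should follow from https://www.tensorflow.org/api_docs/python/tf/data/TFRecordDataset#concatenate: even though the datasets a and b there have a different number of elements, the code works. An example can be seen here: Is there a way to stack two tensorflow datasets?; even if we replace 70 with 90, that code still works. So why does my code above not work?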
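To make the distinction I am struggling with concrete, here is a plain-Python analogy (not TensorFlow code) of zipping versus chaining. The lists `files_a` and `files_b` are hypothetical stand-ins for the per-file datasets. Zipping yields tuples whose width equals the number of inputs, while chaining yields the records one after another, so only the record count changes:

```python
from itertools import chain

# Hypothetical stand-ins: each inner list plays the role of one per-file dataset.
files_a = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]  # 3 "files"
files_b = [[10, 11], [12, 13]]               # 2 "files"

# zip pairs the i-th record of every input, so each result is a tuple
# with one component per input: width 3 for files_a, width 2 for files_b.
zipped_a = list(zip(*files_a))
zipped_b = list(zip(*files_b))
widths = {len(rec) for rec in zipped_a} | {len(rec) for rec in zipped_b}

# Chaining appends the records end to end; every record keeps the same
# structure (a single value), only the number of records differs.
chained_a = list(chain.from_iterable(files_a))
chained_b = list(chain.from_iterable(files_b))
combined = chained_a + chained_b

print(widths)         # the two zipped groups have different tuple widths
print(len(combined))  # total number of records after chaining
```

In this analogy, a differing record count (as with the 70 versus 90 in the linked example) is harmless, whereas a differing tuple width corresponds to the structure mismatch reported in my error.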
Side note: after printing dataset_1_all and dataset_2_all, I get the following:
<BatchDataset shapes: ((?,), (?,), (?,), (?,), (?,), (?,), (?,), (?,), (?,), (?,), (?,), (?,), (?,), (?,), (?,), (?,), (?,), (?,), (?,), (?,), (?,), (?,), (?,), (?,), (?,), (?,), (?,), (?,), (?,), (?,), (?,), (?,), (?,), (?,)), types: (tf.string, tf.string, tf.string, tf.string, tf.string, tf.string, tf.string, tf.string, tf.string, tf.string, tf.string, tf.string, tf.string, tf.string, tf.string, tf.string, tf.string, tf.string, tf.string, tf.string, tf.string, tf.string, tf.string, tf.string, tf.string, tf.string, tf.string, tf.string, tf.string, tf.string, tf.string, tf.string, tf.string, tf.string)>
<BatchDataset shapes: ((?,), (?,), (?,), (?,), (?,), (?,), (?,), (?,), (?,), (?,), (?,), (?,), (?,), (?,)), types: (tf.string, tf.string, tf.string, tf.string, tf.string, tf.string, tf.string, tf.string, tf.string, tf.string, tf.string, tf.string, tf.string, tf.string)>
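So I now understand that the structures differ (34 components versus 14), but how can I solve this? Or rather, how can I merge the 2 datasets? Is there something like tf.data.Dataset.merge()???
Thank you very much for your help!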
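One possible direction, given as a minimal sketch rather than a confirmed fix: since `zip` creates one tuple component per input dataset, the per-file datasets could be concatenated end to end instead, so that both groups keep the same element structure and the final `concatenate` is accepted. The sketch assumes TensorFlow 2.x with eager execution, uses `tf.data.Dataset.range` as a stand-in for the parsed TFRecord datasets, and introduces a hypothetical helper `chain_datasets` along with illustrative values for `batch_size` and the seed:

```python
import functools

import tensorflow as tf


def chain_datasets(datasets):
    """Concatenate datasets end to end (hypothetical helper).

    All inputs must share the same element structure.
    """
    return functools.reduce(lambda a, b: a.concatenate(b), datasets)


# Stand-ins for the per-file datasets: 4 "files" and 2 "files", 3 records each.
group_1 = [tf.data.Dataset.range(i * 10, i * 10 + 3) for i in range(4)]
group_2 = [tf.data.Dataset.range(100 + i * 10, 100 + i * 10 + 3) for i in range(2)]

batch_size = 5  # illustrative value

# Each group becomes one flat dataset of single records, then is batched.
dataset_1_all = chain_datasets(group_1).shuffle(10000, seed=0).batch(batch_size)
dataset_2_all = chain_datasets(group_2).batch(batch_size)

# Both sides now have the same structure, so concatenate is allowed.
dataset = dataset_1_all.concatenate(dataset_2_all)

for batch in dataset:
    print(batch.numpy())
```

For real TFRecord files, `tf.data.TFRecordDataset` also accepts a list of file names, so `tf.data.TFRecordDataset(filenames_1).map(_parse_function)` may be a simpler way to obtain the same flat dataset without a helper.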