Question

I am trying to concatenate two datasets in TensorFlow (I am using the tf.data.Dataset API). Here is my code:

import tensorflow as tf

# filenames_1, filenames_2, _parse_function, mseed and batch_size are defined elsewhere
datasets_1_all = []
datasets_2_all = []

with tf.name_scope('TFRecords_1'):
    # Size of filenames_1 is 34
    for file_name in filenames_1:
        dataset_1 = tf.data.TFRecordDataset(file_name).map(_parse_function)
        datasets_1_all.append(dataset_1)

    # zip() yields one element per step, made of one record from each of the 34 files
    dataset_1_all = tf.data.Dataset.zip(tuple(datasets_1_all))\
        .shuffle(10000, seed=mseed)\
        .batch(batch_size)

with tf.name_scope('TFRecords_2'):
    # Size of filenames_2 is 14
    for file_name in filenames_2:
        dataset_2 = tf.data.TFRecordDataset(file_name).map(_parse_function)
        datasets_2_all.append(dataset_2)

    # Same pattern, but each element has only 14 components
    dataset_2_all = tf.data.Dataset.zip(tuple(datasets_2_all)).batch(batch_size)

with tf.name_scope('TFRecordsAll'):
    dataset = dataset_1_all.concatenate(dataset_2_all)
    iterator = dataset.make_initializable_iterator()

I get the following error:

File "C:\Users\iayou005\AppData\Local\Continuum\anaconda3\lib\site-packages\tensorflow\python\data\util\nest.py", line 188, in assert_same_structure
    % (nest1, nest2))
ValueError: The two structures don't have the same number of elements. First structure: ((tf.string, tf.int64, tf.float32, tf.float32, tf.float32, tf.float32, tf.int64, tf.uint8), (tf.string, tf.int64, tf.float32, tf.float32, tf.float32, tf.float32, tf.int64, tf.uint8), (tf.string, tf.int64, tf.float32, tf.float32, tf.float32, tf.float32, tf.int64, tf.uint8), (tf.string, tf.int64, tf.float32, tf.float32, tf.float32, tf.float32, tf.int64, tf.uint8), (tf.string, tf.int64, tf.float32, tf.float32, tf.float32, tf.float32, tf.int64, tf.uint8), (tf.string, tf.int64, tf.float32, tf.float32, tf.float32, tf.float32, tf.int64, tf.uint8), (tf.string, tf.int64, tf.float32, tf.float32, tf.float32, tf.float32, tf.int64, tf.uint8), (tf.string, tf.int64, tf.float32, tf.float32, tf.float32, tf.float32, tf.int64, tf.uint8), (tf.string, tf.int64, tf.float32, tf.float32, tf.float32, tf.float32, tf.int64, tf.uint8), (tf.string, tf.int64, tf.float32, tf.float32, tf.float32, tf.float32, tf.int64, tf.uint8), (tf.string, tf.int64, tf.float32, tf.float32, tf.float32, tf.float32, tf.int64, tf.uint8), (tf.string, tf.int64, tf.float32, tf.float32, tf.float32, tf.float32, tf.int64, tf.uint8), (tf.string, tf.int64, tf.float32, tf.float32, tf.float32, tf.float32, tf.int64, tf.uint8), (tf.string, tf.int64, tf.float32, tf.float32, tf.float32, tf.float32, tf.int64, tf.uint8), (tf.string, tf.int64, tf.float32, tf.float32, tf.float32, tf.float32, tf.int64, tf.uint8), (tf.string, tf.int64, tf.float32, tf.float32, tf.float32, tf.float32, tf.int64, tf.uint8), (tf.string, tf.int64, tf.float32, tf.float32, tf.float32, tf.float32, tf.int64, tf.uint8), (tf.string, tf.int64, tf.float32, tf.float32, tf.float32, tf.float32, tf.int64, tf.uint8), (tf.string, tf.int64, tf.float32, tf.float32, tf.float32, tf.float32, tf.int64, tf.uint8), (tf.string, tf.int64, tf.float32, tf.float32, tf.float32, tf.float32, tf.int64, tf.uint8), (tf.string, tf.int64, tf.float32, tf.float32, tf.float32, tf.float32, tf.int64, tf.uint8), 
(tf.string, tf.int64, tf.float32, tf.float32, tf.float32, tf.float32, tf.int64, tf.uint8), (tf.string, tf.int64, tf.float32, tf.float32, tf.float32, tf.float32, tf.int64, tf.uint8), (tf.string, tf.int64, tf.float32, tf.float32, tf.float32, tf.float32, tf.int64, tf.uint8), (tf.string, tf.int64, tf.float32, tf.float32, tf.float32, tf.float32, tf.int64, tf.uint8), (tf.string, tf.int64, tf.float32, tf.float32, tf.float32, tf.float32, tf.int64, tf.uint8), (tf.string, tf.int64, tf.float32, tf.float32, tf.float32, tf.float32, tf.int64, tf.uint8), (tf.string, tf.int64, tf.float32, tf.float32, tf.float32, tf.float32, tf.int64, tf.uint8), (tf.string, tf.int64, tf.float32, tf.float32, tf.float32, tf.float32, tf.int64, tf.uint8), (tf.string, tf.int64, tf.float32, tf.float32, tf.float32, tf.float32, tf.int64, tf.uint8), (tf.string, tf.int64, tf.float32, tf.float32, tf.float32, tf.float32, tf.int64, tf.uint8), (tf.string, tf.int64, tf.float32, tf.float32, tf.float32, tf.float32, tf.int64, tf.uint8), (tf.string, tf.int64, tf.float32, tf.float32, tf.float32, tf.float32, tf.int64, tf.uint8), (tf.string, tf.int64, tf.float32, tf.float32, tf.float32, tf.float32, tf.int64, tf.uint8)), second structure: ((tf.string, tf.int64, tf.float32, tf.float32, tf.float32, tf.float32, tf.int64, tf.uint8), (tf.string, tf.int64, tf.float32, tf.float32, tf.float32, tf.float32, tf.int64, tf.uint8), (tf.string, tf.int64, tf.float32, tf.float32, tf.float32, tf.float32, tf.int64, tf.uint8), (tf.string, tf.int64, tf.float32, tf.float32, tf.float32, tf.float32, tf.int64, tf.uint8), (tf.string, tf.int64, tf.float32, tf.float32, tf.float32, tf.float32, tf.int64, tf.uint8), (tf.string, tf.int64, tf.float32, tf.float32, tf.float32, tf.float32, tf.int64, tf.uint8), (tf.string, tf.int64, tf.float32, tf.float32, tf.float32, tf.float32, tf.int64, tf.uint8), (tf.string, tf.int64, tf.float32, tf.float32, tf.float32, tf.float32, tf.int64, tf.uint8), (tf.string, tf.int64, tf.float32, tf.float32, tf.float32, 
tf.float32, tf.int64, tf.uint8), (tf.string, tf.int64, tf.float32, tf.float32, tf.float32, tf.float32, tf.int64, tf.uint8), (tf.string, tf.int64, tf.float32, tf.float32, tf.float32, tf.float32, tf.int64, tf.uint8), (tf.string, tf.int64, tf.float32, tf.float32, tf.float32, tf.float32, tf.int64, tf.uint8), (tf.string, tf.int64, tf.float32, tf.float32, tf.float32, tf.float32, tf.int64, tf.uint8), (tf.string, tf.int64, tf.float32, tf.float32, tf.float32, tf.float32, tf.int64, tf.uint8)).

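I think the answer should take its cue from https://www.tensorflow.org/api_docs/python/tf/data/TFRecordDataset#concatenate: even though a and b there have different numbers of elements, the code works. Another example can be seen here: Is there a way to stack two tensorflow datasets? Even if we replace 70 with 90 in that example, the code still works. So why doesn't my code above work?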
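For reference, in the linked concatenate() example "different number of elements" means the two datasets yield a different number of items; each item still has the same type and structure. A minimal sketch of that case, assuming TensorFlow 2.x with eager execution (the question itself uses the TF 1.x graph API):

```python
import tensorflow as tf  # assumes TensorFlow 2.x with eager execution


def concat_different_lengths():
    # Mirrors the concatenate() documentation example: a has 3 elements, b has 4.
    a = tf.data.Dataset.range(1, 4)  # 1, 2, 3
    b = tf.data.Dataset.range(4, 8)  # 4, 5, 6, 7
    # Both datasets yield scalar tf.int64 elements, so their element structures
    # match and concatenate() accepts them even though their lengths differ.
    c = a.concatenate(b)
    return [int(x) for x in c.as_numpy_iterator()]


if __name__ == "__main__":
    print(concat_different_lengths())  # [1, 2, 3, 4, 5, 6, 7]
```

Only the number of items differs here, which concatenate() allows.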

Side note: after printing dataset_1_all and dataset_2_all, I get the following:

<BatchDataset shapes: ((?,), (?,), (?,), (?,), (?,), (?,), (?,), (?,), (?,), (?,), (?,), (?,), (?,), (?,), (?,), (?,), (?,), (?,), (?,), (?,), (?,), (?,), (?,), (?,), (?,), (?,), (?,), (?,), (?,), (?,), (?,), (?,), (?,), (?,)), types: (tf.string, tf.string, tf.string, tf.string, tf.string, tf.string, tf.string, tf.string, tf.string, tf.string, tf.string, tf.string, tf.string, tf.string, tf.string, tf.string, tf.string, tf.string, tf.string, tf.string, tf.string, tf.string, tf.string, tf.string, tf.string, tf.string, tf.string, tf.string, tf.string, tf.string, tf.string, tf.string, tf.string, tf.string)>

<BatchDataset shapes: ((?,), (?,), (?,), (?,), (?,), (?,), (?,), (?,), (?,), (?,), (?,), (?,), (?,), (?,)), types: (tf.string, tf.string, tf.string, tf.string, tf.string, tf.string, tf.string, tf.string, tf.string, tf.string, tf.string, tf.string, tf.string, tf.string)>
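The printouts above show what zip() produces: zipping N per-file datasets gives elements with N components, so one side has 34-component elements and the other 14-component elements. A small sketch reproducing that mismatch, assuming TensorFlow 2.x eager execution and using tiny hypothetical Dataset.range() datasets in place of the TFRecord files:

```python
import tensorflow as tf  # assumes TensorFlow 2.x with eager execution


def zipped(num_components):
    # Hypothetical stand-in for the per-file datasets: one tiny dataset per "file".
    parts = tuple(tf.data.Dataset.range(3) for _ in range(num_components))
    # zip() yields one element per step, made of num_components tensors.
    return tf.data.Dataset.zip(parts)


def try_concat(n1, n2):
    ds1, ds2 = zipped(n1), zipped(n2)
    # element_spec is a tuple with one TensorSpec per zipped component.
    print(len(ds1.element_spec), len(ds2.element_spec))
    try:
        ds1.concatenate(ds2)
        return True
    except (TypeError, ValueError) as err:
        # The question's TF 1.x traceback shows a ValueError from
        # nest.assert_same_structure; TF 2.x releases may raise a TypeError
        # instead, so both are caught here.
        print(type(err).__name__)
        return False


if __name__ == "__main__":
    print(try_concat(3, 3))  # same structure on both sides
    print(try_concat(3, 2))  # different structure, like 34 vs 14
```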

So I now understand that the shapes differ, but how do I fix this? Or, put differently, how can I merge the two datasets, with something like tf.data.Dataset.merge()?
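One possible approach, assuming every file is a shard holding the same kind of record (so that zipping one record from each file is not actually needed): pass the whole list of filenames to a single TFRecordDataset, which reads the files one after another, so both sides end up with the same one-record-per-element structure. If instead each file holds a different feature and the zip is intentional, this sketch does not apply. It assumes TensorFlow 2.x eager execution; write_shards() and build_pipeline() are hypothetical helpers, and the question's _parse_function is omitted.

```python
import os
import tempfile

import tensorflow as tf  # assumes TensorFlow 2.x with eager execution


def write_shards(directory, prefix, num_files, records_per_file):
    """Write small hypothetical TFRecord files and return their paths."""
    paths = []
    for i in range(num_files):
        path = os.path.join(directory, f"{prefix}_{i}.tfrecord")
        with tf.io.TFRecordWriter(path) as writer:
            for j in range(records_per_file):
                writer.write(f"{prefix}-{i}-{j}".encode())
        paths.append(path)
    return paths


def build_pipeline(filenames_1, filenames_2, batch_size, seed=0):
    # A list of filenames is read sequentially, so every element is a single
    # serialized record no matter how many files there are. In a real pipeline
    # the same .map(_parse_function) would be applied on both sides here.
    dataset_1 = (tf.data.TFRecordDataset(filenames_1)
                 .shuffle(10000, seed=seed)
                 .batch(batch_size))
    dataset_2 = tf.data.TFRecordDataset(filenames_2).batch(batch_size)
    # Both sides now have the same element structure, so concatenate() works.
    return dataset_1.concatenate(dataset_2)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        files_1 = write_shards(tmp, "a", num_files=3, records_per_file=2)
        files_2 = write_shards(tmp, "b", num_files=2, records_per_file=2)
        ds = build_pipeline(files_1, files_2, batch_size=4)
        records = [r for batch in ds.as_numpy_iterator() for r in batch]
        print(len(records))  # 10
```

Only the first group is shuffled, matching the question's pipeline; the second group is appended in file order after it.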

Thank you very much for your help!

How do I concatenate two datasets with different numbers of elements in TensorFlow?

0 answers