How to concatenate two datasets with different numbers of elements in TensorFlow?

Time: 2019-01-22 20:55:37

Tags: tensorflow concatenation tensorflow-datasets

I am trying to concatenate two datasets in TensorFlow (I am using the tf.data.Dataset API). Here is my code:

import tensorflow as tf  # TF 1.x API (make_initializable_iterator)

# filenames_1, filenames_2, _parse_function, batch_size and mseed are defined elsewhere
datasets_1_all = []
datasets_2_all = []

with tf.name_scope('TFRecords_1'):
    # Size of filenames_1 is 34
    for file_name in filenames_1:
        dataset_1 = tf.data.TFRecordDataset(file_name).map(_parse_function)
        datasets_1_all.append(dataset_1)

    # zip() yields one 34-component tuple per step (one component per file)
    dataset_1_all = tf.data.Dataset.zip(tuple(datasets_1_all))\
        .shuffle(10000, seed=mseed)\
        .batch(batch_size)

with tf.name_scope('TFRecords_2'):
    # Size of filenames_2 is 14
    for file_name in filenames_2:
        dataset_2 = tf.data.TFRecordDataset(file_name).map(_parse_function)
        datasets_2_all.append(dataset_2)

    # zip() yields one 14-component tuple per step
    dataset_2_all = tf.data.Dataset.zip(tuple(datasets_2_all)).batch(batch_size)

with tf.name_scope('TFRecordsAll'):
    # This line raises the error below
    dataset = dataset_1_all.concatenate(dataset_2_all)
    iterator = dataset.make_initializable_iterator()
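To see why the two pipelines end up with different element structures, here is a minimal sketch that zips small in-memory datasets instead of TFRecord files. It assumes TensorFlow 2.x with eager execution, and `files_a` / `files_b` are hypothetical stand-ins for the per-file datasets (3 "files" vs. 2 "files" rather than 34 vs. 14).

```python
import tensorflow as tf  # assumes TensorFlow 2.x with eager execution

# Hypothetical stand-ins for the per-file datasets: 3 "files" vs. 2 "files"
files_a = [tf.data.Dataset.range(5) for _ in range(3)]
files_b = [tf.data.Dataset.range(5) for _ in range(2)]

zipped_a = tf.data.Dataset.zip(tuple(files_a))
zipped_b = tf.data.Dataset.zip(tuple(files_b))

# Each zipped element is a tuple with one component per input dataset,
# so the arity of the element depends on how many files were zipped.
arity_a = len(zipped_a.element_spec)
arity_b = len(zipped_b.element_spec)
print(arity_a, arity_b)  # 3 2
```

By the same logic, zipping 34 files and 14 files gives elements with 34 and 14 components respectively, which matches the two structures listed in the error.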

I get the following error:

File "C:\Users\iayou005\AppData\Local\Continuum\anaconda3\lib\site-packages\tensorflow\python\data\util\nest.py", line 188, in assert_same_structure
    % (nest1, nest2))
ValueError: The two structures don't have the same number of elements. First structure: ((tf.string, tf.int64, tf.float32, tf.float32, tf.float32, tf.float32, tf.int64, tf.uint8), (tf.string, tf.int64, tf.float32, tf.float32, tf.float32, tf.float32, tf.int64, tf.uint8), (tf.string, tf.int64, tf.float32, tf.float32, tf.float32, tf.float32, tf.int64, tf.uint8), (tf.string, tf.int64, tf.float32, tf.float32, tf.float32, tf.float32, tf.int64, tf.uint8), (tf.string, tf.int64, tf.float32, tf.float32, tf.float32, tf.float32, tf.int64, tf.uint8), (tf.string, tf.int64, tf.float32, tf.float32, tf.float32, tf.float32, tf.int64, tf.uint8), (tf.string, tf.int64, tf.float32, tf.float32, tf.float32, tf.float32, tf.int64, tf.uint8), (tf.string, tf.int64, tf.float32, tf.float32, tf.float32, tf.float32, tf.int64, tf.uint8), (tf.string, tf.int64, tf.float32, tf.float32, tf.float32, tf.float32, tf.int64, tf.uint8), (tf.string, tf.int64, tf.float32, tf.float32, tf.float32, tf.float32, tf.int64, tf.uint8), (tf.string, tf.int64, tf.float32, tf.float32, tf.float32, tf.float32, tf.int64, tf.uint8), (tf.string, tf.int64, tf.float32, tf.float32, tf.float32, tf.float32, tf.int64, tf.uint8), (tf.string, tf.int64, tf.float32, tf.float32, tf.float32, tf.float32, tf.int64, tf.uint8), (tf.string, tf.int64, tf.float32, tf.float32, tf.float32, tf.float32, tf.int64, tf.uint8), (tf.string, tf.int64, tf.float32, tf.float32, tf.float32, tf.float32, tf.int64, tf.uint8), (tf.string, tf.int64, tf.float32, tf.float32, tf.float32, tf.float32, tf.int64, tf.uint8), (tf.string, tf.int64, tf.float32, tf.float32, tf.float32, tf.float32, tf.int64, tf.uint8), (tf.string, tf.int64, tf.float32, tf.float32, tf.float32, tf.float32, tf.int64, tf.uint8), (tf.string, tf.int64, tf.float32, tf.float32, tf.float32, tf.float32, tf.int64, tf.uint8), (tf.string, tf.int64, tf.float32, tf.float32, tf.float32, tf.float32, tf.int64, tf.uint8), (tf.string, tf.int64, tf.float32, tf.float32, tf.float32, tf.float32, tf.int64, tf.uint8), 
(tf.string, tf.int64, tf.float32, tf.float32, tf.float32, tf.float32, tf.int64, tf.uint8), (tf.string, tf.int64, tf.float32, tf.float32, tf.float32, tf.float32, tf.int64, tf.uint8), (tf.string, tf.int64, tf.float32, tf.float32, tf.float32, tf.float32, tf.int64, tf.uint8), (tf.string, tf.int64, tf.float32, tf.float32, tf.float32, tf.float32, tf.int64, tf.uint8), (tf.string, tf.int64, tf.float32, tf.float32, tf.float32, tf.float32, tf.int64, tf.uint8), (tf.string, tf.int64, tf.float32, tf.float32, tf.float32, tf.float32, tf.int64, tf.uint8), (tf.string, tf.int64, tf.float32, tf.float32, tf.float32, tf.float32, tf.int64, tf.uint8), (tf.string, tf.int64, tf.float32, tf.float32, tf.float32, tf.float32, tf.int64, tf.uint8), (tf.string, tf.int64, tf.float32, tf.float32, tf.float32, tf.float32, tf.int64, tf.uint8), (tf.string, tf.int64, tf.float32, tf.float32, tf.float32, tf.float32, tf.int64, tf.uint8), (tf.string, tf.int64, tf.float32, tf.float32, tf.float32, tf.float32, tf.int64, tf.uint8), (tf.string, tf.int64, tf.float32, tf.float32, tf.float32, tf.float32, tf.int64, tf.uint8), (tf.string, tf.int64, tf.float32, tf.float32, tf.float32, tf.float32, tf.int64, tf.uint8)), second structure: ((tf.string, tf.int64, tf.float32, tf.float32, tf.float32, tf.float32, tf.int64, tf.uint8), (tf.string, tf.int64, tf.float32, tf.float32, tf.float32, tf.float32, tf.int64, tf.uint8), (tf.string, tf.int64, tf.float32, tf.float32, tf.float32, tf.float32, tf.int64, tf.uint8), (tf.string, tf.int64, tf.float32, tf.float32, tf.float32, tf.float32, tf.int64, tf.uint8), (tf.string, tf.int64, tf.float32, tf.float32, tf.float32, tf.float32, tf.int64, tf.uint8), (tf.string, tf.int64, tf.float32, tf.float32, tf.float32, tf.float32, tf.int64, tf.uint8), (tf.string, tf.int64, tf.float32, tf.float32, tf.float32, tf.float32, tf.int64, tf.uint8), (tf.string, tf.int64, tf.float32, tf.float32, tf.float32, tf.float32, tf.int64, tf.uint8), (tf.string, tf.int64, tf.float32, tf.float32, tf.float32, 
tf.float32, tf.int64, tf.uint8), (tf.string, tf.int64, tf.float32, tf.float32, tf.float32, tf.float32, tf.int64, tf.uint8), (tf.string, tf.int64, tf.float32, tf.float32, tf.float32, tf.float32, tf.int64, tf.uint8), (tf.string, tf.int64, tf.float32, tf.float32, tf.float32, tf.float32, tf.int64, tf.uint8), (tf.string, tf.int64, tf.float32, tf.float32, tf.float32, tf.float32, tf.int64, tf.uint8), (tf.string, tf.int64, tf.float32, tf.float32, tf.float32, tf.float32, tf.int64, tf.uint8)).
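The same failure can be reproduced without any TFRecord files. The sketch below assumes TensorFlow 2.x; `zip_files` is a hypothetical helper that mimics "one dataset per file, zipped together". The exact exception type differs between TensorFlow versions (the traceback above shows a `ValueError`), so the sketch catches both `TypeError` and `ValueError`.

```python
import tensorflow as tf  # assumes TensorFlow 2.x

def zip_files(num_files, rows_per_file):
    """Hypothetical helper: zip `num_files` toy datasets, one per imaginary file."""
    return tf.data.Dataset.zip(
        tuple(tf.data.Dataset.range(rows_per_file) for _ in range(num_files)))

try:
    # 3-component elements vs. 2-component elements: the structures differ,
    # even though both sides have the same number of rows (4).
    zip_files(3, 4).concatenate(zip_files(2, 4))
    error_name = None
except (TypeError, ValueError) as err:
    # The exception type depends on the TensorFlow version
    error_name = type(err).__name__

print(error_name)
```

Note that both toy datasets have the same number of rows here, which suggests that the row count is not the problem; the mismatch is in the number of components per element.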

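I think the answer should be inspired by https://www.tensorflow.org/api_docs/python/tf/data/TFRecordDataset#concatenate: even though a and b there have different numbers of elements, the code works. Another example can be seen here: Is there a way to stack two tensorflow datasets? Even if 70 is replaced with 90, that code still works. So why doesn't my code above work?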
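In the linked documentation, "number of elements" refers to how many rows each dataset yields, not to how many components each row has. A small sketch in the spirit of that documentation example (assuming TensorFlow 2.1 or later for `as_numpy_iterator`) shows that datasets of different lengths concatenate fine as long as every element has the same structure, here a single `int64` scalar:

```python
import tensorflow as tf  # assumes TensorFlow >= 2.1 (for as_numpy_iterator)

a = tf.data.Dataset.range(1, 4)  # 3 elements, each a scalar int64
b = tf.data.Dataset.range(4, 8)  # 4 elements, each a scalar int64

# Different lengths are fine; the per-element structure (one int64 scalar) matches.
ds = a.concatenate(b)
values = [int(v) for v in ds.as_numpy_iterator()]
print(values)  # [1, 2, 3, 4, 5, 6, 7]
```

In the code above, by contrast, each element of `dataset_1_all` is a 34-tuple and each element of `dataset_2_all` is a 14-tuple, so the per-element structures do not match.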

Side note: printing dataset_1_all and dataset_2_all gives the following:

<BatchDataset shapes: ((?,), (?,), (?,), (?,), (?,), (?,), (?,), (?,), (?,), (?,), (?,), (?,), (?,), (?,), (?,), (?,), (?,), (?,), (?,), (?,), (?,), (?,), (?,), (?,), (?,), (?,), (?,), (?,), (?,), (?,), (?,), (?,), (?,), (?,)), types: (tf.string, tf.string, tf.string, tf.string, tf.string, tf.string, tf.string, tf.string, tf.string, tf.string, tf.string, tf.string, tf.string, tf.string, tf.string, tf.string, tf.string, tf.string, tf.string, tf.string, tf.string, tf.string, tf.string, tf.string, tf.string, tf.string, tf.string, tf.string, tf.string, tf.string, tf.string, tf.string, tf.string, tf.string)>

<BatchDataset shapes: ((?,), (?,), (?,), (?,), (?,), (?,), (?,), (?,), (?,), (?,), (?,), (?,), (?,), (?,)), types: (tf.string, tf.string, tf.string, tf.string, tf.string, tf.string, tf.string, tf.string, tf.string, tf.string, tf.string, tf.string, tf.string, tf.string)>
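The printed reprs already show 34 vs. 14 components. One way to catch this before calling `concatenate` is to compare the datasets' `element_spec` with `tf.nest.assert_same_structure`, the same kind of nesting check that appears in the traceback. This is a sketch assuming TensorFlow 2.x; `structures_match` is a hypothetical helper, and it only compares nesting (number of components), not dtypes or shapes, so a dtype mismatch could still make `concatenate` fail.

```python
import tensorflow as tf  # assumes TensorFlow 2.x

def structures_match(ds1, ds2):
    """Hypothetical helper: True if both datasets yield elements with the same nesting."""
    try:
        tf.nest.assert_same_structure(ds1.element_spec, ds2.element_spec)
        return True
    except (ValueError, TypeError):
        return False

# Toy datasets whose elements are 3-tuples and 2-tuples respectively
three = tf.data.Dataset.zip(tuple(tf.data.Dataset.range(2) for _ in range(3)))
two = tf.data.Dataset.zip(tuple(tf.data.Dataset.range(2) for _ in range(2)))

match_same = structures_match(three, three)
match_diff = structures_match(three, two)
print(match_same, match_diff)  # True False
```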

So I now understand that the shapes differ, but how do I fix this? Alternatively, how can I merge the two datasets? Is there a method like tf.data.Dataset.merge()?
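One possible workaround, assuming the per-file zipping is not actually needed (that is, each training example can be a single record rather than a tuple holding one record from every file), is to build each side as a flat stream of records. `tf.data.TFRecordDataset` accepts a list of filenames, so both sides then yield one serialized record per element regardless of how many files they contain, and `concatenate` accepts them. This changes the semantics of the pipeline. The sketch below assumes TensorFlow 2.x and writes a few toy TFRecord files with a hypothetical helper `write_toy_files` so that it runs on its own.

```python
import os
import tempfile

import tensorflow as tf  # assumes TensorFlow 2.x

def write_toy_files(prefix, num_files, records_per_file):
    """Hypothetical helper: write small TFRecord files containing byte strings."""
    tmp_dir = tempfile.mkdtemp()  # left on disk so the files can be read afterwards
    paths = []
    for i in range(num_files):
        path = os.path.join(tmp_dir, f"{prefix}_{i}.tfrecord")
        with tf.io.TFRecordWriter(path) as writer:
            for j in range(records_per_file):
                writer.write(f"{prefix}-{i}-{j}".encode())
        paths.append(path)
    return paths

filenames_1 = write_toy_files("a", num_files=3, records_per_file=2)
filenames_2 = write_toy_files("b", num_files=2, records_per_file=2)

# One flat dataset per group: every element is a single serialized record (tf.string),
# no matter how many files the group has, so both sides share the same structure.
dataset_1_all = tf.data.TFRecordDataset(filenames_1)
dataset_2_all = tf.data.TFRecordDataset(filenames_2)

dataset = dataset_1_all.concatenate(dataset_2_all)
records = list(dataset.as_numpy_iterator())
print(len(records))             # 10
print(records[0], records[-1])  # b'a-0-0' b'b-1-1'
```

In the original pipeline, `.map(_parse_function)`, `.shuffle(...)` and `.batch(batch_size)` would then be applied after `concatenate`. If the zipped 34-tuples and 14-tuples really are required, they cannot be concatenated as they are; a different combination, such as zipping the two datasets side by side, would be needed instead.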

Thanks a lot for your help!
