Question

TensorFlow has a nice built-in way to store data. For example, it is used to store the MNIST data in the tutorial:

>>> mnist
<tensorflow.examples.tutorials.mnist.input_data.read_data_sets.<locals>.DataSets object at 0x10f930630>

Suppose I have an input and an output NumPy array:

>>> x = np.random.normal(0,1, (100, 10))
>>> y = np.random.randint(0, 2, 100)

How can I convert them into a TensorFlow dataset?

I want to use functions such as next_batch.

Answer 1

The Dataset object is only part of the MNIST tutorial, not of the main TensorFlow library.

You can see where it is defined here:

GitHub Link

The constructor accepts an images argument and a labels argument, so presumably you can pass your own values there.
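If depending on the tutorial module is not desirable, the same idea can be sketched with NumPy alone. The class below is a hypothetical stand-in, not the tutorial's implementation: the name `ArrayDataSet`, the `seed` parameter, and the choice to drop the leftover rows at the end of each epoch are all assumptions made for illustration.

```python
import numpy as np


class ArrayDataSet:
    """Hypothetical container holding two row-aligned arrays (not the tutorial class)."""

    def __init__(self, images, labels, seed=0):
        assert images.shape[0] == labels.shape[0], "row counts must match"
        self._images = images
        self._labels = labels
        self._num_examples = images.shape[0]
        self._rng = np.random.default_rng(seed)
        self._index = 0
        self.epochs_completed = 0
        self._shuffle()

    def _shuffle(self):
        # Apply one permutation to both arrays so rows stay paired.
        perm = self._rng.permutation(self._num_examples)
        self._images = self._images[perm]
        self._labels = self._labels[perm]

    def next_batch(self, batch_size):
        """Return the next `batch_size` rows, reshuffling when an epoch ends."""
        assert batch_size <= self._num_examples, "batch larger than dataset"
        if self._index + batch_size > self._num_examples:
            # Not enough rows left: drop the remainder and start a new epoch.
            self.epochs_completed += 1
            self._shuffle()
            self._index = 0
        start = self._index
        self._index += batch_size
        return self._images[start:self._index], self._labels[start:self._index]


x = np.random.normal(0, 1, (100, 10))
y = np.random.randint(0, 2, 100)
data = ArrayDataSet(x, y)
batch_x, batch_y = data.next_batch(32)  # shapes (32, 10) and (32,)
```

Each call to `next_batch` advances through the shuffled arrays, so a training loop can simply call it once per step.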

Answer 2

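As an alternative, you can use the function tf.train.batch() to create batches of your data, which also removes the need for tf.placeholder. Refer to the documentation for more details.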

>>> images = tf.constant(x, dtype=tf.float32) # x is a np.array
>>> labels = tf.constant(y, dtype=tf.int32)   # y is a np.array
>>> batch_images, batch_labels = tf.train.batch([images, labels], batch_size=32, capacity=300, enqueue_many=True)

Answer 3

Recently, TensorFlow added a feature to its Dataset API for consuming NumPy arrays. See here for details.

Here is the snippet that I copied from there:

# Load the training data into two NumPy arrays, for example using `np.load()`.
with np.load("/var/data/training_data.npy") as data:
  features = data["features"]
  labels = data["labels"]

# Assume that each row of `features` corresponds to the same row as `labels`.
assert features.shape[0] == labels.shape[0]

features_placeholder = tf.placeholder(features.dtype, features.shape)
labels_placeholder = tf.placeholder(labels.dtype, labels.shape)

dataset = tf.data.Dataset.from_tensor_slices((features_placeholder, labels_placeholder))
# [Other transformations on `dataset`...]
dataset = ...
iterator = dataset.make_initializable_iterator()

sess.run(iterator.initializer, feed_dict={features_placeholder: features,
                                          labels_placeholder: labels})
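For comparison, here is a minimal sketch applied to the arrays from the question. It is not part of the snippet above: it assumes TensorFlow 2.x with eager execution (the default there), where a dataset can be iterated directly without placeholders or a session. The batch size of 32 and the shuffle buffer size are arbitrary choices for illustration.

```python
import numpy as np
import tensorflow as tf

# Arrays shaped like the ones in the question; the dtypes are cast explicitly.
x = np.random.normal(0, 1, (100, 10)).astype(np.float32)
y = np.random.randint(0, 2, 100).astype(np.int32)

# Slice both arrays along the first axis so each element is one (row, label) pair.
dataset = (
    tf.data.Dataset.from_tensor_slices((x, y))
    .shuffle(buffer_size=len(x))  # buffer covers the whole dataset
    .batch(32)
)

# Under eager execution the dataset is a plain Python iterable.
batch_shapes = []
for batch_x, batch_y in dataset:
    batch_shapes.append((batch_x.numpy().shape, batch_y.numpy().shape))
```

With 100 rows and a batch size of 32, the last batch is smaller than the others; passing `drop_remainder=True` to `batch` discards it instead.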

TensorFlow: create a dataset from NumPy arrays
