This question has probably been asked before, but I couldn't find it.
What is the simplest way to repeatedly fetch batches of data from a dataset? Is there a built-in TensorFlow function for this?
For example:
for i in range(num_trains):
    x_batch, y_batch = get_batch(x_train, y_train, batch_size)
    sess.run(train_step, feed_dict={x: x_batch, y: y_batch})
If there is no such built-in function, how would you implement it? I tried it myself, but I couldn't figure out how to get a new batch, different from the previous ones, each time the function is called.
Thanks!
Answer 0 (score: 1)
You can try:
# Feed batch data
def get_batch(inputX, inputY, batch_size):
    duration = len(inputX)
    for i in range(0, duration // batch_size):
        # Yield the next contiguous slice of batch_size examples
        idx = i * batch_size
        yield inputX[idx:idx + batch_size], inputY[idx:idx + batch_size]
You can also use TensorFlow's Dataset API:
import tensorflow as tf

dataset = tf.data.Dataset.from_tensor_slices((train_x, train_y))
dataset = dataset.batch(batch_size)
To fetch batches from the generator:
import numpy as np

X = np.arange(100)
Y = X
batch = get_batch(X, Y, 5)
batch_x, batch_y = next(batch)
print(batch_x, batch_y)
# [0 1 2 3 4] [0 1 2 3 4]
batch_x, batch_y = next(batch)
print(batch_x, batch_y)
# [5 6 7 8 9] [5 6 7 8 9]
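Note that once the generator has yielded all of its batches it raises StopIteration, so for repeated passes over the data you recreate it each epoch, as shown below.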
In general, to iterate over the dataset for multiple epochs, you can do something like:
for epoch in range(num_epochs):
    # Recreate the generator each epoch; it yields
    # size_of_dataset // batch_size batches before it is exhausted
    for x_batch, y_batch in get_batch(x_train, y_train, batch_size):
        sess.run(train_step, feed_dict={x: x_batch, y: y_batch})
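If you also want each epoch to visit the examples in a different order (the question asks for batches that differ from call to call), here is a minimal sketch of a shuffling variant; the permutation logic is my own addition, not part of the answer above, and it assumes inputX and inputY are NumPy arrays:

import numpy as np

def get_shuffled_batch(inputX, inputY, batch_size):
    # Draw a fresh random permutation so every epoch sees a new order
    perm = np.random.permutation(len(inputX))
    for i in range(len(inputX) // batch_size):
        idx = perm[i * batch_size:(i + 1) * batch_size]
        yield inputX[idx], inputY[idx]  # fancy indexing requires NumPy arrays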
With the Dataset API:
dataset = tf.data.Dataset.from_tensor_slices((X, Y))
dataset = dataset.batch(5)
iterator = dataset.make_initializable_iterator()
train_x, train_y = iterator.get_next()

with tf.Session() as sess:
    sess.run(iterator.initializer)
    for i in range(2):
        print(sess.run([train_x, train_y]))
# [array([0, 1, 2, 3, 4]), array([0, 1, 2, 3, 4])]
# [array([5, 6, 7, 8, 9]), array([5, 6, 7, 8, 9])]
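For multiple epochs with the Dataset API, one option is to chain shuffle and repeat and then consume batches until the pipeline is exhausted. This is a sketch using the same X and Y as above; num_epochs is a hypothetical value chosen for illustration:

num_epochs = 2  # hypothetical epoch count for illustration
dataset = tf.data.Dataset.from_tensor_slices((X, Y))
dataset = dataset.shuffle(buffer_size=len(X))  # reshuffles every epoch
dataset = dataset.batch(5)
dataset = dataset.repeat(num_epochs)
iterator = dataset.make_initializable_iterator()
train_x, train_y = iterator.get_next()

with tf.Session() as sess:
    sess.run(iterator.initializer)
    while True:
        try:
            batch_x, batch_y = sess.run([train_x, train_y])
        except tf.errors.OutOfRangeError:
            break  # raised after num_epochs passes over the data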