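I have two datasets, say dataA (100k images) and dataB (20k images). I want to draw batches from both datasets while keeping the ratio of images from each dataset fixed within every batch. For example, with a batch size of 32, I want 10 images from dataA and 22 images from dataB. I have written code to generate the batches, but I am not sure how to fetch them correctly.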
Data loader code:

    def generate_batch(X, y, batch_size):
        # Number of full batches; leftover samples at the end are dropped.
        num_batches = len(X) // batch_size
        # while True:
        for batchid in range(num_batches):
            start = batchid * batch_size
            end = (batchid + 1) * batch_size
            # Size the buffers from batch_size instead of a hard-coded 22.
            batchx = batch_size * [0]
            batchy = batch_size * [0]
            k = 0
            for i in range(start, end):
                batchx[k] = X[i]
                batchy[k] = y[i]
                k = k + 1
            yield batchx  # it's just a sample; I am yielding only one value to simplify it for now.
Fetching the data:

    # num_iterations is a placeholder for the number of passes over the data.
    for k in range(num_iterations):
        for i in generate_batch(Xadata, yadata, batch_size=22):
            for j in generate_batch(Xbdata, ybdata, batch_size=10):
                print(i)
                print(j)
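Because the loops are nested, the batch from dataA stays fixed until every batch from dataB has been consumed. What I want instead is for each iteration to take a fresh batch from both dataA and dataB. How should I modify this? Or is there a more efficient way to do it?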
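For illustration, here is a minimal sketch of one way the two generators could be advanced in lockstep, so that every step takes a new batch from each dataset. It assumes the data are list-like and sliceable; `batches` and `mixed_batches` are hypothetical helper names, not part of the code above, and the per-dataset batch sizes are passed in as parameters. Note that `zip` stops as soon as the source with fewer batches runs out.

```python
def batches(X, y, batch_size):
    """Yield (batch_x, batch_y) slices of batch_size items; leftovers are dropped."""
    for start in range(0, len(X) - batch_size + 1, batch_size):
        yield X[start:start + batch_size], y[start:start + batch_size]


def mixed_batches(Xa, ya, Xb, yb, size_a, size_b):
    """Combine one batch from each dataset per step (hypothetical helper).

    zip() advances both generators together, so each step sees new data
    from both sources. Iteration ends when the source with fewer batches
    is exhausted.
    """
    gen_a = batches(Xa, ya, size_a)
    gen_b = batches(Xb, yb, size_b)
    for (ax, ay), (bx, by) in zip(gen_a, gen_b):
        # Assumption: plain list concatenation; for NumPy arrays one would
        # concatenate along the batch axis instead.
        yield list(ax) + list(bx), list(ay) + list(by)


if __name__ == "__main__":
    # Toy stand-ins for the two image datasets.
    Xa, ya = list(range(100)), ["a"] * 100
    Xb, yb = list(range(1000, 1020)), ["b"] * 20
    for batch_x, batch_y in mixed_batches(Xa, ya, Xb, yb, size_a=5, size_b=2):
        print(len(batch_x), batch_y.count("a"), batch_y.count("b"))
```

If the smaller dataset should be reused rather than ending the loop, one option is to restart its generator whenever it is exhausted, ideally after reshuffling, so that the pairing between the two sources changes from pass to pass.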