How do I split a numpy array into batches?

Asked: 2015-02-13 19:16:30

Tags: python numpy

This sounds easy, but I can't figure out how to do it.

I have a 2D numpy array

X with shape (1783, 30)

and I want to split it into batches of 64. I wrote the code like this:

batches = abs(len(X) / BATCH_SIZE) + 1  # gives 28
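(For reference, this batch count is just a ceiling division: 1783 rows at 64 per batch needs 28 batches. A minimal sketch using only integer arithmetic, with the sizes from the question:)

```python
# Rows and batch size from the question.
n_rows = 1783
BATCH_SIZE = 64

# Ceiling division: number of batches needed to cover all rows.
batches = -(-n_rows // BATCH_SIZE)
print(batches)  # 28
```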

I am trying to predict results in batches, so I pad each batch with zeros and then overwrite it with the predicted results:

predicted = []

for b in xrange(batches): 

 data4D = np.zeros([BATCH_SIZE,1,96,96]) #create 4D array, first value is batch_size, last number of inputs
 data4DL = np.zeros([BATCH_SIZE,1,1,1]) # need to create 4D array as output, first value is  batch_size, last number of outputs
 data4D[0:BATCH_SIZE,:] = X[b*BATCH_SIZE:b*BATCH_SIZE+BATCH_SIZE,:] # fill value of input xtrain

 #predict
 #print [(k, v[0].data.shape) for k, v in net.params.items()]
 net.set_input_arrays(data4D.astype(np.float32),data4DL.astype(np.float32))
 pred = net.forward()
 print 'batch ', b
 predicted.append(pred['ip1'])

print 'Total in Batches ', data4D.shape, batches
print 'Final Output: ', predicted

But the last batch, number 28, has only 55 elements instead of 64 (1783 elements in total), and it gives:

ValueError: could not broadcast input array from shape (55,1,96,96) into shape (64,1,96,96)

Is there a workaround for this?

PS: the network requires an exact batch size of 64 for prediction.

4 answers:

Answer 0 (score: 4)

I don't quite understand your question either, especially what X looks like. If you want to split the array into equally sized subgroups, try the following:

def group_list(l, group_size):
    """
    :param l:           list
    :param group_size:  size of each group
    :return:            Yields successive group-sized lists from l.
    """
    for i in xrange(0, len(l), group_size):
        yield l[i:i+group_size]
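(A quick usage sketch of the generator above, written for Python 3 with range since xrange is Python 2 only. Applied to the question's (1783, 30) array with group_size 64, it yields 27 full groups plus one final group of 55 rows:)

```python
import numpy as np

def group_list(l, group_size):
    """Yield successive group_size-sized slices from l."""
    for i in range(0, len(l), group_size):
        yield l[i:i + group_size]

X = np.zeros((1783, 30))
groups = list(group_list(X, 64))
print(len(groups), groups[0].shape, groups[-1].shape)  # 28 (64, 30) (55, 30)
```

Note that the last group is simply shorter; this avoids the broadcast error, but a network that demands exactly 64 rows per batch would still need the final group padded.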

Answer 1 (score: 1)

I found a simple way to solve the batch problem: generate a dummy array first and then fill in the necessary data.

data = np.zeros((batches*BATCH_SIZE, 1, 96, 96))  # dummy 28*64, 1, 96, 96

This code loads exactly 64 items per batch. The last batch just ends with dummy zeros, which is fine :)

preds = []
for b in xrange(batches):
    data4D[0:BATCH_SIZE,:] = data[b*BATCH_SIZE:b*BATCH_SIZE+BATCH_SIZE,:]
    pred = net.predict(data4D)
    preds.append(pred)

output = np.vstack(preds)[:1783]  # keep only the first 1783 predictions

Finally, I slice the 1783 real elements out of the 28*64 total. This worked for me, but I'm sure there are many other ways.
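(The pad-and-slice pattern above can be sketched end to end. net is not available here, so a hypothetical dummy_predict stands in for net.predict; the shapes come from the question, and range replaces Python 2's xrange:)

```python
import numpy as np

BATCH_SIZE = 64
n_rows = 1783
batches = -(-n_rows // BATCH_SIZE)  # ceiling division -> 28

X = np.random.rand(n_rows, 30).astype(np.float32)

# Zero-padded buffer covering all batches (28 * 64 = 1792 rows).
data = np.zeros((batches * BATCH_SIZE, X.shape[1]), dtype=X.dtype)
data[:n_rows] = X

def dummy_predict(batch):
    # Hypothetical stand-in for net.predict: one output value per input row.
    return batch.sum(axis=1, keepdims=True)

preds = []
for b in range(batches):
    batch = data[b * BATCH_SIZE:(b + 1) * BATCH_SIZE]
    preds.append(dummy_predict(batch))

# Stack the 28 batch outputs and drop the rows that came from padding.
output = np.vstack(preds)[:n_rows]
print(output.shape)  # (1783, 1)
```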

Answer 2 (score: 0)

data4D[0:BATCH_SIZE,:] should be data4D[b*BATCH_SIZE:b*BATCH_SIZE+BATCH_SIZE, :]

Answer 3 (score: 0)

This can be achieved using numpy's as_strided.

import numpy as np
from numpy.lib.stride_tricks import as_strided

def batch_data(test, batch_size):
    m, n = test.shape
    S = test.itemsize
    if not batch_size:
        batch_size = m
    count_batches = m//batch_size
    # Batches which can be covered fully
    test_batches = as_strided(test, shape=(count_batches, batch_size, n), strides=(batch_size*n*S, n*S, S)).copy()
    covered = count_batches*batch_size
    if covered < m:
        # Zero-pad the leftover rows into one final batch
        rest = test[covered:,:]
        rm, rn = rest.shape
        mismatch = batch_size - rm
        last_batch = np.vstack((rest, np.zeros((mismatch, rn)))).reshape(1, -1, n)
        return np.vstack((test_batches, last_batch))
    return test_batches
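(A quick check of this approach on a small array, with the function restated compactly so the snippet runs on its own. For a (10, 3) array and batch size 4 it should return two full batches plus one zero-padded batch:)

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

def batch_data(test, batch_size):
    # Same logic as the answer above, condensed.
    m, n = test.shape
    S = test.itemsize
    batch_size = batch_size or m
    count_batches = m // batch_size
    # Full batches, viewed without copying row data and then materialized.
    test_batches = as_strided(
        test, shape=(count_batches, batch_size, n),
        strides=(batch_size * n * S, n * S, S)).copy()
    covered = count_batches * batch_size
    if covered < m:
        rest = test[covered:, :]
        pad = np.zeros((batch_size - rest.shape[0], n))
        last = np.vstack((rest, pad)).reshape(1, -1, n)
        return np.vstack((test_batches, last))
    return test_batches

arr = np.arange(30, dtype=np.float64).reshape(10, 3)
out = batch_data(arr, 4)
print(out.shape)  # (3, 4, 3): two full batches + one zero-padded batch
```

Note that as_strided assumes a C-contiguous input; for anything else the strides would have to come from the array itself.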