How do I split a numpy array into batches?

Asked: 2015-02-13 19:16:30

Tags: python numpy

This sounds easy, but I can't figure out how to do it.

I have a 2D numpy array

X with shape (1783, 30)

and I want to split it into batches of 64. I wrote the code like this:

batches = abs(len(X) / BATCH_SIZE) + 1  # gives 28
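(For reference, this batch count is just a ceiling division: 1783 rows at 64 per batch needs 28 batches. A minimal sketch using only integer arithmetic, with the sizes from the question:)

```python
# Rows and batch size from the question.
n_rows = 1783
BATCH_SIZE = 64

# Ceiling division: number of batches needed to cover all rows.
batches = -(-n_rows // BATCH_SIZE)
print(batches)  # 28
```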

I am trying to predict results in batches, so I pad each batch with zeros and then overwrite it with the predicted results:

predicted = []

for b in xrange(batches): 

 data4D = np.zeros([BATCH_SIZE,1,96,96]) #create 4D array, first value is batch_size, last number of inputs
 data4DL = np.zeros([BATCH_SIZE,1,1,1]) # need to create 4D array as output, first value is  batch_size, last number of outputs
 data4D[0:BATCH_SIZE,:] = X[b*BATCH_SIZE:b*BATCH_SIZE+BATCH_SIZE,:] # fill value of input xtrain

 #predict
 #print [(k, v[0].data.shape) for k, v in net.params.items()]
 net.set_input_arrays(data4D.astype(np.float32),data4DL.astype(np.float32))
 pred = net.forward()
 print 'batch ', b
 predicted.append(pred['ip1'])

print 'Total in Batches ', data4D.shape, batches
print 'Final Output: ', predicted

But the last batch, number 28, has only 55 elements instead of 64 (1783 elements in total), and it gives:

ValueError: could not broadcast input array from shape (55,1,96,96) into shape (64,1,96,96)

Is there a workaround for this?

PS: the network requires an exact batch size of 64 for prediction.

4 answers:

Answer 0 (score: 4)

I don't quite understand your question either, especially what X looks like. If you want to split the array into equally sized subgroups, try the following:

def group_list(l, group_size):
    """
    :param l:           list
    :param group_size:  size of each group
    :return:            Yields successive group-sized lists from l.
    """
    for i in xrange(0, len(l), group_size):
        yield l[i:i+group_size]
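(A quick usage sketch of the generator above, written for Python 3 with range since xrange is Python 2 only. Applied to the question's (1783, 30) array with group_size 64, it yields 27 full groups plus one final group of 55 rows:)

```python
import numpy as np

def group_list(l, group_size):
    """Yield successive group_size-sized slices from l."""
    for i in range(0, len(l), group_size):
        yield l[i:i + group_size]

X = np.zeros((1783, 30))
groups = list(group_list(X, 64))
print(len(groups), groups[0].shape, groups[-1].shape)  # 28 (64, 30) (55, 30)
```

Note that the last group is simply shorter; this avoids the broadcast error, but a network that demands exactly 64 rows per batch would still need the final group padded.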

Answer 1 (score: 1)

I found a simple way to solve the batch problem: generate a dummy array first and then fill in the necessary data.

data = np.zeros((batches*BATCH_SIZE, 1, 96, 96))  # dummy 28*64, 1, 96, 96

This code loads exactly 64 items per batch. The last batch just ends with dummy zeros, which is fine :)

preds = []
for b in xrange(batches):
    data4D[0:BATCH_SIZE,:] = data[b*BATCH_SIZE:b*BATCH_SIZE+BATCH_SIZE,:]
    pred = net.predict(data4D)
    preds.append(pred)

output = np.vstack(preds)[:1783]  # keep only the first 1783 predictions

Finally, I slice the 1783 real elements out of the 28*64 total. This worked for me, but I'm sure there are many other ways.
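(The pad-and-slice pattern above can be sketched end to end. net is not available here, so a hypothetical dummy_predict stands in for net.predict; the shapes come from the question, and range replaces Python 2's xrange:)

```python
import numpy as np

BATCH_SIZE = 64
n_rows = 1783
batches = -(-n_rows // BATCH_SIZE)  # ceiling division -> 28

X = np.random.rand(n_rows, 30).astype(np.float32)

# Zero-padded buffer covering all batches (28 * 64 = 1792 rows).
data = np.zeros((batches * BATCH_SIZE, X.shape[1]), dtype=X.dtype)
data[:n_rows] = X

def dummy_predict(batch):
    # Hypothetical stand-in for net.predict: one output value per input row.
    return batch.sum(axis=1, keepdims=True)

preds = []
for b in range(batches):
    batch = data[b * BATCH_SIZE:(b + 1) * BATCH_SIZE]
    preds.append(dummy_predict(batch))

# Stack the 28 batch outputs and drop the rows that came from padding.
output = np.vstack(preds)[:n_rows]
print(output.shape)  # (1783, 1)
```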

Answer 2 (score: 0)

data4D[0:BATCH_SIZE,:] should be data4D[b*BATCH_SIZE:b*BATCH_SIZE+BATCH_SIZE, :]

Answer 3 (score: 0)

This can be achieved using numpy's as_strided.

import numpy as np
from numpy.lib.stride_tricks import as_strided

def batch_data(test, batch_size):
    m, n = test.shape
    S = test.itemsize
    if not batch_size:
        batch_size = m
    count_batches = m//batch_size
    # Batches which can be covered fully
    test_batches = as_strided(test, shape=(count_batches, batch_size, n), strides=(batch_size*n*S, n*S, S)).copy()
    covered = count_batches*batch_size
    if covered < m:
        # Zero-pad the leftover rows into one final batch
        rest = test[covered:,:]
        rm, rn = rest.shape
        mismatch = batch_size - rm
        last_batch = np.vstack((rest, np.zeros((mismatch, rn)))).reshape(1, -1, n)
        return np.vstack((test_batches, last_batch))
    return test_batches
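(A quick check of this approach on a small array, with the function restated compactly so the snippet runs on its own. For a (10, 3) array and batch size 4 it should return two full batches plus one zero-padded batch:)

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

def batch_data(test, batch_size):
    # Same logic as the answer above, condensed.
    m, n = test.shape
    S = test.itemsize
    batch_size = batch_size or m
    count_batches = m // batch_size
    # Full batches, viewed without copying row data and then materialized.
    test_batches = as_strided(
        test, shape=(count_batches, batch_size, n),
        strides=(batch_size * n * S, n * S, S)).copy()
    covered = count_batches * batch_size
    if covered < m:
        rest = test[covered:, :]
        pad = np.zeros((batch_size - rest.shape[0], n))
        last = np.vstack((rest, pad)).reshape(1, -1, n)
        return np.vstack((test_batches, last))
    return test_batches

arr = np.arange(30, dtype=np.float64).reshape(10, 3)
out = batch_data(arr, 4)
print(out.shape)  # (3, 4, 3): two full batches + one zero-padded batch
```

Note that as_strided assumes a C-contiguous input; for anything else the strides would have to come from the array itself.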