IndexError: index is out of bounds for axis 0

Time: 2016-10-07 14:03:39

Tags: python arrays list numpy machine-learning

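I have the arrays x_train and targets_train. I want to shuffle the training data, split it into smaller batches, and use those batches as training data. My original data has 1000 rows, and I try to use 250 rows at a time: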

import numpy as np
x_train = np.memmap('/home/usr/train', dtype='float32', mode='r', shape=(1000, 1, 784))
# print(x_train)
targets_train = np.memmap('/home/usr/train_label', dtype='int32', mode='r', shape=(1000, 1))
train_idxs = [i for i in range(x_train.shape[0])]
np.random.shuffle(train_idxs)


num_batches_train = 4
def next_batch(start, train, labels, batch_size=250):
    newstart = start + batch_size
    if newstart > train.shape[0]:
        newstart = 0
    idxs = train_idxs[start:start + batch_size]
    # print(idxs)
    return train[idxs, :], labels[idxs, :], newstart


# x_train_lab = x_train[:200]
# # x_train = np.array(targets_train)
# targets_train_lab = targets_train[:200]
for i in range(num_batches_train):
    x_train, targets_train, newstart = next_batch(i*batch_size, x_train, targets_train, batch_size=250)

The problem is that when I shuffle the training data and try to access the batches, I get this error:

    return train[idxs, :], labels[idxs, :], newstart
    IndexError: index 250 is out of bounds for axis 0 with size 250

Does anyone know what I am doing wrong?

2 Answers:

Answer 0 (score: 1)

(Edit: removed my first guess, which was about newstart.)

In this line:

x_train, targets_train, newstart = next_batch(i*batch_size, x_train, targets_train, batch_size=250)

you change the size of x_train on every iteration, yet you keep using the train_idxs array that was created for the full-size array.

Pulling random rows out of x_train in batches is fine, but the selection array has to stay consistent with the array it indexes.

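This question probably should have been closed for lacking a minimal, verifiable example. It is frustrating to have to guess and then build a small testable example in the hope of reproducing the problem.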

https://stackoverflow.com/help/mcve

If my current guess is wrong, a few intermediate print statements should be enough to pin down the problem.
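
As a rough sketch of what such an intermediate check might look like, the snippet below prints the array shape next to the batch indices and flags any index that does not fit. The names data, idxs_all, and check_batch are made up for illustration, and a reversed index array stands in for the shuffle so that the result is the same on every run.

```python
import numpy as np

# Hypothetical stand-ins for the question's arrays, at a smaller scale.
data = np.arange(20).reshape(20, 1)
# Reversed row numbers replace the random shuffle to keep the run deterministic.
idxs_all = np.arange(data.shape[0])[::-1]

def check_batch(start, train, batch_size=5):
    """Print the shape and indices, and report whether every index fits `train`."""
    idxs = idxs_all[start:start + batch_size]
    ok = bool(idxs.size == 0 or idxs.max() < train.shape[0])
    print(train.shape, idxs, "ok" if ok else "index out of range")
    return ok

full_ok = check_batch(5, data)        # full-size array: indices 10..14 fit
shrunk_ok = check_batch(5, data[:5])  # array already cut down to 5 rows: they do not
```

Seeing a shape such as (5, 1) printed next to indices above 4 points straight at an array that has been overwritten by an earlier batch.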

========================

Reducing the code to a simple case:

import numpy as np
x_train = np.arange(20).reshape(20,1)
train_idxs = np.arange(x_train.shape[0])
np.random.shuffle(train_idxs)

num_batches_train = 4
batch_size=5
def next_batch(start, train):
    idxs = train_idxs[start:start + batch_size]
    print(train.shape, idxs)
    return train[idxs, :]

for i in range(num_batches_train):
    x_train = next_batch(i*batch_size, x_train)
    print(x_train)

Running it produces:

1658:~/mypy$ python3 stack39919181.py 
(20, 1) [ 7 18  3  0  9]
[[ 7]
 [18]
 [ 3]
 [ 0]
 [ 9]]
(5, 1) [13  5  2 15  1]
Traceback (most recent call last):
  File "stack39919181.py", line 14, in <module>
    x_train = next_batch(i*batch_size, x_train)
  File "stack39919181.py", line 11, in next_batch
    return train[idxs, :]
IndexError: index 13 is out of bounds for axis 0 with size 5

I fed the (5,1) x_train back into next_batch, but tried to index it as though it were still the original array.

Changing the iteration to:

for i in range(num_batches_train):
    x_batch = next_batch(i*batch_size, x_train)
    print(x_batch)

lets it run to completion and produce 4 batches of 5 rows each.
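
Applied to both of the question's arrays, the same idea might look like the sketch below. It assumes small in-memory arrays in place of the memmapped files, and the names x_batch, targets_batch, batch_shapes, and seen_labels are introduced here only for illustration.

```python
import numpy as np

# In-memory stand-ins for the question's memmapped arrays (same shapes assumed).
x_train = np.zeros((1000, 1, 784), dtype="float32")
targets_train = np.arange(1000, dtype="int32").reshape(1000, 1)

batch_size = 250
num_batches_train = x_train.shape[0] // batch_size

# One permutation of row numbers, valid only for the full-size arrays.
train_idxs = np.random.permutation(x_train.shape[0])

def next_batch(start, train, labels, batch_size=250):
    """Select one batch of shuffled rows without modifying the inputs."""
    idxs = train_idxs[start:start + batch_size]
    return train[idxs], labels[idxs]

batch_shapes = []
seen_labels = []
for i in range(num_batches_train):
    # Distinct names, so x_train and targets_train keep their full size.
    x_batch, targets_batch = next_batch(i * batch_size, x_train, targets_train)
    batch_shapes.append((x_batch.shape, targets_batch.shape))
    seen_labels.append(targets_batch.ravel())
```

Because the full arrays are never reassigned, every index in train_idxs stays valid for all four batches, and each row ends up in exactly one batch.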

Answer 1 (score: 0)

The problem is in this line of the function definition:

idxs = train_idxs[start:start + batch_size]

Change it to:

idxs = train_idxs[start: newstart]

Then it should work as expected!
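
One edge case may be worth checking before relying on the newstart slice: the question's next_batch resets newstart to 0 once it passes the end of the array, and a slice that ends at 0 selects nothing. The sketch below uses hypothetical sizes (10 rows, batches of 4) and an unshuffled index array to keep the output readable; batch_indices is an illustrative helper, not part of the original code.

```python
import numpy as np

n_rows, batch_size = 10, 4       # hypothetical sizes; 10 is not a multiple of 4
train_idxs = np.arange(n_rows)   # unshuffled, to keep the output readable

def batch_indices(start):
    """Apply the wrap-around rule from next_batch and return both slice variants."""
    newstart = start + batch_size
    if newstart > n_rows:
        newstart = 0
    return train_idxs[start:newstart], train_idxs[start:start + batch_size]

first_newstart, first_batch_size = batch_indices(0)
last_newstart, last_batch_size = batch_indices(8)

print(first_newstart, first_batch_size)  # identical while no wrap-around occurs
print(last_newstart, last_batch_size)    # the newstart slice is empty after the wrap
```

With 1000 rows and batches of 250, as in the question, newstart never exceeds the array length, so the two slices select the same rows.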

Also, rename the variables in the for loop so that x_train and targets_train are not overwritten:

batch_size = 250
for i in range(num_batches_train):
    x_train_split, targets_train_split, newstart = next_batch(i*batch_size, 
                                                              x_train,
                                                              targets_train,
                                                              batch_size=250)
    print(x_train_split.shape, targets_train_split.shape, newstart)

Sample output:

(250, 1, 784) (250, 1) 250
(250, 1, 784) (250, 1) 500
(250, 1, 784) (250, 1) 750
(250, 1, 784) (250, 1) 1000