Does NumPy give unexpected results?

Time: 2017-08-10 10:33:30

Tags: python numpy

I have a dataset X, and X.shape gives (10000, 9). I want to select a subset of X using the following code:

import numpy as np
X = np.asarray(np.random.normal(size = (10000,9)))
train_fraction = 0.7 # fraction of X that will be marked as train data
train_size = int(X.shape[0]*train_fraction) # fraction converted to number
test_size = X.shape[0] - train_size # remaining rows will be marked as test data
train_ind = np.asarray([False]*X.shape[0])     
train_ind[np.random.randint(low = X.shape[0], size = (train_size,))] = True # mark True at 70% of the places

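The problem is that np.sum(train_ind) is not the expected 7000. Instead, it gives a seemingly random value such as 5033.

I initially thought np.random.randint(low = X.shape[0], size = (train_size,)) might be the culprit. But when I check np.random.randint(low = X.shape[0], size = (train_size,)).shape, I get (7000,).

Where am I going wrong?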

1 answer:

Answer 0: (score: 1)

Use np.random.choice(np.arange(0,X.shape[0]), size = train_size, replace = False) instead.

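The problem is that np.random.randint is not injective: it samples with replacement, so the same number (say 1) can be drawn more than once. In that case index 1 is set to True twice, while some other index is never set to True, and np.sum(train_ind) only counts the distinct indices that were drawn.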
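A small sketch can make the duplicate-index effect visible. It assumes the same sizes as in the question (10000 rows, 7000 draws); the seed and the names n_rows, idx, n_unique and expected_unique are introduced here purely for illustration. The number of True entries matches the number of distinct indices drawn, and the textbook expectation for that count, n * (1 - (1 - 1/n)^k), comes out near 5034, in line with the 5033 reported in the question.

```python
import numpy as np

np.random.seed(0)  # arbitrary seed, only so the run is repeatable

n_rows, train_size = 10000, 7000

# randint samples WITH replacement, so the same index can appear several times
idx = np.random.randint(low=0, high=n_rows, size=train_size)
n_unique = np.unique(idx).size

train_ind = np.zeros(n_rows, dtype=bool)
train_ind[idx] = True  # setting the same position twice still counts once

# Expected number of distinct values when drawing k times from n with replacement
expected_unique = n_rows * (1 - (1 - 1 / n_rows) ** train_size)

print(idx.shape)             # 7000 draws were made
print(n_unique)              # but fewer distinct indices
print(int(train_ind.sum()))  # same as n_unique
print(round(expected_unique))
```

So the shape check in the question passes because 7000 indices really are drawn; the shortfall comes only from the repeats among them.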

The np.random.choice function guarantees that each number appears at most once (provided replace = False is set).
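Putting the suggestion together with the question's setup, a corrected split might look like the sketch below. The seed and the variable names X_train and X_test are assumptions added for illustration; passing the integer X.shape[0] to np.random.choice is equivalent to passing np.arange(0, X.shape[0]).

```python
import numpy as np

np.random.seed(0)  # arbitrary seed, only so the run is repeatable

X = np.random.normal(size=(10000, 9))
train_fraction = 0.7
train_size = int(X.shape[0] * train_fraction)

# Draw train_size distinct row indices (no repeats because replace=False)
train_idx = np.random.choice(X.shape[0], size=train_size, replace=False)

train_ind = np.zeros(X.shape[0], dtype=bool)
train_ind[train_idx] = True

X_train = X[train_ind]   # rows marked True
X_test = X[~train_ind]   # remaining rows

print(int(train_ind.sum()), X_train.shape, X_test.shape)
```

With distinct indices, every assignment marks a new row, so the mask contains exactly train_size True values.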