Splitting a 4D array into training and test sets

Time: 2018-08-14 05:14:39

Tags: python numpy

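I have a 4D array containing the data for 5500 images (each 350x350 pixels, with every entry holding the RGB values of a single pixel), and a 1D array that assigns a value to each image: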

>>> np.array(images).shape
(5500, 350, 350, 3)
>>> np.array(labels).shape
(5500,)

I want to split both the images and labels arrays into an 80% training and 20% test split. I have done this for the labels array without any problems:

>>> indices = np.random.permutation(5500)
>>> training_idx, test_idx = indices[:4400], indices[4400:]
>>> training_idx
array([1209, 1958, 3376, ..., 1875,   55, 5408])
>>> labels_train, labels_test = labels[training_idx], labels[test_idx]
>>> labels_train
array([1.7     , 3.833333, 2.333333, ..., 2.016667, 3.15    , 4.316667])

However, trying to build images_train (the images 4D array reduced to 80% of its entries) is giving me trouble. For example, I tried the following:

images_train, images_test = images[training_idx][:,][:,][:,], images[test_idx][:,][:,][:,]
images_train, images_test = images[training_idx,:,:,:,], images[test_idx,:,:,:,]
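One likely cause, assuming images is still a Python list (the shapes above were obtained by wrapping it in np.array(...)): a list cannot be indexed with an array of indices, whereas a NumPy array can, and on an array the trailing `:` slices are redundant. A minimal sketch with a small synthetic stand-in for the image data:

```python
import numpy as np

# Small synthetic stand-in for the image data: 10 images of 4x4 RGB pixels,
# where image i is filled with the value i. Stored as a Python list.
images = [np.zeros((4, 4, 3), dtype=np.uint8) + i for i in range(10)]
training_idx = np.array([3, 0, 7])

list_indexing_failed = False
try:
    images[training_idx]  # a Python list cannot be indexed with an index array
except TypeError as exc:
    list_indexing_failed = True
    print("list indexing failed:", exc)

images = np.asarray(images)          # now a 4D array of shape (10, 4, 4, 3)
images_train = images[training_idx]  # equivalent to images[training_idx, :, :, :]
print(images_train.shape)            # (3, 4, 4, 3)
```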

1 Answer

Answer 0 (score: 0)

sklearn approach:

Although it isn't pure numpy, I'd suggest looking at train_test_split from sklearn.model_selection, because it does essentially exactly what you are trying to do, in a single call:

from sklearn.model_selection import train_test_split

images_train, images_test, labels_train, labels_test = train_test_split(images, labels, test_size=0.2)
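A minimal sketch of that call on small synthetic arrays, assuming scikit-learn is installed. train_test_split shuffles by default and returns a train/test pair for each input array in order, so images and labels stay aligned; random_state (the value 42 here is arbitrary) makes the split reproducible.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-ins: 50 tiny 2x2 RGB "images", each filled with its own index,
# and one float label per image equal to that same index.
images = np.arange(50, dtype=np.float64)[:, None, None, None] * np.ones((1, 2, 2, 3))
labels = np.arange(50, dtype=np.float64)

images_train, images_test, labels_train, labels_test = train_test_split(
    images, labels, test_size=0.2, random_state=42
)

print(images_train.shape, images_test.shape)  # (40, 2, 2, 3) (10, 2, 2, 3)
# Each image still carries the label it started with.
print(np.array_equal(images_train[:, 0, 0, 0], labels_train))  # True
```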

numpy approach:

You can also do this in numpy alone, but the code is less readable:

Generic code for an 80/20 split:

indices = np.random.permutation(len(images))
split = int(len(images) * 0.8)  # number of training samples

images_train, images_test = images[indices[:split]], images[indices[split:]]

labels_train, labels_test = labels[indices[:split]], labels[indices[split:]]
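The same steps can be wrapped in a small helper. This is a sketch only: the function name split_train_test and its parameters are hypothetical, and it swaps np.random.permutation for a seeded np.random.default_rng generator so the split can be reproduced.

```python
import numpy as np


def split_train_test(images, labels, train_frac=0.8, seed=None):
    """Hypothetical helper: shuffle once, then split images and labels with the same indices."""
    images = np.asarray(images)
    labels = np.asarray(labels)
    if len(images) != len(labels):
        raise ValueError("images and labels must have the same length")

    rng = np.random.default_rng(seed)
    indices = rng.permutation(len(images))
    split = int(len(images) * train_frac)  # number of training samples
    train_idx, test_idx = indices[:split], indices[split:]
    return images[train_idx], images[test_idx], labels[train_idx], labels[test_idx]


# Usage with a small synthetic dataset: label i belongs to image i.
images = np.arange(20)[:, None, None, None] * np.ones((1, 3, 3, 3))
labels = np.arange(20)
images_train, images_test, labels_train, labels_test = split_train_test(images, labels, seed=0)
print(images_train.shape, images_test.shape)  # (16, 3, 3, 3) (4, 3, 3, 3)
```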

Using your hard-coded train/test sizes:

indices = np.random.permutation(5500)

images_train, images_test = images[indices[:4400]], images[indices[4400:]]

labels_train, labels_test = labels[indices[:4400]], labels[indices[4400:]]
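Whichever variant is used, integer-array indexing returns copies rather than views, so images_train and images_test together need about as much memory as images itself. A rough sketch of the arithmetic, assuming the pixels are stored as uint8 (one byte per RGB value; 8-byte dtypes such as float64 need eight times as much):

```python
import numpy as np

# Full dataset shape from the question; uint8 is an ASSUMPTION (RGB values 0-255).
shape = (5500, 350, 350, 3)
itemsize = np.dtype(np.uint8).itemsize
total_bytes = int(np.prod(shape, dtype=np.int64)) * itemsize
print(total_bytes)  # 2021250000 bytes, about 2 GB

# Integer-array ("fancy") indexing copies data; a basic slice returns a view.
small = np.zeros((10, 4, 4, 3), dtype=np.uint8)
indices = np.random.permutation(len(small))
train = small[indices[:8]]
view = small[:8]
print(np.shares_memory(train, small))  # False: train is an independent copy
print(np.shares_memory(view, small))   # True: view reuses small's buffer
```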