Question

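I have a 4D array containing the details of 5500 images (350x350 pixels each, where each entry holds the RGB values of a single pixel), and a 1D array that assigns a value to each image: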

>>> np.array(images).shape
(5500, 350, 350, 3)
>>> np.array(labels).shape
(5500,)

I want to split both the images and labels arrays into 80% training and 20% test sets. I have already done this for the labels array without any problems:

>>> indices = np.random.permutation(5500)
>>> training_idx, test_idx = indices[:4400], indices[4400:]
>>> training_idx
array([1209, 1958, 3376, ..., 1875,   55, 5408])
>>> labels_train, labels_test = labels[training_idx], labels[test_idx]
>>> labels_train
array([1.7     , 3.833333, 2.333333, ..., 2.016667, 3.15    , 4.316667])

However, building images_train (the images 4D array reduced to 80% of its entries) is giving me trouble. For example, I have tried the following:

images_train, images_test = images[training_idx][:,][:,][:,], images[test_idx][:,][:,][:,]
images_train, images_test = images[training_idx,:,:,:,], images[test_idx,:,:,:,]

Answer 1

`sklearn` method:

Although it is not pure numpy, I suggest looking at train_test_split from sklearn.model_selection, since it does essentially what you are trying to do, but more simply:

from sklearn.model_selection import train_test_split

images_train, images_test, labels_train, labels_test = train_test_split(images, labels, test_size=0.2)
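To make the effect of the call above easier to see, here is a minimal sketch on small stand-in data. The array sizes (50 images of 8x8 pixels), the random generator seed, and the `random_state=42` argument are assumptions chosen for illustration and are not part of the original question; `random_state` only makes the shuffle reproducible.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical stand-in data: 50 small RGB images instead of 5500 at 350x350.
rng = np.random.default_rng(0)
images = rng.integers(0, 256, size=(50, 8, 8, 3), dtype=np.uint8)
labels = rng.random(50)

# Both arrays are shuffled with the same permutation and split along axis 0,
# so each image keeps its label.
images_train, images_test, labels_train, labels_test = train_test_split(
    images, labels, test_size=0.2, random_state=42
)

print(images_train.shape, images_test.shape)  # (40, 8, 8, 3) (10, 8, 8, 3)
print(labels_train.shape, labels_test.shape)  # (40,) (10,)
```

Only the first axis is split; the remaining three axes of each image are left untouched.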

`numpy` method:

You can also do this in numpy alone, but the code is less clear:

Generic code for an 80/20 split:

indices = np.random.permutation(len(images))
split = int(len(images) * 0.8)  # index where the training part ends

images_train, images_test = images[indices[:split]], images[indices[split:]]

labels_train, labels_test = labels[indices[:split]], labels[indices[split:]]
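The same idea can be wrapped in a small helper. This is only a sketch: the function name `split_train_test` and its parameters are hypothetical, and the `np.asarray` conversion rests on an assumption. The question calls `np.array(images).shape`, which suggests `images` might be a plain Python list, and a list cannot be indexed with an array of indices. If `images` is already an ndarray, the conversion is harmless.

```python
import numpy as np

def split_train_test(images, labels, train_fraction=0.8, seed=None):
    """Shuffle images and labels with one permutation, then split along axis 0."""
    # Assumption: inputs may be lists, so convert them to arrays first.
    images = np.asarray(images)
    labels = np.asarray(labels)
    if len(images) != len(labels):
        raise ValueError("images and labels must have the same length")

    rng = np.random.default_rng(seed)
    indices = rng.permutation(len(images))
    split = int(len(images) * train_fraction)
    train_idx, test_idx = indices[:split], indices[split:]
    return images[train_idx], images[test_idx], labels[train_idx], labels[test_idx]

# Stand-in data: every pixel of image i equals i, and its label is i,
# which makes it easy to check that pairs stay aligned after shuffling.
images = [np.full((4, 4, 3), i) for i in range(10)]  # a plain list on purpose
labels = np.arange(10)

images_train, images_test, labels_train, labels_test = split_train_test(
    images, labels, seed=0
)
print(images_train.shape, images_test.shape)  # (8, 4, 4, 3) (2, 4, 4, 3)
print(np.array_equal(images_train[:, 0, 0, 0], labels_train))  # True
```

Indexing with a 1D integer array selects whole images along the first axis, so no extra `:` slices are needed for the other three dimensions.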

Using your hard-coded train/test sizes:

indices = np.random.permutation(5500)

images_train, images_test = images[indices[:4400]], images[indices[4400:]]

labels_train, labels_test = labels[indices[:4400]], labels[indices[4400:]]

Subdividing a 4D array into training and test splits
