Question

I am reading images from my local directory as follows:

from PIL import Image
import os

root = '/Users/xyz/Desktop/data'

for path, subdirs, files in os.walk(root):
    for name in files:
        img_path = os.path.join(path,name)

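I have two subdirectories, category-1 and category-2, each containing the image files (.jpg) that belong to that category.

How can I use Scikit-Learn's train_test_split() function with these images and the two categories? In other words, how do I arrange the training and test data?

Thanks.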

Answer 1

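You have to read the pixel data from the images and store it in a Pandas DataFrame or a numpy array. In parallel, you have to store the corresponding category values, category-1 (1) and category-2 (2), in a list or numpy array. Here is a rough sketch; I'll assume you have some lookup, categories, that returns 1 or 2 for a given image path.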

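import os
import numpy
from PIL import Image

X_rows = list()  # one flattened pixel vector per image
y = list()       # matching category value (1 or 2) per image

for path, subdirs, files in os.walk(root):
  for name in files:
    img_path = os.path.join(path,name)
    correct_cat = categories[img_path]
    # Assumes all images share the same size and mode; resize/convert first if not.
    img_pixels = numpy.asarray(Image.open(img_path)).ravel()
    X_rows.append(img_pixels)
    y.append(correct_cat)

# Build X once at the end; vstack onto an empty 1-D array raises a shape error.
X = numpy.array(X_rows)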

You are effectively storing the image pixels together with the category values (converted to integers). There may be other ways to do this as well.
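The sketch above takes the categories lookup for granted. One possible way to obtain the labels, assuming the two subdirectories are named exactly category-1 and category-2 as in the question, is to derive each label from the image's parent directory. The names LABELS, label_from_path and list_labelled_images below are hypothetical helpers, not part of any library.

```python
import os

# Hypothetical mapping from subdirectory name to integer label (an assumption).
LABELS = {'category-1': 1, 'category-2': 2}

def label_from_path(img_path):
    """Return the integer label based on the image's parent directory name."""
    parent = os.path.basename(os.path.dirname(img_path))
    return LABELS[parent]

def list_labelled_images(root):
    """Collect (path, label) pairs for every .jpg inside a known category folder."""
    pairs = []
    for path, subdirs, files in os.walk(root):
        parent = os.path.basename(path)
        if parent not in LABELS:
            continue  # skip the root itself and any unrelated folders
        for name in sorted(files):
            if name.lower().endswith('.jpg'):
                pairs.append((os.path.join(path, name), LABELS[parent]))
    return pairs
```

With this, categories[img_path] in the loop could be replaced by label_from_path(img_path), or the loop could iterate over list_labelled_images(root) directly.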

Once you have X and y, you can call train_test_split on them:

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y)
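With only two categories it can help to control the split explicitly. The following is a minimal sketch using synthetic data in place of real image pixels (the array sizes and label layout are assumptions for illustration); test_size, stratify and random_state are standard train_test_split parameters.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in for real data: 8 "images" of 2x2 RGB pixels, flattened to 12 values.
rng = np.random.default_rng(0)
X = rng.integers(0, 256, size=(8, 12), dtype=np.uint8)
y = np.array([1, 1, 1, 1, 2, 2, 2, 2])  # category-1 -> 1, category-2 -> 2

X_train, X_test, y_train, y_test = train_test_split(
    X, y,
    test_size=0.25,    # hold out a quarter of the images for testing
    stratify=y,        # keep the class ratio the same in both splits
    random_state=0,    # make the shuffle reproducible
)

print(X_train.shape, X_test.shape)
```

Stratifying on y keeps both categories represented in the test set, which matters when the image folders are small or unbalanced.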
