Question

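I have a dataset called "Digit". It consists of 1797 small images (8×8 pixels), each containing a handwritten digit (0-9). Each image is treated as a data sample with its pixels as features. So, to build the feature table, each 8x8 image has to be converted into one row of the feature matrix, with 64 feature columns for the 64 pixels. How do I build a feature matrix and a label vector for it?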

Answer 1

You can follow the scikit-learn tutorial on supervised learning, which uses the Digit dataset:

http://scikit-learn.org/stable/tutorial/basic/tutorial.html#loading-an-example-dataset

More details here. If you load the dataset as in the example, you can simply reshape the images:

from sklearn import datasets
digits = datasets.load_digits()
# To apply a classifier on this data, we need to flatten the image, to
# turn the data in a (samples, feature) matrix:
n_samples = len(digits.images)
data = digits.images.reshape((n_samples, -1))

This makes data a 2D matrix with n_samples rows and as many columns as are needed to hold a flattened image (64 for the 8×8 images here).
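The snippet above only builds the feature matrix. As a minimal sketch of the label half of the question, the same loaded dataset also exposes the digit labels through its `target` attribute; the names `X` and `y` below are just conventional choices, not something the tutorial requires.

```python
from sklearn import datasets

digits = datasets.load_digits()

# Feature matrix: flatten each 8x8 image into a row of 64 pixel values.
n_samples = len(digits.images)
X = digits.images.reshape((n_samples, -1))

# Label vector: the digit (0-9) shown in each image, as a 1-D array.
y = digits.target

print(X.shape, y.shape)  # (1797, 64) (1797,)
```

Row i of `X` and entry i of `y` describe the same image, so the pair can be passed directly to a classifier's `fit(X, y)`.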

Answer 2

If you are using numpy and cv2, you can do the following:

import numpy as np
import cv2

fname = "image1.jpg"
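# Read as grayscale; the default flag returns 3 color channels, i.e. (8, 8, 3)
image = cv2.imread(fname, cv2.IMREAD_GRAYSCALE)  # shape (8, 8)

feature = image.reshape(64)                      # shape (64,)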

To read a batch of images and load them into a "feature matrix" (a numpy array), you can do the following:

N = 10 # number of images
data = np.zeros((N, 64))

for index in range(N):
    # get the current image and convert to feature, as above
    data[index] = np.copy(feature)

Each row of the data matrix is now one example (a 64-dimensional list of features).

Does this help?

The label vector can simply be a 1D numpy array, e.g. labels = np.zeros(N), filled with the digit for each image.
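Putting the loop and the label vector together might look like the sketch below. To keep it runnable without image files, it assumes the images are already in memory: `images` is filled with random 8x8 arrays as a stand-in for whatever `cv2.imread` would return, and `digit_labels` is a hypothetical list of known labels.

```python
import numpy as np

# Stand-in data: in practice each array would come from reading an image file.
rng = np.random.default_rng(0)
images = [rng.integers(0, 256, size=(8, 8), dtype=np.uint8) for _ in range(10)]
digit_labels = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]  # hypothetical labels, one per image

N = len(images)
data = np.zeros((N, 64))         # feature matrix: one row per image
labels = np.zeros(N, dtype=int)  # label vector: one entry per image

for index, (image, label) in enumerate(zip(images, digit_labels)):
    data[index] = image.reshape(64)  # flatten 8x8 -> 64 features
    labels[index] = label

print(data.shape, labels.shape)  # (10, 64) (10,)
```

Filling both arrays in the same loop keeps row i of `data` aligned with entry i of `labels`.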

Edit:

There are several ways to read an image:

(1) Using cv2: img = cv2.imread(filename)

(2) Using matplotlib:

import matplotlib.image as mpimg
img = mpimg.imread(filename)

(3) Using PIL (or Pillow):

from PIL import Image
img = Image.open(filename)

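After reading an image, it is worth checking its shape, so you know whether the order of channels, width, and height is what your application expects.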
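One way to make that shape check explicit is a small helper that validates the array before flattening it. This is only a sketch: `to_feature` is a hypothetical name, the channels-last layout (height, width, channels) is an assumption, and averaging the channels is a deliberately simple grayscale conversion rather than what any of the libraries above do internally.

```python
import numpy as np

def to_feature(img):
    """Flatten an 8x8 image into 64 features (hypothetical helper)."""
    img = np.asarray(img)
    if img.ndim == 3:
        # Assumed layout (height, width, channels): average the channels
        # as a simple grayscale conversion.
        img = img.mean(axis=2)
    if img.shape != (8, 8):
        raise ValueError(f"expected an 8x8 image, got shape {img.shape}")
    return img.reshape(64)

gray = np.zeros((8, 8), dtype=np.uint8)      # single-channel image
color = np.zeros((8, 8, 3), dtype=np.uint8)  # 3-channel image
print(to_feature(gray).shape, to_feature(color).shape)  # (64,) (64,)
```

Failing loudly on an unexpected shape is usually easier to debug than a reshape error deep inside a loading loop.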
