K-Nearest Neighbors on PCA results (python)

Time: 2018-01-23 12:56:24

Tags: python scikit-learn pca knn

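I am using Python and some pictures of tables (taken from above). My goal is to classify these tables by shape (square, rectangular, round) by analyzing the table images with PCA, and then to use the PCA results as the input to a k-nearest neighbors classifier.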

My source code is as follows:

import cv2
import numpy as np
from glob import glob
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn import neighbors


data = []
labels = []
# Read images from file
for filename in glob('Tables/*.jpg'):

    img = cv2.imread(filename)
    height, width = img.shape[:2]
    img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

    if height == 1125 and width == 2000:

        # Determine the label from the filename
        if filename[11] == 'S':
            label = 'Square'
        elif filename[12] == 'o':
            label = 'Round'
        elif filename[12] == 'e':
            label = 'Rectangular'
        else:
            # Skip files that match no shape, so data and labels stay aligned
            continue

        # Flatten the image matrix into a 1-D vector of 1125 * 2000 pixels
        data.append(img.flatten())
        labels.append(label)


train_img, test_img, train_lbl, test_lbl = train_test_split(data, labels, test_size=0.2, random_state=0)

# Fit on training set only.
scaler = StandardScaler()
scaler.fit(train_img)

# Apply transform to both the training set and the test set
train_img = scaler.transform(train_img)
test_img = scaler.transform(test_img)

# Make an instance of the pca model
pca = PCA(0.95)
pca.fit(train_img)
print(pca.explained_variance_ratio_)

# Transform images with pca model
train_img = pca.transform(train_img)
test_img = pca.transform(test_img)

# Make an instance of knn model
knn = neighbors.KNeighborsClassifier()
knn.fit(train_img, train_lbl)

# Accuracy of knn test
accuracy = knn.score(test_img, test_lbl)
print(accuracy)

The output of print(pca.explained_variance_ratio_) is:

[ 0.22406799  0.14130877  0.07979864  0.05853734  0.05434577  0.0488873
  0.04629602  0.04229425  0.03923615  0.03613698  0.03199812  0.02858658
  0.02564182  0.02347223  0.01883306  0.01648042  0.01575557  0.01376232
  0.01296937]
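As a quick sanity check on these ratios, the cumulative sum shows where the 0.95 threshold passed to PCA(0.95) is crossed: scikit-learn keeps the smallest number of components whose cumulative explained variance ratio exceeds that fraction. The values below are copied from the output above; the calculation mirrors that selection rule and is only a sketch.

```python
import numpy as np

# Explained variance ratios copied from the printed output above
ratios = np.array([
    0.22406799, 0.14130877, 0.07979864, 0.05853734, 0.05434577, 0.0488873,
    0.04629602, 0.04229425, 0.03923615, 0.03613698, 0.03199812, 0.02858658,
    0.02564182, 0.02347223, 0.01883306, 0.01648042, 0.01575557, 0.01376232,
    0.01296937,
])

cumulative = np.cumsum(ratios)

# Smallest number of components whose cumulative ratio exceeds 0.95
n_needed = int(np.searchsorted(cumulative, 0.95, side="right")) + 1

print(len(ratios), n_needed, round(float(cumulative[-1]), 4))
```

The first 18 components explain about 94.5% of the variance and the 19th pushes the total to about 95.8%, which is why exactly 19 ratios are printed. Assuming all 30 images pass the size filter, the training set holds 24 images, and PCA can never return more than min(n_samples, n_features) = 24 components, so 19 is already close to that ceiling.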

The output of print(accuracy) is:

0.666666666667
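The accuracy value is necessarily coarse because the test set is tiny. Assuming all 30 images pass the 1125×2000 size filter, train_test_split with test_size=0.2 holds out 6 images, so the only possible scores are multiples of 1/6, and 0.6667 corresponds to 4 correct predictions out of 6. The sketch below checks this with placeholder data (the integer "images" and the repeating label list are hypothetical stand-ins, not the real files).

```python
from sklearn.model_selection import train_test_split

# Hypothetical stand-ins for 30 images and their labels
data = list(range(30))
labels = ["Square", "Round", "Rectangular"] * 10

train_img, test_img, train_lbl, test_lbl = train_test_split(
    data, labels, test_size=0.2, random_state=0
)

n_test = len(test_img)
# With n_test test images, accuracy can only take the values k / n_test
possible_scores = [k / n_test for k in range(n_test + 1)]

print(n_test, possible_scores[4])
```

One more or one fewer correct prediction moves the score by about 0.17, so a single split of this size says little about how well the approach generalizes.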

Are these results reasonable, given that I used only 30 table images in my dataset, or am I missing something important?

In general, am I applying the kNN algorithm to the PCA results correctly?
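One common way to evaluate the same StandardScaler → PCA(0.95) → KNeighborsClassifier chain on a small dataset is to wrap it in a scikit-learn Pipeline and score it with stratified cross-validation next to a trivial baseline. The sketch below is not the poster's code: it uses synthetic data from make_classification as a stand-in for the flattened images, and n_neighbors=3, 500 features and 5 folds are arbitrary assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.dummy import DummyClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for 30 flattened images in 3 balanced classes
# (assumption: the real pixel vectors are not available here)
X, y = make_classification(
    n_samples=30, n_features=500, n_informative=10,
    n_classes=3, flip_y=0.0, random_state=0,
)

# The scaler and PCA are refit on each training fold only, so the
# held-out fold never influences the projection that kNN sees
knn_pca = make_pipeline(
    StandardScaler(),
    PCA(0.95),
    KNeighborsClassifier(n_neighbors=3),
)

# Majority-class baseline: about 1/3 accuracy for 3 balanced classes
baseline = DummyClassifier(strategy="most_frequent")

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
knn_scores = cross_val_score(knn_pca, X, y, cv=cv)
base_scores = cross_val_score(baseline, X, y, cv=cv)

print(f"kNN on PCA: {knn_scores.mean():.3f} +/- {knn_scores.std():.3f}")
print(f"baseline:   {base_scores.mean():.3f} +/- {base_scores.std():.3f}")
```

The pipeline applies exactly the same steps as the code in the question (fit the scaler and PCA on training data, transform both sets, then fit kNN), but averaging over several folds gives a less noisy estimate than one 6-image test set, and the baseline shows how far above chance the score really is.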
