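I am working in Python with some images of tables (photographed from above). My goal is to classify these tables by shape (square, rectangular, round) by analyzing the table images with PCA and then using the results as input to a k-nearest-neighbors classifier.
My source code is as follows: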
import cv2
import numpy as np
from glob import glob
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn import neighbors
data = []
labels = []
# Read images from file
for filename in glob('Tables/*.jpg'):
    img = cv2.imread(filename)
    height, width = img.shape[:2]
    img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    if height == 1125 and width == 2000:
        # Create the data list by reshaping an image from matrix to vector
        img = np.reshape(img, (1, 1125 * 2000))
        img = img[0]
        data.append(img)
        # Create the labels list
        if filename[11] == 'S':
            labels.append('Square')
        elif filename[12] == 'o':
            labels.append('Round')
        elif filename[12] == 'e':
            labels.append('Rectangular')
train_img, test_img, train_lbl, test_lbl = train_test_split(data, labels, test_size=0.2, random_state=0)
# Fit on training set only.
scaler = StandardScaler()
scaler.fit(train_img)
# Apply transform to both the training set and the test set
train_img = scaler.transform(train_img)
test_img = scaler.transform(test_img)
# Make an instance of the pca model
pca = PCA(0.95)
pca.fit(train_img)
print(pca.explained_variance_ratio_)
# Transform images with pca model
train_img = pca.transform(train_img)
test_img = pca.transform(test_img)
# Make an instance of knn model
knn = neighbors.KNeighborsClassifier()
knn.fit(train_img, train_lbl)
# Accuracy of knn test
accuracy = knn.score(test_img, test_lbl)
print(accuracy)
The output of print(pca.explained_variance_ratio_) is:
[ 0.22406799 0.14130877 0.07979864 0.05853734 0.05434577 0.0488873
0.04629602 0.04229425 0.03923615 0.03613698 0.03199812 0.02858658
0.02564182 0.02347223 0.01883306 0.01648042 0.01575557 0.01376232
0.01296937]
The output of print(accuracy) is:
0.666666666667
Are these results reasonable, given that I only used 30 table images in my dataset, or am I missing something important?
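As a rough sanity check on how much a single score can say here, the arithmetic below works out the size of the test split. It is only a sketch: it assumes all 30 images pass the size check in the loop, and it relies on train_test_split rounding the test count up when test_size is a fraction. The variable names are illustrative and not part of the original code.

```python
import math

n_images = 30      # dataset size stated in the question (assumes no image is skipped)
test_size = 0.2    # fraction passed to train_test_split

# For a fractional test_size, the test count is rounded up
n_test = math.ceil(n_images * test_size)

# Every accuracy value that a test split of this size can produce
possible_scores = [correct / n_test for correct in range(n_test + 1)]

# Number of correct predictions implied by the reported score
reported = 0.666666666667
n_correct = round(reported * n_test)

print(n_test, n_correct)
print([round(s, 3) for s in possible_scores])
```

Under those assumptions the test split holds 6 images, so the reported score corresponds to 4 correct predictions out of 6, and a single image either way moves the score by roughly 17 percentage points.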
In general, am I applying the kNN algorithm to the PCA results correctly?
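One possible way to probe both questions is to wrap the same three steps (scaling, PCA, kNN) in a scikit-learn pipeline and score it with stratified cross-validation, so that the scaler and PCA are refit on the training part of every fold. The sketch below is not the original method: it runs on synthetic random data standing in for the flattened images (an assumption made so the example is self-contained), and the fold count of 5 is an arbitrary choice.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the image vectors (assumption, NOT the real photos):
# 30 samples, 10 per class, 200 features, each class shifted by its own offset.
rng = np.random.default_rng(0)
class_names = np.array(['Square', 'Round', 'Rectangular'])
class_ids = np.repeat(np.arange(3), 10)
labels = class_names[class_ids]
centers = rng.normal(size=(3, 200))
data = centers[class_ids] + rng.normal(scale=2.0, size=(30, 200))

# Same order of steps as in the question: scale, reduce with PCA, classify with kNN.
# Inside cross_val_score each step is fit on the training folds only.
model = make_pipeline(StandardScaler(), PCA(0.95), KNeighborsClassifier())

# Stratified folds keep the three shapes balanced in every split
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(model, data, labels, cv=cv)

print(scores)                        # one accuracy per fold
print(scores.mean(), scores.std())   # average and spread across folds
```

With the real data, the data and labels lists built in the loop would take the place of the synthetic arrays. The spread across folds gives a sense of how much one 80/20 split can vary on a dataset this small.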