Question

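While developing an optical character recognition engine, I seem to have stumbled upon a small problem. I have trained a K-nearest-neighbor classifier on the MNIST images and even tested it, and it seems to work fine. However, when I feed it images of different dimensions, it fails to classify them correctly. Any suggestions on how to solve this?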

I] KNN classifier -

The code for the KNN classification is:

 % Here I resize the binary image 'b' to the dimensions of the training set
 % 'trainingImages', because the input and the training images should have
 % the same number of columns (features).

b = imresize(b, size(trainingImages));

 % Now I try to classify the input image 'b' against the set of training images
 % and training labels.

cls = knnclassify(b, trainingImages, trainingLabels, 3, 'euclidean');

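cls is now the classification vector. However, it almost always shows an incorrect classification of 1, regardless of the input image.

On the other hand, when I classify the MNIST test images, I get very high accuracy! The code for that is as follows -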

class = knnclassify(testImg, trainingImages, trainingLabels, 3, 'euclidean');

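The main problem now is that whatever input image I give it to predict, it mostly gives me a wrong result (a different one for different images), even for images that look nothing alike. It seems not to be working properly. Can anyone help me find where the problem lies? I could not find any explanation in the existing material on the internet. Thanks in advance.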

Answer 1

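I believe I have solved the problem described above. The issues were:

1. As Dhanushka said, I was resizing the raw input image to match the dimensions of the whole training image set (60000 * 784 in the case of MNIST, meaning 60000 digits with 784 features each, one per pixel of a [28 * 28] digit). So I simply changed the dimensions of the input image to 28 * 28.
2. Preprocessing the input image. I was only converting the image to a binary image and trying to classify it against the MNIST training image data set. That was an INCOMPLETE process. When I additionally detected the edges of the input binary image (Canny, Prewitt or Zerocross, whichever suits you better) and used the result for classification, I got a very accurate prediction!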
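To make the first point concrete, here is a minimal sketch of the shape check, assuming the training matrix stores one flattened digit per row. The small `trainingImages` matrix and the random image `b` below are hypothetical stand-ins, not the real MNIST data, and `imresize` requires the Image Processing Toolbox.

```matlab
% Hypothetical stand-ins so the sketch runs on its own.
trainingImages = zeros(100, 784);    % N-by-784: one flattened 28x28 digit per row
b = rand(45, 60) > 0.5;              % arbitrary-size binary input image

% size(trainingImages) is the size of the whole set ([100 784] here,
% [60000 784] for MNIST), not the size of a single digit.
wrongTarget = size(trainingImages);

% The side length of one digit follows from the number of columns.
digitSide = sqrt(size(trainingImages, 2));   % 28 for MNIST

b28 = imresize(b, [digitSide digitSide]);    % 28-by-28 image
x   = double(b28(:)');                       % flatten to a 1-by-784 row vector

% The query must have exactly as many columns as the training matrix.
assert(size(x, 2) == size(trainingImages, 2));
```

Resizing to `wrongTarget` instead would produce a 100-by-784 "image" (60000-by-784 with the real data), which the classifier then treats as many separate samples rather than one digit.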

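Note: in KNN classification, you have to determine the number of nearest neighbors (k) by trial and error. I arrived at the following conclusions -

3 nearest neighbors are usually enough for synthetic images
1 nearest neighbor mostly works for handwritten images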
The code for this is as follows:

    % Resize the binary image 'b' to 28-by-28 so that, once flattened, it has the
    % same number of columns (features) as the training images.

    b = imresize(b, [28 28]);    % resize the binary image b to 28 * 28
    b = edge(b, 'canny');        % run Canny edge detection on the resized binary
                                 % image
    b = double(b(:)');           % flatten 'b' into a column vector with b(:),
                                 % transpose it into a 1 * 784 row with the ' operator,
                                 % and cast the logical result to double; 'b' now has
                                 % the same number of columns as the MNIST training set

    % Classify the input image 'b' against the set of training images
    % and training labels.

    cls = knnclassify(b, trainingImages, trainingLabels, 3, 'euclidean');
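If knnclassify is not available in your MATLAB release (as far as I know it belonged to the Bioinformatics Toolbox and its documentation pointed to fitcknn as the replacement), roughly the same classification can be sketched with fitcknn and predict from the Statistics and Machine Learning Toolbox. The tiny matrices below are hypothetical stand-ins for the MNIST data and for the preprocessed input row.

```matlab
% Hypothetical stand-ins; in practice these are the MNIST matrices from above.
trainingImages = [zeros(5, 784); ones(5, 784)];   % N-by-784, one digit per row
trainingLabels = [zeros(5, 1); ones(5, 1)];       % N-by-1 class labels
b = ones(1, 784);                                 % preprocessed 1-by-784 query row

% Fit a 3-nearest-neighbor model with Euclidean distance, mirroring the
% arguments passed to knnclassify above.
mdl = fitcknn(trainingImages, trainingLabels, ...
              'NumNeighbors', 3, 'Distance', 'euclidean');

% Predict the label of the query; b must have as many columns as trainingImages.
cls = predict(mdl, b);
```

The model object can be fitted once and reused for many input images, which avoids passing the full training set on every call.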

Detecting characters by comparing images of different sizes
