Question

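I am trying to implement Bag of Words in OpenCV and have come up with the implementation below. I am using the Caltech 101 database. However, since this is my first attempt and I am not familiar with the method, I plan to use only two image sets from the database: the chair set and the soccer ball set. I coded the SVM using this.

Everything works, except that when I call classifier.predict(descriptor) I do not get the expected label. No matter what my test image is, I always get 0 and never 1. There are 10 images in the chair dataset and 10 in the soccer ball dataset. I labeled chairs as 0 and soccer balls as 1. The links below are the samples for each category: the first 10 are chairs and the last 10 are soccer balls.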

function hello

    clear all; close all; clc;

    detector = cv.FeatureDetector('SURF');
    extractor = cv.DescriptorExtractor('SURF');


    links = {
    'http://i.imgur.com/48nMezh.jpg'
    'http://i.imgur.com/RrZ1i52.jpg'
    'http://i.imgur.com/ZI0N3vr.jpg'
    'http://i.imgur.com/b6lY0bJ.jpg'
    'http://i.imgur.com/Vs4TYPm.jpg'
    'http://i.imgur.com/GtcwRWY.jpg'
    'http://i.imgur.com/BGW1rqS.jpg'
    'http://i.imgur.com/jI9UFn8.jpg'
    'http://i.imgur.com/W1afQ2O.jpg'
    'http://i.imgur.com/PyX3adM.jpg'


    'http://i.imgur.com/U2g4kW5.jpg'
    'http://i.imgur.com/M8ZMBJ4.jpg'
    'http://i.imgur.com/CinqIWI.jpg'
    'http://i.imgur.com/QtgsblB.jpg'
    'http://i.imgur.com/SZX13Im.jpg'
    'http://i.imgur.com/7zVErXU.jpg'
    'http://i.imgur.com/uUMGw9i.jpg'
    'http://i.imgur.com/qYSkqEg.jpg'
    'http://i.imgur.com/sAj3pib.jpg'
    'http://i.imgur.com/DMPsKfo.jpg'
    };


    N = numel(links);

    trainer = cv.BOWKMeansTrainer(100);


    train = struct('val',repmat({' '},N,1),'img',cell(N,1), 'pts',cell(N,1), 'feat',cell(N,1));


    for i=1:N

      train(i).val = links{i};
      train(i).img = imread(links{i});

       % convert to grayscale only when the image has color channels
       if ndims(train(i).img) > 2
         train(i).img = rgb2gray(train(i).img);
       end;

       train(i).pts = detector.detect(train(i).img);
       train(i).feat = extractor.compute(train(i).img,train(i).pts);

     end;

     for i=1:N
          trainer.add(train(i).feat);
     end;

     dictionary = trainer.cluster();
     extractor = cv.BOWImgDescriptorExtractor('SURF','BruteForce');
     extractor.setVocabulary(dictionary);

     for i=1:N
          desc(i,:) = extractor.compute(train(i).img,train(i).pts);
     end;

     a = zeros(1,10)';
     b = ones(1,10)';
     labels = [a;b];


     classifier  = cv.SVM;
     classifier.train(desc,labels);

     test_im =rgb2gray(imread('D:\ball1.jpg'));

     test_pts = detector.detect(test_im);
     test_feat = extractor.compute(test_im,test_pts);

     val = classifier.predict(test_feat);
     disp('Value is: ')
     disp(val)

     end

These are my test samples:

Soccer Ball http://www.timeslive.co.za/incoming/2011/08/26/football-soccer-ball-xgold/RESIZED/Small/football+soccer+ball+xgold

Chair

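From searching this site I think my algorithm is okay, even though I am not very confident about it. If anyone can help me find the bug, it would be much appreciated.

Following Amro's code, this is my result: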

Distribution of classes:
  Value    Count   Percent
      1       62     49.21%
      2       64     50.79%
Number of training instances = 61
Number of testing instances = 65
Number of keypoints detected = 38845
Codebook size = 100
SVM model parameters:
         svm_type: 'C_SVC'
      kernel_type: 'RBF'
           degree: 0
            gamma: 0.5063
            coef0: 0
                C: 62.5000
               nu: 0
                p: 0
    class_weights: 0
        term_crit: [1x1 struct]

Confusion matrix:

ans =

    29     1
     1    34

Accuracy = 96.92 %

Answer 1

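Your logic looks fine to me.

Now I guess you will have to tune the various parameters if you want to improve classification accuracy. This includes the clustering algorithm parameters (vocabulary size, cluster initialization, termination criteria, etc.), the SVM parameters (kernel type, the C coefficient, ...), and the local feature algorithm used (SIFT, SURF, ...).

Ideally, whenever you perform parameter selection, you should use cross-validation. Some methods already have such a mechanism built in (CvSVM::train_auto, for instance), but for the most part you will have to do this manually...

Finally, you should follow general machine learning guidelines; look into the whole bias-variance tradeoff dilemma. The online Coursera ML class covers this topic in detail in week 6 and explains how to perform error analysis and use learning curves to decide what to try next (do we need more instances, a more complex model, and so on).

With that said, I wrote my own version of the code. You might want to compare it with yours: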
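
Doing the cross-validation by hand could look roughly like the sketch below. It is only an illustration: X and y are synthetic stand-ins for the bag-of-words histograms and class labels, and a nearest-class-mean rule stands in for the SVM, so none of these names come from the code in this thread.

```matlab
% Manual k-fold cross-validation (sketch on synthetic data).
% X, y and the nearest-class-mean rule are assumptions for illustration;
% replace them with your histograms, labels and SVM train/predict calls.
rng(0);                                  % reproducible fold assignment
X = [randn(10,5); randn(10,5) + 3];      % 20 samples, 5 features
y = [ones(10,1); 2*ones(10,1)];          % class labels 1 and 2

k = 5;                                   % number of folds
n = size(X, 1);
fold = mod(randperm(n), k)' + 1;         % n-by-1, each sample gets a fold 1..k

acc = zeros(k, 1);
for f = 1:k
    testIdx  = (fold == f);              % held-out samples for this fold
    trainIdx = ~testIdx;                 % everything else is used to train

    % "train": estimate one mean vector per class from the training part
    mu1 = mean(X(trainIdx & (y == 1), :), 1);
    mu2 = mean(X(trainIdx & (y == 2), :), 1);

    % "predict": assign each held-out sample to the closer class mean
    Xt = X(testIdx, :);
    d1 = sum(bsxfun(@minus, Xt, mu1).^2, 2);
    d2 = sum(bsxfun(@minus, Xt, mu2).^2, 2);
    pred = 1 + (d2 < d1);

    acc(f) = mean(pred == y(testIdx));   % accuracy on the held-out fold
end
fprintf('Mean cross-validated accuracy = %.2f %%\n', 100*mean(acc));
```

To select a parameter, you would wrap this loop in an outer loop over the candidate values and keep the value with the best mean accuracy.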

% dataset of images
% I previously saved them as: chair1.jpg, ..., ball1.jpg, ball2.jpg, ...
d = [
    dir(fullfile('images','chair*.jpg')) ;
    dir(fullfile('images','ball*.jpg'))
];

% local-features algorithm used
detector = cv.FeatureDetector('SURF');
extractor = cv.DescriptorExtractor('SURF');

% extract local features from images
t = struct();
for i=1:numel(d)
    % load image as grayscale
    img = imread(fullfile('images', d(i).name));
    if ~ismatrix(img), img = rgb2gray(img); end

    % extract local features
    pts = detector.detect(img);
    feat = extractor.compute(img, pts);

    % store along with class label
    t(i).img = img;
    t(i).class = find(strncmp(d(i).name,{'chair','ball'},4));
    t(i).pts = pts;
    t(i).feat = feat;
end

% split into training/testing sets
% (a better way would be to use cvpartition from Statistics toolbox)
disp('Distribution of classes:')
tabulate([t.class])
tTrain = t([1:7 11:17]);
tTest = t([8:10 18:20]);
fprintf('Number of training instances = %d\n', numel(tTrain));
fprintf('Number of testing instances = %d\n', numel(tTest));

% build visual vocabulary (by clustering training descriptors)
K = 100;
bowTrainer = cv.BOWKMeansTrainer(K, 'Attempts',5, 'Initialization','PP');
clust = bowTrainer.cluster(vertcat(tTrain.feat));

fprintf('Number of keypoints detected = %d\n', numel([tTrain.pts]));
fprintf('Codebook size = %d\n', K);

% compute histograms of visual words for each training image
bowExtractor = cv.BOWImgDescriptorExtractor('SURF', 'BruteForce');
bowExtractor.setVocabulary(clust);
M = zeros(numel(tTrain), K);
for i=1:numel(tTrain)
    M(i,:) = bowExtractor.compute(tTrain(i).img, tTrain(i).pts);
end
labels = vertcat(tTrain.class);

% train an SVM model (perform parameter selection using cross-validation)
svm = cv.SVM();
svm.train_auto(M, labels, 'SvmType','C_SVC', 'KernelType','RBF');
disp('SVM model parameters:'); disp(svm.Params)

% evaluate classifier using testing images
actual = vertcat(tTest.class);
pred = zeros(size(actual));
for i=1:numel(tTest)
    descs = bowExtractor.compute(tTest(i).img, tTest(i).pts);
    pred(i) = svm.predict(descs);
end

% report performance
disp('Confusion matrix:')
confusionmat(actual, pred)
fprintf('Accuracy = %.2f %%\n', 100*nnz(pred==actual)./numel(pred));

Here is the output:

Distribution of classes:
  Value    Count   Percent
      1       10     50.00%
      2       10     50.00%
Number of training instances = 14
Number of testing instances = 6

Number of keypoints detected = 6300
Codebook size = 100

SVM model parameters:
         svm_type: 'C_SVC'
      kernel_type: 'RBF'
           degree: 0
            gamma: 0.5063
            coef0: 0
                C: 312.5000
               nu: 0
                p: 0
    class_weights: []
        term_crit: [1x1 struct]

Confusion matrix:
ans =
     3     0
     1     2
Accuracy = 83.33 %

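So the classifier correctly labels 5 out of 6 images from the test set, which is not bad for a start :) Obviously, you will get different results each time you run the code because of the inherent randomness of the clustering step.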
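
As a small check, the reported accuracy can be recomputed from the confusion matrix printed above. This is just a sketch; C is typed in by hand from that output, and the rows are taken to be the actual classes, as confusionmat(actual, pred) returns them.

```matlab
% Confusion matrix copied from the output above (rows = actual class,
% columns = predicted class).
C = [3 0;
     1 2];

% overall accuracy: correctly classified (diagonal) over all test images
accuracy = 100 * trace(C) / sum(C(:));
fprintf('Accuracy = %.2f %%\n', accuracy);   % prints: Accuracy = 83.33 %

% per-class recall: fraction of each actual class that was labeled correctly
recall = diag(C) ./ sum(C, 2);
disp(recall)
```

The per-class values show that the single error falls on the second class, which the overall figure alone does not reveal.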

Answer 2

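How many images are you using to build the dictionary, i.e. what is N? From your code, it seems you are using only 20 images (the ones listed in links). I hope that list was truncated for the post, because that is far too few. Typically you need on the order of 1000 images or more to build the dictionary, and the images need not be restricted to the two classes you are classifying. Otherwise, with only 20 images and 100 clusters, your dictionary is likely to be messed up.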

Also, you might want to use SIFT as a first choice, as it tends to perform better than the other descriptors.

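Finally, you can also debug by inspecting the detected keypoints. You can have OpenCV draw the keypoints. Sometimes the keypoint detector parameters are not set properly, so too few keypoints are detected, which in turn gives poor feature vectors.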
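
A quick way to inspect the keypoints might look like the sketch below. It assumes the same mexopencv wrapper used in the question and that it exposes cv.drawKeypoints; the file name is hypothetical.

```matlab
% Sketch: count and visualize the keypoints found in one image.
% Assumes mexopencv is on the path and provides cv.drawKeypoints;
% 'images/chair1.jpg' is a hypothetical file name.
img = imread(fullfile('images', 'chair1.jpg'));
if ~ismatrix(img), img = rgb2gray(img); end   % detector expects grayscale

detector = cv.FeatureDetector('SURF');
pts = detector.detect(img);

% too few keypoints here usually means poor histograms later on
fprintf('Number of keypoints detected = %d\n', numel(pts));

% overlay the keypoints on the image and show the result
out = cv.drawKeypoints(img, pts);
imshow(out);
```

Running this over a few training and test images makes it easy to spot images where the detector finds almost nothing.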

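To learn more about the BOW algorithm, you can take a look at these posts here and here. The second post links to a free PDF of an O'Reilly book on computer vision using Python. The BOW model (and other useful things) is described in more detail in that book.

Hope this helps.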

Bag of Words not labeling responses correctly