Question

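I'm trying to classify images using local descriptors computed with SIFT, together with a bag of visual words, KMeans clustering, and histograms.

I've read a lot of SO answers and tried to follow these instructions, but I feel I don't understand how the whole pipeline is supposed to work. Below is the code I implemented; it runs very slowly.

That's why I'm asking this question: to clarify my understanding of classification with SIFT descriptors and to validate my implementation.

I'd appreciate feedback on my understanding and some help improving my grasp of the concept.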

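First, I wrote a class wrapper for SIFT. My wrapper computes SIFT descriptors on image patches using a sliding window. It also uses Root SIFT for descriptor computation. detectAndCompute is its main function: it takes an image as an argument, crops it into a number of sub-images using a sliding window, computes Root SIFT descriptors for each sub-image, and merges the descriptors from all sub-images into a single descriptor matrix for the image.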

class DenseRootSIFT(object):
    def __init__(self):
        self.sift = cv2.xfeatures2d.SIFT_create()

    def detectAndCompute(self, image, step_size=12, window_size=(10, 10)):
        if window_size is None:
            winH, winW = image.shape[:2]
            window_size = (winH // 4, winW // 4)  # (height, width), the order _crop_image unpacks

        descriptors = np.array([], dtype=np.float32).reshape(0, 128)
        for crop in self._crop_image(image, step_size, window_size):
            crop = cv2.cvtColor(crop, cv2.COLOR_BGR2GRAY)
            descs = self._detectAndCompute(crop)[1]
            if descs is not None:
                descriptors = np.vstack([descriptors, descs])  # reuse descs instead of recomputing
        return descriptors

    def _detect(self, image):
        return self.sift.detect(image)

    def _compute(self, image, kps, eps=1e-7):
        kps, descs = self.sift.compute(image, kps)

        if len(kps) == 0:
            return [], None

        descs /= (descs.sum(axis=1, keepdims=True) + eps)
        descs = np.sqrt(descs)
        return kps, descs

    def _detectAndCompute(self, image):
        kps = self._detect(image)
        return self._compute(image, kps)

    def _sliding_window(self, image, step_size, window_size):
        for y in range(0, image.shape[0], step_size):
            for x in range(0, image.shape[1], step_size):
                yield (x, y, image[y:y + window_size[1], x:x + window_size[0]])

    def _crop_image(self, image, step_size=12, window_size=(10, 10)):
        crops = []
        winH, winW = window_size
        for (x, y, window) in self._sliding_window(image, step_size=step_size, window_size=(winW, winH)):
            if window.shape[0] != winH or window.shape[1] != winW:
                continue
            crops.append(image[y:y+winH, x:x+winW])
        return np.array(crops)
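
For reference, the Root SIFT step in _compute can be sketched on its own with plain NumPy. This is a minimal sketch, assuming non-negative SIFT-style descriptors; root_sift and the random fake_descs array are illustrative names and data, not part of the code above.

```python
import numpy as np


def root_sift(descs, eps=1e-7):
    """L1-normalise each descriptor row, then take the element-wise square root."""
    descs = np.asarray(descs, dtype=np.float32)
    # Divide every row by its sum; eps guards against all-zero rows.
    descs = descs / (descs.sum(axis=1, keepdims=True) + eps)
    return np.sqrt(descs)


# Hypothetical stand-in for real SIFT output: 5 descriptors of length 128.
rng = np.random.default_rng(0)
fake_descs = rng.integers(0, 256, size=(5, 128)).astype(np.float32)
rooted = root_sift(fake_descs)
```

Because each row sums to roughly 1 before the square root, each row of the result has an L2 norm of roughly 1.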

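Below I've posted a class called DenseRootSiftPreparator, which is meant to provide tools for extracting SIFT features from images and preparing them for further classification (specifically with sklearn's LinearSVC).

So, I follow this process: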

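Generate a codebook in the class below (the _generate_codebook function). The codebook is generated by applying mini-batch KMeans clustering with 2048 clusters. As output, the function returns a 2048 x 128 matrix.
Then I try to create a histogram for each image in the dataset, following the instructions I've posted above. The histogram for a single image is created with the _create_histogram function. First, the histogram is initialized with zeros. Then descriptors are computed for the input image, and for each descriptor I try to find the index of the closest descriptor in the previously generated codebook (using KDTree from scipy) and increment the histogram value at that index. Then I L2-normalize the histogram array and return it. The same process is repeated for every image. And it is very slow.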
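
The histogram step described above can also be sketched in a vectorised form. This is only a sketch under assumptions: bovw_histogram is a hypothetical helper, it uses a brute-force distance matrix rather than the KDTree, and the toy 2-D codebook stands in for a real 2048 x 128 one.

```python
import numpy as np


def bovw_histogram(descriptors, codebook):
    """Count the nearest codeword for each descriptor and L2-normalise the counts."""
    descriptors = np.asarray(descriptors, dtype=np.float64)
    codebook = np.asarray(codebook, dtype=np.float64)
    # Squared Euclidean distances, shape (n_descriptors, n_codewords),
    # via ||a - b||^2 = ||a||^2 - 2 a.b + ||b||^2.
    d2 = ((descriptors ** 2).sum(axis=1)[:, None]
          - 2.0 * descriptors @ codebook.T
          + (codebook ** 2).sum(axis=1)[None, :])
    nearest = d2.argmin(axis=1)
    # One bin per codeword; bincount replaces the per-descriptor Python loop.
    hist = np.bincount(nearest, minlength=len(codebook)).astype(np.float64)
    norm = np.linalg.norm(hist)
    return hist / norm if norm > 0 else hist


# Toy data (assumed for illustration): 3 codewords and 4 descriptors in 2-D.
toy_codebook = np.array([[0, 0], [10, 10], [0, 10]])
toy_descriptors = np.array([[1, 0], [0, 1], [9, 9], [1, 9]])
hist = bovw_histogram(toy_descriptors, toy_codebook)
```

The distance matrix needs memory proportional to the number of descriptors times the number of codewords, so for large images the descriptors would probably have to be processed in chunks.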

Here's the code for DenseRootSiftPreparator:

class DenseRootSiftPreparator(object):
    def __init__(self, histogram_size=2048):
        self.X = []
        self.dense_root_sift = DenseRootSIFT()
        self.histogram_size = histogram_size

    def fit(self, image_dataset, y=None):
        # @param image_dataset - array of images in OpenCV format
        self.X = image_dataset

    def extract_descriptors_and_prepare_for_classification(self, image):
        return self._get_histograms_for_image(image)

    def _get_histograms_for_image(self, image):
        codebook = self._generate_codebook(image)
        histograms = []

        for img in self.X:
            histogram = self._create_histogram(img, self.histogram_size, codebook)
            histograms.append(histogram)
        return histograms

    def _create_histogram(self, image, hist_size, codebook):
        histogram = np.zeros(hist_size)
        descriptors = self.dense_root_sift.detectAndCompute(image, window_size=None)
        tree = spatial.KDTree(codebook)

        for i in range(len(descriptors)):
            histogram[tree.query(descriptors[i])[1]] += 1

        return normalize(histogram[:, np.newaxis], axis=0).ravel()

    def _generate_codebook(self, image):
        descriptors = self.dense_root_sift.detectAndCompute(image, window_size=None)
        kmeans = MiniBatchKMeans(n_clusters=2048, batch_size=128,
                                n_init=10, max_no_improvement=10)
        kmeans.fit(descriptors)
        codebook = kmeans.cluster_centers_[:]
        return codebook

I would test my code in the following way:

images = get_images_dataset()
test_input_img = cv2.imread('test_input_image.jpg')
histogram_extractor = DenseRootSiftPreparator()
histogram_extractor.fit(images)
hists = histogram_extractor.extract_descriptors_and_prepare_for_classification(test_input_img)

Here are my imports (just in case):

import numpy as np
from scipy import spatial
import cv2
from cv2.xfeatures2d import SIFT_create
from sklearn.cluster import MiniBatchKMeans
from sklearn.preprocessing import normalize

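My main questions:

Am I using SIFT descriptors correctly to create the Bag of Visual Words model?
If not, what am I doing wrong? What could be done better?
Do the functions I described above work correctly, or am I missing something?
Is there a way to make the preparation of SIFT descriptors for classification better and more efficient?

Preparing SIFT descriptors for further SVM classification (OpenCV 3, sklearn)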

0 answers: