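Comparing multiple histograms stored in a file with OpenCV

Date: 2017-05-24 13:08:40

Tags: python opencv numpy image-processing histogram

I have a dataset of images. I compute a histogram for each image and want to store (write) the histograms in a file, so that for every new input image I can compare its histogram against the ones already in the file and find out whether they are the same. The code so far is: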

import numpy as np
import cv2
import os.path
import glob
import matplotlib.pyplot as plt
import pickle

index = {}

#output dic
out = {
    1: {},
    2: {},
    3: {},
}

for t in [1]:

    #load_files
    files = glob.glob(os.path.join("..", "data", "train", "Type_{}".format(t), "*.jpg"))
    no_files = len(files)

    #iterate and read
    for n, file in enumerate(files):
        try:
            image = cv2.imread(file)
            img = cv2.resize(image, None, fx=0.1, fy=0.1, interpolation=cv2.INTER_AREA)

            # features : histograms
            plt.hist(img.flatten(), 256, [0, 256], color='r')
            plt.xlim([0,256])
            plt.legend('histogram', loc='upper left')
            plt.show()
            # index[file] = hist

            # write histograms into file
            #compare them and find similarity score
            # result_dist = compareHist(index[0], index[1], cv2.cv.CV_COMP_CORREL)

            print(file, t, "-files left", no_files - n)

        except Exception as e:
            print(e)
            print(file)

Could someone guide me through this? Thanks!

1 Answer:

Answer 0 (score: 1)

You can compute the red-channel histogram of every image like this:

import os
import glob
import numpy as np
from skimage import io

root = r'C:\Users\you\imgs'  # Raw string so the backslashes are not read as escapes; change this appropriately
folders = ['Type_1', 'Type_2', 'Type_3']
extension = '*.bmp'  # Change if necessary

def compute_red_histograms(root, folders, extension):
    X = []
    y = []
    for n, imtype in enumerate(folders):
        filenames = glob.glob(os.path.join(root, imtype, extension))    
        for fn in filenames:
            img = io.imread(fn)
            red = img[:, :, 0]
            # density=True replaces the removed normed=True; with unit-width bins the histogram sums to 1
            h, _ = np.histogram(red, bins=np.arange(257), density=True)
            X.append(h)
            y.append(n)
    return np.vstack(X), np.array(y)

X, y = compute_red_histograms(root, folders, extension)

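Each image is represented by a 256-dimensional feature vector (the bins of its red-channel histogram), so X is a 2D NumPy array with one row per image in the dataset and 256 columns. y is a 1D NumPy array of numeric class labels: 0 for Type_1, 1 for Type_2, and 2 for Type_3.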
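Coming back to the question of storing the histograms in a file and comparing a new image against them, one possible approach is sketched below. It assumes the histograms are kept as rows of an array like X, writes them with np.savez, and scores a new histogram against each stored row using the Pearson correlation between bins, which is the kind of measure the commented-out compareHist call in the question asks for. The names X_demo, y_demo, correlation, best_match and histograms.npz are made up for this illustration, and random data stands in for real image histograms.

```python
import os
import tempfile

import numpy as np


def correlation(h1, h2):
    """Pearson correlation between two histograms (1.0 means identical shape)."""
    a = h1 - h1.mean()
    b = h2 - h2.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    return float((a * b).sum() / denom) if denom > 0 else 0.0


def best_match(query, stored):
    """Return (row index, score) of the stored histogram most correlated with query."""
    scores = np.array([correlation(query, h) for h in stored])
    idx = int(scores.argmax())
    return idx, float(scores[idx])


# Stand-in data (assumption): three random normalised "histograms" instead of the real X and y.
rng = np.random.default_rng(0)
X_demo = rng.random((3, 256))
X_demo /= X_demo.sum(axis=1, keepdims=True)
y_demo = np.array([0, 1, 2])

# Write histograms and labels to a single .npz file, then read them back.
path = os.path.join(tempfile.mkdtemp(), "histograms.npz")
np.savez(path, X=X_demo, y=y_demo)
with np.load(path) as data:
    X_stored, y_stored = data["X"], data["y"]

# Compare a "new" histogram (here simply a copy of row 1) with the stored ones.
idx, score = best_match(X_demo[1].copy(), X_stored)
print(idx, y_stored[idx], round(score, 3))
```

A score close to 1 suggests the new image has nearly the same histogram as the stored one. What counts as "the same" still needs a threshold chosen for the data at hand.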

Next, you can split the dataset into training and test sets as follows:

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5, random_state=0)

Finally, you can train an SVM classifier:

from sklearn.svm import SVC

clf = SVC()
clf.fit(X_train, y_train)

With the classifier trained, you can very easily make predictions or evaluate the classification accuracy.
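The prediction and evaluation step can be sketched as follows. This is a minimal illustration rather than output from the real dataset: X_demo and y_demo are synthetic stand-ins (an assumption) shaped like the X and y described above, with the classes shifted apart so the classifier has something to learn. predict and score are the standard scikit-learn estimator methods.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Stand-in data (assumption): 60 fake 256-bin feature vectors, 20 per class.
rng = np.random.default_rng(0)
X_demo = rng.random((60, 256))
y_demo = np.repeat([0, 1, 2], 20)
X_demo[y_demo == 1, :128] += 1.0  # shift class 1 in the lower bins
X_demo[y_demo == 2, 128:] += 1.0  # shift class 2 in the upper bins

X_train, X_test, y_train, y_test = train_test_split(
    X_demo, y_demo, test_size=0.5, random_state=0)

clf = SVC()
clf.fit(X_train, y_train)

# Predicted class label for every test histogram.
y_pred = clf.predict(X_test)

# Mean accuracy on the held-out half: the fraction of correct predictions.
accuracy = clf.score(X_test, y_test)

print(y_pred[:10])
print(accuracy)
```

With the real data, X_demo and y_demo would be replaced by the X and y returned from compute_red_histograms.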