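I have an image dataset. I compute a histogram for each image and want to store (write) these histograms to a file, so that for every new input image I can compare its histogram with the ones already in the file and find out whether they are the same. The code so far is: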
import numpy as np
import cv2
import os.path
import glob
import matplotlib.pyplot as plt
import pickle

index = {}
#output dic
out = {
    1: {},
    2: {},
    3: {},
}
for t in [1]:
    #load_files
    files = glob.glob(os.path.join("..", "data", "train", "Type_{}".format(t), "*.jpg"))
    no_files = len(files)
    #iterate and read
    for n, file in enumerate(files):
        try:
            image = cv2.imread(file)
            img = cv2.resize(image, None, fx=0.1, fy=0.1, interpolation=cv2.INTER_AREA)
            # features : histograms
            plt.hist(img.flatten(), 256, [0, 256], color='r')
            plt.xlim([0,256])
            plt.legend('histogram', loc='upper left')
            plt.show()
            # index[file] = hist
            # write histograms into file
            #compare them and find similarity score
            # result_dist = compareHist(index[0], index[1], cv2.cv.CV_COMP_CORREL)
            print(file, t, "-files left", no_files - n)
        except Exception as e:
            print(e)
            print(file)
Can someone guide me through this? Thanks!
Answer 0 (score: 1)
You can compute the red-channel histogram of every image like this:
import os
import glob
import numpy as np
from skimage import io

# Raw string, so the backslashes in the Windows path are not read as escapes
root = r'C:\Users\you\imgs'  # Change this appropriately
folders = ['Type_1', 'Type_2', 'Type_3']
extension = '*.bmp'  # Change if necessary

def compute_red_histograms(root, folders, extension):
    X = []
    y = []
    for n, imtype in enumerate(folders):
        filenames = glob.glob(os.path.join(root, imtype, extension))
        for fn in filenames:
            img = io.imread(fn)
            red = img[:, :, 0]  # skimage loads images as RGB, so channel 0 is red
            # density=True replaces the old `normed` argument, which recent
            # NumPy releases no longer accept; with unit-width bins the values sum to 1
            h, _ = np.histogram(red, bins=np.arange(257), density=True)
            X.append(h)
            y.append(n)
    return np.vstack(X), np.array(y)

X, y = compute_red_histograms(root, folders, extension)
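Each image is represented by a 256-dimensional feature vector (the bins of its red-channel histogram), so X is a 2D NumPy array with as many rows as there are images in the dataset and 256 columns. y is a 1D NumPy array of numeric class labels, i.e. 0 for Type_1, 1 for Type_2, and 2 for Type_3.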
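Since the question also asks about storing the histograms and comparing a new image against them, here is a minimal sketch of one way to do that with plain NumPy. The helper names save_histograms, load_histograms and correlation_scores and the file name histograms.npz are made up for illustration, and the demo at the bottom uses random data in place of real histograms. The score is the Pearson correlation between bins, which, as far as I know, is the same quantity OpenCV's compareHist reports for its correlation method.

```python
import os
import tempfile

import numpy as np


def save_histograms(path, X, y):
    """Write the feature matrix and the labels to a compressed .npz file."""
    np.savez_compressed(path, X=X, y=y)


def load_histograms(path):
    """Read back the arrays written by save_histograms."""
    with np.load(path) as data:
        return data["X"], data["y"]


def correlation_scores(h, X):
    """Pearson correlation between histogram h and every row of X."""
    hc = h - h.mean()
    Xc = X - X.mean(axis=1, keepdims=True)
    denom = np.linalg.norm(hc) * np.linalg.norm(Xc, axis=1)
    # A perfectly flat histogram has no defined correlation; report 0 for it.
    return np.divide(Xc @ hc, denom, out=np.zeros(len(X)), where=denom > 0)


if __name__ == "__main__":
    # Random stand-in data (assumption: real X, y come from the code above).
    rng = np.random.default_rng(0)
    X_demo = rng.random((5, 256))
    X_demo /= X_demo.sum(axis=1, keepdims=True)
    y_demo = np.array([0, 0, 1, 1, 2])

    path = os.path.join(tempfile.mkdtemp(), "histograms.npz")
    save_histograms(path, X_demo, y_demo)
    X_stored, y_stored = load_histograms(path)

    # Compare a "new" histogram against everything stored in the file.
    scores = correlation_scores(X_demo[3], X_stored)
    best = int(np.argmax(scores))
    print("best match: row", best, "label", y_stored[best])
```

For a new image, compute its histogram the same way as above and pass it as h. A score close to 1.0 means the two histograms are very similar, which does not by itself guarantee that the images are identical.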
Next, you can split the dataset into training and test sets like this:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5, random_state=0)
Finally, you can train an SVM classifier:
from sklearn.svm import SVC
clf = SVC()
clf.fit(X_train, y_train)
With the classifier fitted, you can easily make predictions or evaluate the classification accuracy.
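To illustrate the prediction and evaluation step, here is a small self-contained sketch. It does not use real images: the fake_histogram helper and the three "classes" of synthetic pixel values are assumptions made only so the example runs on its own. With real data you would skip that part and use the X and y returned by compute_red_histograms.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

rng = np.random.default_rng(0)
bins = np.arange(257)


def fake_histogram(center):
    """Histogram of 500 random 'pixel' values scattered around `center`."""
    pixels = rng.normal(center, 20, size=500).clip(0, 255)
    h, _ = np.histogram(pixels, bins=bins, density=True)
    return h


# Synthetic stand-in for the dataset: 40 histograms for each of 3 classes.
centers = [64, 128, 192]
X = np.vstack([fake_histogram(c) for c in centers for _ in range(40)])
y = np.repeat([0, 1, 2], 40)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=0)

clf = SVC()
clf.fit(X_train, y_train)

# Predicted class label for each test histogram.
y_pred = clf.predict(X_test)

# Mean accuracy on the held-out half of the data.
accuracy = clf.score(X_test, y_test)
print("test accuracy:", accuracy)
```

clf.predict also accepts a single new histogram, as long as it is reshaped to one row first, e.g. clf.predict(h.reshape(1, -1)).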