I am using Python and about 4000 images of watches (examples: watch_1, watch_2). The images are RGB with a resolution of 450x450. My goal is to find the most similar watches among them. For this reason I am using IncrementalPCA and partial_fit from scikit_learn to process this large dataset within my 26 GB of RAM (see also: SO_Link_1). My source code is as follows:

import cv2
import numpy as np
import os
from glob import glob
from sklearn.decomposition import IncrementalPCA
from sklearn import neighbors
from sklearn import preprocessing

data = []

# Read images from file #
for filename in glob('Watches/*.jpg'):
    img = cv2.imread(filename)
    height, width = img.shape[:2]
    img = np.array(img)

    # Check that all my images are of the same resolution
    if height == 450 and width == 450:
        # Reshape each image so that it is stored in one line
        img = np.concatenate(img, axis=0)
        img = np.concatenate(img, axis=0)
        data.append(img)

# Normalise data #
data = np.array(data)
Norm = preprocessing.Normalizer()
Norm.fit(data)
data = Norm.transform(data)

# IncrementalPCA model #
ipca = IncrementalPCA(n_components=6)

length = len(data)
chunk_size = 4
pca_data = np.zeros(shape=(length, ipca.n_components))

for i in range(0, length // chunk_size):
    ipca.partial_fit(data[i*chunk_size : (i+1)*chunk_size])
    pca_data[i * chunk_size: (i + 1) * chunk_size] = ipca.transform(data[i*chunk_size : (i+1)*chunk_size])

# K-Nearest neighbours #
knn = neighbors.NearestNeighbors(n_neighbors=4, algorithm='ball_tree', metric='minkowski').fit(data)
distances, indices = knn.kneighbors(data)
print(indices)

However, when I run this program starting with 40 watch images, I get the following error when i = 1:

ValueError: Number of input features has changed from 4 to 6 between calls to partial_fit! Try setting n_components to a fixed value.

However, I am clearly setting n_components to 6 when I write ipca = IncrementalPCA(n_components=6), yet for some reason ipca treats chunk_size = 4 as the number of components when i = 0, and this then changes to 6 when i = 1.
Why is this happening?
How can I fix it?
Answer 0 (score: 2)
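This appears to follow from the mathematics behind PCA, which becomes ill-conditioned when n_components > n_samples. A batch of n_samples rows has rank at most n_samples, so it cannot determine more components than that; in your code each chunk holds 4 samples while n_components is 6.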
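The rank argument can be checked numerically. The following is only a small sketch on synthetic data (the array sizes and the names `batch` and `centered` are made up for illustration, not taken from the question): a chunk of 4 samples, once mean-centred as PCA does, cannot have rank above 3, so 6 components cannot be estimated from it.

```python
import numpy as np

# Synthetic stand-in for one chunk of flattened images (assumption:
# 4 samples, 50 features; the real images have far more features).
rng = np.random.default_rng(0)
n_samples, n_features = 4, 50
batch = rng.random((n_samples, n_features))

# PCA works on mean-centred data; centring removes one more degree of freedom.
centered = batch - batch.mean(axis=0)

rank = np.linalg.matrix_rank(centered)
print("rank of centred chunk:", rank)
print("upper bound (n_samples - 1):", n_samples - 1)
```

Whatever the number of features, the rank is bounded by the number of samples in the chunk, which is why a chunk of 4 cannot support n_components=6.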
You may be interested in reading this (the introduction of the error message) and some discussion behind it.
Try increasing the batch size / chunk size (or lowering n_components).
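A minimal sketch of that suggestion, under several assumptions: the data is a random stand-in for the flattened images, chunk_size = 10 is an arbitrary value chosen only because it is at least n_components, and fitting in one pass before transforming in a second pass is my own arrangement rather than something from the question. Running the neighbour search on the reduced data is likewise an assumption about the intent.

```python
import numpy as np
from sklearn.decomposition import IncrementalPCA
from sklearn.neighbors import NearestNeighbors

# Synthetic stand-in for the normalised, flattened images (assumption:
# 40 samples with 300 features instead of 450*450*3, to keep this small).
rng = np.random.default_rng(0)
data = rng.random((40, 300))

n_components = 6
chunk_size = 10  # assumed value; the point is chunk_size >= n_components

ipca = IncrementalPCA(n_components=n_components)

# First pass: update the model chunk by chunk.
for start in range(0, len(data), chunk_size):
    chunk = data[start:start + chunk_size]
    if len(chunk) >= n_components:  # skip a trailing chunk that is too small
        ipca.partial_fit(chunk)

# Second pass: project every chunk with the fully fitted model.
pca_data = np.vstack([
    ipca.transform(data[start:start + chunk_size])
    for start in range(0, len(data), chunk_size)
])

# Neighbour search on the reduced data (assumed to be the intent).
knn = NearestNeighbors(n_neighbors=4).fit(pca_data)
distances, indices = knn.kneighbors(pca_data)
print(pca_data.shape, indices.shape)
```

Transforming only after all chunks have been seen means every row is projected onto the same final components, rather than onto whatever the model looked like partway through fitting.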
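(In general, I am also somewhat sceptical of this approach. I hope you have tested it with batch PCA on a small example dataset. It also looks as though your watch images have not been preprocessed geometrically, for example by cropping, and perhaps histogram or colour normalisation would help.)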
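One way to run such a sanity check is to fit both batch PCA and IncrementalPCA on a small dataset and compare what they report. This is only a sketch: the data below is synthetic (a made-up low-rank signal plus noise), and the sizes and batch_size=50 are arbitrary choices for illustration.

```python
import numpy as np
from sklearn.decomposition import PCA, IncrementalPCA

# Small synthetic dataset (assumption): 200 samples, 30 features,
# generated from 3 latent factors plus a little noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 3))
mixing = rng.normal(size=(3, 30))
X = latent @ mixing + 0.01 * rng.normal(size=(200, 30))

# Batch PCA sees all samples at once.
pca = PCA(n_components=3).fit(X)

# IncrementalPCA sees the same samples in batches of 50.
ipca = IncrementalPCA(n_components=3, batch_size=50).fit(X)

print("batch PCA explained variance ratio:      ", pca.explained_variance_ratio_)
print("IncrementalPCA explained variance ratio: ", ipca.explained_variance_ratio_)
```

If the two sets of ratios differ substantially on a small subset of the real images, that would be a reason to revisit the chunk size or the preprocessing before scaling up to all 4000 images.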