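I am working on a Python program that uses several pieces of TensorFlow code to find similar images in a collection. I can vectorize every image in the collection and, using an approximate nearest neighbors algorithm, find the k most similar images for each individual image. The code I use takes the vectorized images as input and outputs one JSON file per image. Each JSON file contains the names of the k most similar images and their similarity (a percentage, written as a fraction such as 0.8654).
For example, the JSON file "2017-3400-039808_xx_xx_GdSOs-jpg.json" contains: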
[{"filename": "2017-3400-039808_xx_xx_GdSOs-jpg", "similarity": 1.0},
{"filename": "2018-3400-033286_xx_xx_OCxsV-JPG", "similarity": 0.8654},
{"filename": "2018-3400-028154_xx_xx_yePFK-jpg", "similarity": 0.8596},
{"filename": "2018-3400-037564_xx_xx_GhnLB-jpg", "similarity": 0.8561},
{"filename": "2018-3400-036039_xx_xx_QtDfu-jpg", "similarity": 0.8537}]
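To make the file format concrete, here is a minimal sketch that parses a shortened copy of the listing above. It relies on what the sample shows: the first entry is the query image itself, with similarity 1.0. The variable names (`sample`, `others`, `best`) are illustrative only and are not part of the original code.

```python
import json

# Shortened copy of the sample output shown above.
sample = '''[{"filename": "2017-3400-039808_xx_xx_GdSOs-jpg", "similarity": 1.0},
{"filename": "2018-3400-033286_xx_xx_OCxsV-JPG", "similarity": 0.8654},
{"filename": "2018-3400-028154_xx_xx_yePFK-jpg", "similarity": 0.8596}]'''

neighbors = json.loads(sample)

# In the sample, the first entry is the query image matched against itself.
query_name = neighbors[0]['filename']

# Drop the self-match and keep the remaining neighbors.
others = [n for n in neighbors if n['filename'] != query_name]

# The closest real neighbor is the one with the highest similarity.
best = max(others, key=lambda n: n['similarity'])
```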
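The problem I am facing is this: when I vectorize a new image, add it to the "image_vectors/" folder that holds all the vectorized images, and run the similarity code, I expect it to run on the new image only. Instead, it recomputes the similarities for every image in the collection, which is not optimal, especially when the collection is large.
What can I change in this code so that it only finds the images similar to the new input image?
The image vectors are in a directory named "image_vectors", and all of them have the extension ".npz".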
from annoy import AnnoyIndex
from scipy import spatial
import glob, json, os
import numpy as np
# data structures
file_index_to_file_name = {}
file_index_to_file_vector = {}
chart_image_positions = {}
# config
dims = 2048
n_nearest_neighbors = 30
trees = 10000
infiles = glob.glob('image_vectors/*.npz')
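# build ann index ('angular' is Annoy's cosine-based metric; recent versions expect it explicitly)
t = AnnoyIndex(dims, 'angular')
for file_index, i in enumerate(infiles):
    # np.loadtxt reads plain text, so these .npz files must be text vectors, not np.savez archives
    file_vector = np.loadtxt(i)
    file_name = os.path.basename(i).split('.')[0]
    file_index_to_file_name[file_index] = file_name
    file_index_to_file_vector[file_index] = file_vector
    t.add_item(file_index, file_vector)
# build once, after every item has been added
t.build(trees)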
# create a nearest neighbours json file for each input
if not os.path.exists('Similar_Images'):
    os.makedirs('Similar_Images')
for i in file_index_to_file_name.keys():
    master_file_name = file_index_to_file_name[i]
    master_vector = file_index_to_file_vector[i]
    named_nearest_neighbors = []
    nearest_neighbors = t.get_nns_by_item(i, n_nearest_neighbors)
    for j in nearest_neighbors:
        neighbor_file_name = file_index_to_file_name[j]
        neighbor_file_vector = file_index_to_file_vector[j]
        similarity = 1 - spatial.distance.cosine(master_vector, neighbor_file_vector)
        # truncate (not round) to four decimal places
        rounded_similarity = int((similarity * 10000)) / 10000.0
        named_nearest_neighbors.append({
            'filename': neighbor_file_name,
            'similarity': rounded_similarity
        })
    with open('Similar_Images/' + master_file_name + '.json', 'w') as out:
        json.dump(named_nearest_neighbors, out)
    print(master_file_name)
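The last loop is what makes the script recompute everything: it iterates over every key in file_index_to_file_name and writes one JSON file per image. A possible direction, sketched here under assumptions, is to query with the new image's vector only. In Annoy this would likely mean calling `get_nns_by_vector` on an index that already holds the existing images, instead of calling `get_nns_by_item` for every item; the index can be stored with `save` and reopened with `load` so it does not have to be rebuilt for each query. The sketch below deliberately swaps Annoy for an exact brute-force cosine search in NumPy so that it runs without an index file. The function name `most_similar` and the toy three-dimensional vectors are hypothetical, and all vectors are assumed to be non-zero.

```python
import numpy as np


def most_similar(new_vector, vectors, names, k=5):
    """Return the k stored images most similar to new_vector (cosine similarity).

    Hypothetical helper: an exact brute-force search, not Annoy's approximate one.
    """
    matrix = np.asarray(vectors, dtype=float)    # shape: (n_images, dims)
    query = np.asarray(new_vector, dtype=float)  # shape: (dims,)
    # Cosine similarity = dot product divided by the product of the norms.
    norms = np.linalg.norm(matrix, axis=1) * np.linalg.norm(query)
    similarities = matrix @ query / norms
    # Indices of the k largest similarities, best match first.
    order = np.argsort(-similarities)[:k]
    return [
        {'filename': names[i],
         # Same truncation to four decimals as in the original script.
         'similarity': int(similarities[i] * 10000) / 10000.0}
        for i in order
    ]


# Toy 3-dimensional vectors standing in for the 2048-dimensional ones.
names = ['a', 'b', 'c']
vectors = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [1.0, 1.0, 0.0]]

# Only the new image is queried; nothing is recomputed for 'a', 'b' or 'c'.
result = most_similar([1.0, 0.0, 0.0], vectors, names, k=2)
```

The returned list has the same shape as the JSON entries above, so it could be passed to json.dump for the new image alone. With a large collection, the brute-force search costs one pass over all stored vectors per query, which is the part an Annoy index is meant to speed up.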