How do I build a MySQL query that uses the results of a select in a calculation?

Asked: 2016-04-29 19:54:53

Tags: mysql bash hamming-distance

I have some blockhash-based image fingerprints, along with the site member who uploaded each image and the image's local URL:

member  varchar(8)  
fingerprint    char(64)
url     varchar(80) 

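I am trying to compute the Hamming distance between these hashes to determine how likely they are to match (see "mysql hamming distance between two phash").

As far as I can tell, the simplest approach is to XOR the two values and count the set bits with MySQL's bit_count function. To do that I have to split each 64-character hash into 4 chunks and convert each chunk to an unsigned integer before passing it to bit_count. So I have a query like this (run from the Linux command line, hence the positional parameters):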

select bit_count(cast(conv(substr('$1', 1, 16), 16, 10) as unsigned) ^ cast(conv(substr('$2', 1, 16), 16, 10) as unsigned)) +
bit_count(cast(conv(substr('$1', 17, 16), 16, 10) as unsigned) ^ cast(conv(substr('$2', 17, 16), 16, 10) as unsigned)) +
bit_count(cast(conv(substr('$1', 33, 16), 16, 10) as unsigned) ^ cast(conv(substr('$2', 33, 16), 16, 10) as unsigned)) +
bit_count(cast(conv(substr('$1', 49, 16), 16, 10) as unsigned) ^ cast(conv(substr('$2', 49, 16), 16, 10) as unsigned));

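...which produces the correct result for two fingerprints.

However, I need a query that finds fingerprints matching those of anyone other than the member in question. Basically: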
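The 16-character chunks are needed because CONV() works with 64-bit precision, so one chunk of 16 hex digits is the most that fits in a single unsigned integer. As a quick sanity check of the XOR-and-count idea, here is a minimal sketch on two made-up one-byte values (the literals are illustrative, not real fingerprints):

```sql
-- 0xff XOR 0x0f = 0xf0, which has four set bits,
-- so the Hamming distance between the two values is 4.
SELECT BIT_COUNT(
         CAST(CONV('ff', 16, 10) AS UNSIGNED)
       ^ CAST(CONV('0f', 16, 10) AS UNSIGNED)
       ) AS distance;
```

The full query above repeats the same expression once per 64-bit chunk and adds the four counts.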

select member, url 
from images 
where (Hamming Distance between <fingerprint> and (select hashes from member)  < 10) 
    AND member != "<value>"

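I think I might want to create a stored procedure to compute the Hamming distance, and then perhaps narrow what I have to check from the whole database down to rows whose first 10 characters match. But is there a better way?

2 answers:

Answer 0: (score: 1)

A stored hamming_distance function is a good idea. You can then use it in a join.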

SELECT i1.member, i1.url
FROM images AS i1
JOIN images AS i2 ON i1.member != i2.member AND hamming_distance(i1.fingerprint, i2.fingerprint) < 10
WHERE i2.member = @member_in_question
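The answer does not show the function itself. A minimal sketch of what it could look like, assuming 64-character hex fingerprints and reusing the four-chunk expression from the question (the parameter names a and b are placeholders):

```sql
-- Sketch of a stored function; assumes both arguments are 64 hex characters.
-- Each 16-character chunk (64 bits) is converted to an unsigned integer,
-- XORed with its counterpart, and the set bits of the four results are summed.
CREATE FUNCTION hamming_distance(a CHAR(64), b CHAR(64))
RETURNS INT
DETERMINISTIC
NO SQL
RETURN
    BIT_COUNT(CAST(CONV(SUBSTR(a,  1, 16), 16, 10) AS UNSIGNED) ^ CAST(CONV(SUBSTR(b,  1, 16), 16, 10) AS UNSIGNED))
  + BIT_COUNT(CAST(CONV(SUBSTR(a, 17, 16), 16, 10) AS UNSIGNED) ^ CAST(CONV(SUBSTR(b, 17, 16), 16, 10) AS UNSIGNED))
  + BIT_COUNT(CAST(CONV(SUBSTR(a, 33, 16), 16, 10) AS UNSIGNED) ^ CAST(CONV(SUBSTR(b, 33, 16), 16, 10) AS UNSIGNED))
  + BIT_COUNT(CAST(CONV(SUBSTR(a, 49, 16), 16, 10) AS UNSIGNED) ^ CAST(CONV(SUBSTR(b, 49, 16), 16, 10) AS UNSIGNED));

-- Every bit differs between all-zero and all-f fingerprints: 4 * 64 = 256.
SELECT hamming_distance(REPEAT('0', 64), REPEAT('f', 64)) AS distance;
```

Because the body is a single RETURN statement, no BEGIN ... END block or client-side DELIMITER change is needed. Note that the join still evaluates the function for every candidate pair, so it cannot use an index on fingerprint.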

Answer 1: (score: 1)

The function works great. It chunks the fingerprints and returns the distance between them. After that it is a simple select:

select member, url, fingerprint, hamming_dist(fingerprint, '$fingerprint') as distance from images where fingerprint REGEXP '$find' AND hamming_dist(fingerprint, '$fingerprint') < 8 AND member != '$member';

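The REGEXP only limits the search to likely matches; the pattern is built from the first and last characters of the fingerprint. Doing this brought the query time down from 0.35 seconds to 0.12 seconds.

Thanks for the help!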
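The answer does not show how the pattern is built. One possible reading, as a sketch only: anchor the first and last hex digit and allow anything in between. The user variables, the sample fingerprint, and the member name below are all made up, and the hamming_dist function from this answer is assumed to exist already.

```sql
-- Hypothetical inputs: a made-up 64-character fingerprint and member name.
SET @fp     = CONCAT('a', REPEAT('0', 62), 'f');
SET @member = 'alice';

-- Build a pattern such as '^a.*f$' from the first and last characters.
SET @find = CONCAT('^', LEFT(@fp, 1), '.*', RIGHT(@fp, 1), '$');

-- The cheap REGEXP test discards most rows before the distance is computed.
SELECT member, url, fingerprint, hamming_dist(fingerprint, @fp) AS distance
FROM images
WHERE fingerprint REGEXP @find
  AND hamming_dist(fingerprint, @fp) < 8
  AND member != @member;
```

This prefilter trades recall for speed: a fingerprint that differs only in its first or last hex digit is within the distance threshold but is skipped by the pattern.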