Are relational databases well suited to vector calculations?

Asked: 2009-08-26 16:19:57

Tags: sql optimization math vector

The basic table schema looks like this (I'm using MySQL, BTW):

integer unsigned vector-id
integer unsigned fk-attribute-id
float attribute-value
primary key (vector-id,fk-attribute-id)

A vector is represented as multiple records in the table that share the same vector-id.

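I need to build a separate table holding the dot product (and also the Euclidean distance) of every pair of vectors that exists in this table. So I need a result table that looks like this: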

integer unsigned fk-vector-id-a
integer unsigned fk-vector-id-b
float dot-product


...and one like this...

integer unsigned fk-vector-id-a
integer unsigned fk-vector-id-b
float euclidean-distance

What is the best query structure to produce these results?

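For very large vectors, is a relational database the best way to approach this problem, or should I load the vectors into the application and do the calculations there?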

1 answer:

Answer 0: (score: 4)

INSERT
INTO    dot_products
SELECT  v1.vector_id, v2.vector_id, SUM(v1.attribute_value * v2.attribute_value)
FROM    attributes v1
JOIN    attributes v2
ON      v2.attribute_id = v1.attribute_id
GROUP BY
        v1.vector_id, v2.vector_id

In MySQL, this can be faster:

INSERT
INTO    dot_products
SELECT  v1.vector_id, v2.vector_id,
        (
        SELECT  SUM(va1.attribute_value * va2.attribute_value)
        FROM    attributes va1
        JOIN    attributes va2
        ON      va2.attribute_id = va1.attribute_id
        WHERE   va1.vector_id = v1.vector_id
                AND va2.vector_id = v2.vector_id
        )
FROM    vector v1
CROSS JOIN
        vector v2
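
The queries above only fill the dot-product table; the Euclidean-distance table from the question is not covered. One possible way to get it, sketched here under some assumptions, is to reuse the stored dot products through the identity |a - b|^2 = a.a + b.b - 2(a.b). This also sidesteps a pitfall of the sparse layout: a plain join on attribute_id only sees attributes present in both vectors, so summing squared differences over that join would drop components stored for only one of them. The sketch drives the SQL from Python with an in-memory SQLite database standing in for MySQL so that it runs anywhere. The column names of dot_products, the sample vectors, and the `distances` dictionary are made up for illustration.

```python
import math
import sqlite3

# Assumption: in-memory SQLite stands in for MySQL so the sketch is runnable.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE attributes (
        vector_id       INTEGER NOT NULL,
        attribute_id    INTEGER NOT NULL,
        attribute_value REAL    NOT NULL,
        PRIMARY KEY (vector_id, attribute_id)
    );
    -- Hypothetical column names for the result table.
    CREATE TABLE dot_products (
        vector_id_a INTEGER NOT NULL,
        vector_id_b INTEGER NOT NULL,
        dot_product REAL    NOT NULL,
        PRIMARY KEY (vector_id_a, vector_id_b)
    );
""")

# Hypothetical sparse sample data (zero components are simply not stored):
#   vector 1 = (1, 2, 0, 0), vector 2 = (0, 3, 4, 0), vector 3 = (0, 0, 0, 5)
conn.executemany(
    "INSERT INTO attributes VALUES (?, ?, ?)",
    [(1, 1, 1.0), (1, 2, 2.0), (2, 2, 3.0), (2, 3, 4.0), (3, 4, 5.0)],
)

# Same join-and-aggregate shape as the first query in the answer.
conn.execute("""
    INSERT INTO dot_products
    SELECT v1.vector_id, v2.vector_id,
           SUM(v1.attribute_value * v2.attribute_value)
    FROM attributes v1
    JOIN attributes v2 ON v2.attribute_id = v1.attribute_id
    GROUP BY v1.vector_id, v2.vector_id
""")

# Squared distance from stored dot products. The rows with a = b hold the
# squared norms. Pairs that share no attribute have no row in dot_products,
# so the LEFT JOIN plus COALESCE treats their dot product as 0.
rows = conn.execute("""
    SELECT na.vector_id_a, nb.vector_id_a,
           na.dot_product + nb.dot_product
             - 2 * COALESCE(ab.dot_product, 0) AS squared_distance
    FROM dot_products na
    CROSS JOIN dot_products nb
    LEFT JOIN dot_products ab
           ON ab.vector_id_a = na.vector_id_a
          AND ab.vector_id_b = nb.vector_id_a
    WHERE na.vector_id_a = na.vector_id_b
      AND nb.vector_id_a = nb.vector_id_b
""").fetchall()

# The square root is taken in Python because not every SQLite build ships
# math functions; in MySQL, SQRT() could be applied inside the query.
# max() guards against tiny negative values caused by float rounding.
distances = {(a, b): math.sqrt(max(0.0, sq)) for a, b, sq in rows}

for pair in sorted(distances):
    print(pair, distances[pair])
```

Whether this beats computing everything in the application depends on the data. The result has one row per ordered pair of vectors, so it grows quadratically with the number of vectors wherever the work is done.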