Question

There seem to be similar questions, but none that is quite the same. I tried following one of them (compare data sets and return best match) but got stuck.

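I need to take a set and find the set that matches it best. Say we have search_obj containing the values (1,4,29,44,378,379). I want to find other objects with similar values, ideally the one that matches these values most closely. There will be a large number of other objects, so performance is a big concern.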

I'm currently using PHP and MySQL, but I'm open to changing that if it means better performance.

Thanks for your help.

Answer 1

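You can use array_intersect to compute the intersection of two arrays; it returns the values of the first array that are also present in the second. If you are comparing against several lists, you can use the length of the returned array as a score: the larger the intersection, the closer the match.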
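The same scoring idea can be sketched outside PHP. The snippet below is a minimal Python illustration, not the answerer's code: `rank_by_overlap` and the sample objects `obj_a`, `obj_b`, `obj_c` are hypothetical names made up for the example, and Python sets stand in for array_intersect.

```python
def rank_by_overlap(search, candidates):
    """Return (name, shared_count) pairs, best match first.

    `candidates` maps an object name to its list of values.
    """
    target = set(search)
    # Score each candidate by the size of its intersection with the target.
    scored = [(name, len(target & set(values)))
              for name, values in candidates.items()]
    # Highest shared count first; ties are broken by name for a stable order.
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))


search_obj = [1, 4, 29, 44, 378, 379]

# Hypothetical sample data, invented for this example.
candidates = {
    "obj_a": [1, 4, 29, 500],
    "obj_b": [1, 4, 29, 44, 378],
    "obj_c": [7, 8, 9],
}

ranking = rank_by_overlap(search_obj, candidates)
print(ranking)
```

Note that this compares the search set against every candidate in turn, so the cost grows linearly with the number of objects; with a large data set it may be worth pushing the comparison into the database instead.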

Answer 2

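Here is what comes to mind:

Suppose you have a table of unique pairs (a, b), where a identifies an item and b is one of its properties: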

CREATE table t1 (a INT, b INT, PRIMARY KEY (a, b));

Now populate it:

INSERT INTO t1
VALUES (1,1), (1,2),               -- item to compare with
       (2,1), (2,3),               -- has one common prop with 1
       (3,1), (3,2),               -- has the same props as 1
       (4,1), (4,2), (4,3), (4,4); -- has 2 same props with 1

The following query ranks the other items by their similarity to item 1:

SELECT t1.a,
    COUNT(t2.a) as same_props_count,
    ABS(COUNT(t2.a) - COUNT(*)) as diff_count
FROM t1
LEFT JOIN t1 as t2 ON t1.b = t2.b and t2.a = 1
WHERE t1.a <> 1
GROUP BY t1.a
ORDER BY same_props_count DESC, diff_count;


a, same_props_count, diff_count
3, 2,                0
4, 2,                2
2, 1,                1
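To try the query without a MySQL server, here is a rough Python sketch that runs it against an in-memory SQLite database. SQLite is an assumption made for convenience and its behavior may differ from MySQL in other cases; the helper `rank_similar` and the `:ref` parameter (which replaces the hard-coded item 1) are hypothetical additions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t1 (a INT, b INT, PRIMARY KEY (a, b))")
conn.executemany(
    "INSERT INTO t1 VALUES (?, ?)",
    [
        (1, 1), (1, 2),                  # item to compare with
        (2, 1), (2, 3),                  # shares one property with item 1
        (3, 1), (3, 2),                  # same properties as item 1
        (4, 1), (4, 2), (4, 3), (4, 4),  # shares two properties, has two extra
    ],
)

# Same query as above, with the reference item passed as a named parameter.
QUERY = """
SELECT t1.a,
    COUNT(t2.a) AS same_props_count,
    ABS(COUNT(t2.a) - COUNT(*)) AS diff_count
FROM t1
LEFT JOIN t1 AS t2 ON t1.b = t2.b AND t2.a = :ref
WHERE t1.a <> :ref
GROUP BY t1.a
ORDER BY same_props_count DESC, diff_count
"""


def rank_similar(connection, ref):
    """Return (a, same_props_count, diff_count) rows, best match first."""
    return connection.execute(QUERY, {"ref": ref}).fetchall()


rows = rank_similar(conn, 1)
for row in rows:
    print(row)
```

One thing to keep in mind: diff_count only counts properties a candidate has that the reference item lacks. Properties of the reference item that are missing from a candidate show up only indirectly, as a lower same_props_count.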

Compare sets of properties to find the best match
