Suppose you have some records that look like this:
+--------+--------------+-------+
| person | attribute_id | value |
+--------+--------------+-------+
| 1 | 1 | 4 |
| 1 | 2 | 2 |
| 1 | 3 | 0 |
| 2 | 1 | 0 |
| 2 | 2 | 5 |
| 2 | 3 | 5 |
| 3 | 1 | 3 |
| 3 | 2 | 4 |
| 3 | 3 | 1 |
+--------+--------------+-------+
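If we take person = 1 as the reference point, what is the most efficient way to find the closest-matching person based on these attribute scores, which range from 0 to 5?
Ideally, I would like to do this in SQL (MySQL) rather than in the application layer.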
I can see that this would be easier to model if you had a schema like this:
+-------+--------------+-------------+-------------+
|person | attribute_1 | attribute_2 | attribute_3 |
|1 | 4 | 2 | 0 |
|2 | 0 | 5 | 5 |
|3 | 3 | 4 | 1 |
+-------+--------------+-------------+-------------+
Then you could do something like:
SELECT ABS($search_attr_1-attribute_1) AS diff_1, ABS($search_attr_2-attribute_2) AS diff_2, ABS($search_attr_3-attribute_3) AS diff_3
FROM scores
ORDER BY diff_1 ASC, diff_2 ASC, diff_3 ASC
Answer 0 (score: 0)
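To transpose this table directly, you can use a query like the following:
create table data_transpose as
select person
-- one line per attribute; max() collapses each person's rows into a single row
, max(case when attribute_id = 1 then value end) as attribute_1
, max(case when attribute_id = 2 then value end) as attribute_2
, max(case when attribute_id = 3 then value end) as attribute_3
, max(case when attribute_id = 4 then value end) as attribute_4 -- stays NULL for the sample data, which has only attributes 1-3
from person_attributes -- assumed name for the original long-format table
group by person
;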
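As a quick sanity check of the pivot idea, here is a minimal sketch that runs the same kind of query against the sample rows from the question. It uses Python's built-in sqlite3 module purely so the example is self-contained; the table name `person_attributes` is an assumption (the question does not name the long-format table), and the SELECT should carry over to MySQL largely unchanged.

```python
import sqlite3

# In-memory SQLite stands in for MySQL so the sketch runs anywhere.
conn = sqlite3.connect(":memory:")

# "person_attributes" is an assumed name for the long-format table.
conn.execute(
    "CREATE TABLE person_attributes (person INTEGER, attribute_id INTEGER, value INTEGER)"
)
conn.executemany(
    "INSERT INTO person_attributes VALUES (?, ?, ?)",
    [
        (1, 1, 4), (1, 2, 2), (1, 3, 0),
        (2, 1, 0), (2, 2, 5), (2, 3, 5),
        (3, 1, 3), (3, 2, 4), (3, 3, 1),
    ],
)

# Conditional aggregation: each CASE picks one attribute, MAX collapses
# the three rows of a person into a single wide row.
PIVOT_SQL = """
SELECT person,
       MAX(CASE WHEN attribute_id = 1 THEN value END) AS attribute_1,
       MAX(CASE WHEN attribute_id = 2 THEN value END) AS attribute_2,
       MAX(CASE WHEN attribute_id = 3 THEN value END) AS attribute_3
FROM person_attributes
GROUP BY person
ORDER BY person
"""

wide_rows = conn.execute(PIVOT_SQL).fetchall()
for row in wide_rows:
    print(row)
```

The rows that come back match the wide table in the question: one row per person with the three scores side by side.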
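As for the nearest match, there are many distance/similarity measures to choose from. You may want to consider Euclidean distance and cosine similarity, among others. To compute cosine similarity (which I think has better properties), use something like the following, assuming your data looks like this: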
+-------+--------------+-------------+-------------+-------+---------------+---------------+---------------+
|person | attribute_1 | attribute_2 | attribute_3 |person2| attribute_1_2 | attribute_2_2 | attribute_3_2 |
|1 | 4 | 2 | 0 |2 | 0 | 5 | 5 |
+-------+--------------+-------------+-------------+-------+---------------+---------------+---------------+
You can compute the cosine similarity after a cross join, or any other join that produces the table above, as follows:
select person
, person2
-- dot product of the two attribute vectors; add one term per extra attribute
, (attribute_1 * attribute_1_2 + attribute_2 * attribute_2_2 + attribute_3 * attribute_3_2)/
(
-- product of the two vector lengths
sqrt(pow(attribute_1,2) + pow(attribute_2,2) + pow(attribute_3,2))
*
sqrt(pow(attribute_1_2,2) + pow(attribute_2_2,2) + pow(attribute_3_2,2))
) as cosine_similarity
from
some_join_of_transposed_table_here
;
The code here has not been debugged. Good luck.
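For what it is worth, the same cosine similarity can also be computed straight from the original long-format table with a self-join on attribute_id, which avoids hard-coding one column per attribute. The sketch below is only an illustration under assumptions: the table name `person_attributes` is made up, every person is assumed to have a row for every attribute, and SQLite (via Python's sqlite3) stands in for MySQL so the example is runnable. `sqrt` is registered by hand because not every SQLite build ships math functions; MySQL has SQRT built in.

```python
import math
import sqlite3

conn = sqlite3.connect(":memory:")
# Provide sqrt() to SQLite explicitly; MySQL already has SQRT.
conn.create_function("sqrt", 1, math.sqrt)

# "person_attributes" is an assumed name for the long-format table.
conn.execute(
    "CREATE TABLE person_attributes (person INTEGER, attribute_id INTEGER, value INTEGER)"
)
conn.executemany(
    "INSERT INTO person_attributes VALUES (?, ?, ?)",
    [
        (1, 1, 4), (1, 2, 2), (1, 3, 0),
        (2, 1, 0), (2, 2, 5), (2, 3, 5),
        (3, 1, 3), (3, 2, 4), (3, 3, 1),
    ],
)

# Pair each attribute of the reference person (a) with the same attribute
# of every other person (b), then aggregate per other person:
# dot product divided by the product of the two vector lengths.
COSINE_SQL = """
SELECT b.person AS other_person,
       SUM(a.value * b.value)
         / (sqrt(SUM(a.value * a.value)) * sqrt(SUM(b.value * b.value)))
         AS cosine_similarity
FROM person_attributes AS a
JOIN person_attributes AS b
  ON b.attribute_id = a.attribute_id
 AND b.person <> a.person
WHERE a.person = ?
GROUP BY b.person
ORDER BY cosine_similarity DESC
"""

# The "?" placeholder is sqlite3 syntax; MySQL drivers typically use "%s".
matches = conn.execute(COSINE_SQL, (1,)).fetchall()
for other_person, similarity in matches:
    print(other_person, round(similarity, 3))
```

For person 1 and the sample data, person 3 ranks first (similarity about 0.877) and person 2 second (about 0.316). A person whose scores are all zero has a vector length of zero, so the division is undefined for them and that case would need separate handling.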