Question

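I have a table in SQL Server 2012 with the following columns: user_id, merchant_id

For each merchant, I want to find the top 5 most similar partner merchants. Similarity is defined simply as the normalized number of overlapping consumers.

I have not been able to find any way to solve this.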

Answer 1

The following query counts the number of customers that two merchants have in common:

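-- merchant_users is a placeholder for the actual table name.
-- Assumes each (user_id, merchant_id) pair appears at most once.
select t.merchant_id as m1, t2.merchant_id as m2, count(*) as common_customers
from merchant_users t join
     merchant_users t2
     on t.user_id = t2.user_id and t.merchant_id <> t2.merchant_id
group by t.merchant_id, t2.merchant_id;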

The following returns the top five for each merchant, ranked by the raw count:

-- merchant_users is a placeholder for the actual table name.
select *
from (select t.merchant_id as m1, t2.merchant_id as m2, count(*) as common_customers,
             row_number() over (partition by t.merchant_id order by count(*) desc) as seqnum
      from merchant_users t join
           merchant_users t2
           on t.user_id = t2.user_id and t.merchant_id <> t2.merchant_id
      group by t.merchant_id, t2.merchant_id
     ) mm
where seqnum <= 5;
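
To try the top-five query on a small data set, something like the following sketch should work. The temp table name #merchant_users and the sample rows are made up for illustration; only the column names user_id and merchant_id come from the question.

```sql
-- Hypothetical sample data: merchant A has users 1, 2, 3; B has 1, 2; C has 3, 4.
create table #merchant_users (user_id int, merchant_id char(1));

insert into #merchant_users (user_id, merchant_id)
values (1, 'A'), (2, 'A'), (3, 'A'),
       (1, 'B'), (2, 'B'),
       (3, 'C'), (4, 'C');

-- Same self-join and ranking as above, run against the sample table.
select m1, m2, common_customers, seqnum
from (select t.merchant_id as m1, t2.merchant_id as m2, count(*) as common_customers,
             row_number() over (partition by t.merchant_id order by count(*) desc) as seqnum
      from #merchant_users t join
           #merchant_users t2
           on t.user_id = t2.user_id and t.merchant_id <> t2.merchant_id
      group by t.merchant_id, t2.merchant_id
     ) mm
where seqnum <= 5
order by m1, seqnum;

drop table #merchant_users;
```

With this data, A is paired with B (2 common customers) and then C (1), B is paired only with A (2), and C is paired only with A (1). Merchant pairs with no shared customers, such as B and C, produce no row at all because the inner join drops them. Also note that row_number() breaks ties arbitrarily; rank() could be used instead if tied merchants should all be kept.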

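I don't know what you mean by "normalized". In statistics, "normalizing" values typically does not change their ordering (it only rescales them so that the sum of the squares is 1), so this may already do what you want.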
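
If "normalized" instead means that the overlap is divided by something that depends on both merchants, the ordering can change. One possible reading, which is an assumption and not something the question states, is Jaccard similarity: common customers divided by the number of distinct customers of either merchant. A minimal sketch under that assumption, again using the placeholder table name merchant_users:

```sql
-- Assumption: "normalized" is read as Jaccard similarity, |A and B| / |A or B|.
-- merchant_users is a placeholder for the actual table name.
with pairs as (
    -- De-duplicate so that repeat visits by one user are counted once.
    select distinct user_id, merchant_id
    from merchant_users
),
sizes as (
    -- Number of distinct customers per merchant.
    select merchant_id, count(*) as n_customers
    from pairs
    group by merchant_id
),
overlap as (
    -- Number of distinct customers shared by each ordered merchant pair.
    select p1.merchant_id as m1, p2.merchant_id as m2, count(*) as common_customers
    from pairs p1 join
         pairs p2
         on p1.user_id = p2.user_id and p1.merchant_id <> p2.merchant_id
    group by p1.merchant_id, p2.merchant_id
),
scored as (
    -- 1.0 * forces decimal division instead of integer division.
    select o.m1, o.m2, o.common_customers,
           1.0 * o.common_customers
               / (s1.n_customers + s2.n_customers - o.common_customers) as similarity
    from overlap o join
         sizes s1 on s1.merchant_id = o.m1 join
         sizes s2 on s2.merchant_id = o.m2
)
select m1, m2, common_customers, similarity
from (select s.*,
             row_number() over (partition by m1 order by similarity desc) as seqnum
      from scored s
     ) ranked
where seqnum <= 5;
```

Dividing only by the customer count of m1 would not change the ranking within each m1 partition, since that divisor is constant there. The Jaccard denominator varies with m2, which is why it can reorder the results.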

Finding similarity between merchants based on shared customers
