SQL query to find the most similar goods by tags

Time: 2013-06-28 17:55:18

Tags: mysql sql join

I'm building an online shop, so I have 3 tables:

1)goods

id      | title
--------+----------- 
1       | Toy car
2       | Toy pony
3       | Doll

2)tags

id      | title
--------+----------- 
1       | Toy
2       | Boys
3       | Girls

3)links

goods_id| tag_id
--------+----------- 
1       | 1
1       | 2
2       | 1
2       | 2
2       | 3
3       | 3

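I need to print related goods using the following algorithm: use the tags to find the goods most similar to the selected one. The more tags two goods have in common, the better the match.

So the result for goods#1 should be: goods#2, goods#3

for goods#2: goods#1, goods#3

for goods#3: goods#2, goods#1

I have no idea how to get similar goods, ordered by the number of mutual tags, in a single query.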

3 Answers:

Answer 0 (score: 3)

This query returns all items that share the maximum number of tags with the selected item:

SET @item = 1;

SELECT
  goods_id
FROM
  links
WHERE
  tag_id IN (SELECT tag_id FROM links WHERE goods_id=@item)
  AND goods_id!=@item
GROUP BY
  goods_id
HAVING
  COUNT(*) = (
    SELECT
      COUNT(*)
    FROM
      links
    WHERE
      tag_id IN (SELECT tag_id FROM links WHERE goods_id=@item)
      AND goods_id!=@item
    GROUP BY
      goods_id
    ORDER BY
      COUNT(*) DESC
    LIMIT 1
  )

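See the fiddle here.

Or this one returns all items, even those with no tags in common, ordered by the number of tags in common, descending (@item is set as in the query above):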

SELECT
  goods_id
FROM
  links
WHERE
  goods_id!=@item
GROUP BY
  goods_id
ORDER BY
  COUNT(CASE WHEN tag_id IN (SELECT tag_id FROM links WHERE goods_id=@item) THEN 1 END) DESC;
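To check the ordering against the sample data from the question, the second query can be run in a small script. This is only a sketch: it uses Python's built-in `sqlite3` module as a stand-in for MySQL, replaces the `@item` user variable with a named parameter `:item`, and adds `goods_id` as a tie-breaker so the output is deterministic. The helper names `build_sample_db` and `rank_by_shared_tags` are made up for the illustration.

```python
import sqlite3


def build_sample_db():
    """Create an in-memory database holding the links table from the question."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE links (goods_id INTEGER, tag_id INTEGER);
        INSERT INTO links VALUES (1,1),(1,2),(2,1),(2,2),(2,3),(3,3);
        """
    )
    return conn


# Same idea as the query above: count, per candidate item, how many of its
# tags also belong to the selected item, then sort by that count.
# Assumption: :item replaces the MySQL user variable @item.
RANK_SQL = """
SELECT goods_id,
       COUNT(CASE WHEN tag_id IN (SELECT tag_id FROM links WHERE goods_id = :item)
                  THEN 1 END) AS shared
FROM links
WHERE goods_id != :item
GROUP BY goods_id
ORDER BY shared DESC, goods_id
"""


def rank_by_shared_tags(conn, item):
    """Return (goods_id, shared_tag_count) rows, best match first."""
    return conn.execute(RANK_SQL, {"item": item}).fetchall()


if __name__ == "__main__":
    conn = build_sample_db()
    for item in (1, 2, 3):
        print(item, rank_by_shared_tags(conn, item))
```

On the sample data this gives goods 2, 3 for item 1, goods 1, 3 for item 2, and goods 2, 1 for item 3, which is the ordering the question asks for. Note that goods with no rows in links at all would not appear in the result.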

Answer 1 (score: 1)

If you want to show the goods that share a tag with goods id = 2 (goods 2 itself is included in the result):

SELECT DISTINCT
  goods.*
FROM
  goods
  LEFT JOIN links ON links.goods_id = goods.id
WHERE links.tag_id IN (SELECT links.tag_id 
                       FROM links
                       WHERE links.goods_id = 2)

If you do not want to include goods_id = 2 itself:

SELECT DISTINCT
  goods.*
FROM
  goods
  LEFT JOIN links ON links.goods_id = goods.id
WHERE links.goods_id != 2 AND links.tag_id IN (SELECT links.tag_id 
                       FROM links
                       WHERE links.goods_id = 2)

You can see it at http://sqlfiddle.com/#!2/0fb60/38

Answer 2 (score: -1)

Some help:

Suppose you are looking for the goods most similar to goods #1:

SELECT a.*  
FROM (SELECT * FROM goods WHERE id <> 1) a 
LEFT JOIN (SELECT z.goods_id, count(*) as total
          FROM links z
          WHERE z.goods_id <> 1 AND
          z.tag_id in (SELECT DISTINCT tag_id from links where goods_id = 1)
          GROUP BY z.goods_id) b 
ON a.id = b.goods_id
ORDER by b.total DESC

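However, I think you could try something different. Instead of ordering by the number of common tags, you could order by the proportion of common tags. That way you avoid the situation where products with more tags always end up at the top of the ranking, even when only a small share of their tags is actually in common.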
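One possible reading of "proportion of common tags" is the Jaccard ratio: shared tags divided by the number of distinct tags on either item. The sketch below is an illustration under that assumption, not the answerer's own query. It again uses Python's `sqlite3` in place of MySQL, assumes each (goods_id, tag_id) pair appears only once in links, and adds a hypothetical goods#4 carrying only tag 3 so that the difference from plain counting becomes visible. The names `build_db` and `rank_by_tag_overlap` are invented for the example.

```python
import sqlite3


def build_db():
    """Sample links from the question plus one hypothetical row for goods#4."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE links (goods_id INTEGER, tag_id INTEGER);
        INSERT INTO links VALUES (1,1),(1,2),(2,1),(2,2),(2,3),(3,3);
        INSERT INTO links VALUES (4,3);  -- hypothetical item, not in the question
        """
    )
    return conn


# Jaccard ratio = shared / (candidate_tags + item_tags - shared).
# Assumption: (goods_id, tag_id) pairs are unique, so COUNT(*) counts tags.
OVERLAP_SQL = """
SELECT goods_id,
       shared * 1.0 / (candidate_tags + item_tags - shared) AS ratio
FROM (
    SELECT goods_id,
           SUM(CASE WHEN tag_id IN (SELECT tag_id FROM links WHERE goods_id = :item)
                    THEN 1 ELSE 0 END) AS shared,
           COUNT(*) AS candidate_tags,
           (SELECT COUNT(*) FROM links WHERE goods_id = :item) AS item_tags
    FROM links
    WHERE goods_id != :item
    GROUP BY goods_id
) AS t
ORDER BY ratio DESC, goods_id
"""


def rank_by_tag_overlap(conn, item):
    """Return (goods_id, ratio) rows, highest overlap ratio first."""
    return conn.execute(OVERLAP_SQL, {"item": item}).fetchall()


if __name__ == "__main__":
    conn = build_db()
    for goods_id, ratio in rank_by_tag_overlap(conn, 3):
        print(goods_id, round(ratio, 3))
```

For goods#3, both goods#2 and the hypothetical goods#4 share exactly one tag, so a count-based ranking cannot separate them. With the ratio, goods#4 (all of its tags match) comes ahead of goods#2 (one of three tags matches).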