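I'm trying to figure out how to find duplicates based on several columns spread across multiple tables.
I have these tables: products, tags, and groups.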
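I want to find exact matches in the products table on the columns productName, brandid, and origin. For rows to count as duplicates, however, they must also have exactly the same tags (column: tagid) and groups (column: groupid).
Each product can have multiple tags and multiple groups.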
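This is what I have come up with so far, but it doesn't quite do what I need.
SQLFiddle: http://sqlfiddle.com/#!9/43f19/1
In my SQL Fiddle example I have listed 10 different products.
For example, products 1 and 2 are an exact match and should therefore be listed as duplicates. Product 3 has only one group assigned, so it differs from products 1 and 2 even though every other parameter matches, and it should not be listed. My intention for the dupid column is to hold the id of the first entry in a set of duplicates.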
id | name | brandid | origin  | tags | groups | dupid
1  | prod | 1       | England | 1,2  | 1,2    | 1
2  | prod | 1       | England | 1,2  | 1,2    | 1
3  | prod | 1       | England | 1,2  | 1      | 3
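To make the matching rule concrete, here is a minimal Python sketch. Python is used only for illustration and is not part of the original question. It treats name, brand, origin, the set of tags, and the set of groups as one signature, and gives each product the lowest id that shares its signature. The function name `assign_dupid` and the tuple layout are assumptions made for this sketch; the three rows are taken from the table above.

```python
# Rows copied from the example table: (id, name, brandid, origin, tags, groups).
products = [
    (1, "prod", 1, "England", {1, 2}, {1, 2}),
    (2, "prod", 1, "England", {1, 2}, {1, 2}),
    (3, "prod", 1, "England", {1, 2}, {1}),
]


def assign_dupid(rows):
    """Map each product id to the lowest id that shares its full signature."""
    first_seen = {}  # signature -> lowest id seen with that signature
    dupid = {}
    for pid, name, brandid, origin, tags, groups in sorted(rows, key=lambda r: r[0]):
        # Sets are compared as a whole, so {1, 2} and {1} are different signatures.
        signature = (name, brandid, origin, frozenset(tags), frozenset(groups))
        first_seen.setdefault(signature, pid)
        dupid[pid] = first_seen[signature]
    return dupid


print(assign_dupid(products))  # {1: 1, 2: 1, 3: 3}
```

Product 3 keeps its own id because its group set differs, which is the behaviour the dupid column is meant to have.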
The complete set of items that should be listed as exact matches in my SQL Fiddle is:
id 1
id 2
id 4
id 5
My guess is that this fails because I have not managed to incorporate the tags and groups into the comparison properly.
SELECT m.*, dup.id AS dupid,
       GROUP_CONCAT(DISTINCT t.tagid ORDER BY t.tagid ASC) AS alltags,
       GROUP_CONCAT(DISTINCT g.groupid ORDER BY g.groupid ASC) AS `groups`
FROM `products` m
JOIN (SELECT id, `productName`, brandid, origin, COUNT(*) AS c
      FROM products
      GROUP BY `productName`, brandid, origin
      HAVING c > 1) dup
  ON m.`productName` = dup.`productName`
 AND m.brandid = dup.brandid
 AND m.origin = dup.origin
LEFT JOIN tags AS t ON t.productid = m.id
LEFT JOIN `groups` AS g ON g.productid = m.id
GROUP BY m.id
ORDER BY `productName`, brandid, origin
Any help or suggestions on how to achieve this would be highly appreciated.
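Answer 0 (score: 1)
My guess is that you are missing an aggregate function on the id field in the subquery. Also, you need to group by product name, origin, and brand rather than by id, so try this: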
SELECT m.*, dup.id AS dupid,
       GROUP_CONCAT(DISTINCT t.tagid ORDER BY t.tagid ASC) AS alltags,
       GROUP_CONCAT(DISTINCT g.groupid ORDER BY g.groupid ASC) AS `groups`
FROM `products` m
JOIN (SELECT MIN(id) AS id, `productName`, brandid, origin, COUNT(*) AS c
      FROM products
      GROUP BY `productName`, brandid, origin
      HAVING c > 1) dup
  ON m.`productName` = dup.`productName`
 AND m.brandid = dup.brandid
 AND m.origin = dup.origin
LEFT JOIN tags AS t ON t.productid = m.id
LEFT JOIN `groups` AS g ON g.productid = m.id
GROUP BY m.`productName`, m.brandid, m.origin
ORDER BY m.`productName`, m.brandid, m.origin
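As a quick check of the `MIN(id)` subquery on its own, the following sketch runs it against an in-memory SQLite database from Python. SQLite stands in for MySQL here purely as a convenience, which is an assumption of this sketch. Products 1 to 3 come from the question's table, and product 4 is a hypothetical extra row added so that a non-duplicate is present.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE products (
    id INTEGER PRIMARY KEY, productName TEXT, brandid INTEGER, origin TEXT
);
INSERT INTO products VALUES
    (1, 'prod', 1, 'England'),
    (2, 'prod', 1, 'England'),
    (3, 'prod', 1, 'England'),
    (4, 'other', 2, 'France');  -- hypothetical non-duplicate row
""")

# MIN(id) picks the first id of each (productName, brandid, origin) group.
first_ids = conn.execute("""
    SELECT MIN(id) AS id, productName, brandid, origin, COUNT(*) AS c
    FROM products
    GROUP BY productName, brandid, origin
    HAVING COUNT(*) > 1
""").fetchall()

print(first_ids)  # [(1, 'prod', 1, 'England', 3)]
```

The subquery returns the first id of the group, but it still counts product 3 together with products 1 and 2, because tags and groups are not yet part of the comparison.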
Edit: you can use this query:
-- `groups` is quoted because GROUPS is a reserved word in MySQL 8.0 and later
SELECT tt.*
FROM (
    -- One row per product with its sorted tag list and group list
    SELECT m.*,
           GROUP_CONCAT(DISTINCT t.tagid ORDER BY t.tagid ASC) AS alltags,
           GROUP_CONCAT(DISTINCT g.groupid ORDER BY g.groupid ASC) AS `groups`
    FROM `products` m
    LEFT JOIN tags AS t ON t.productid = m.id
    LEFT JOIN `groups` AS g ON g.productid = m.id
    GROUP BY m.id
) tt
INNER JOIN (
    -- Combinations of name, brand, origin, tags and groups shared by more than one product
    SELECT productName, brandid, origin, alltags, `groups`
    FROM (
        SELECT m.*,
               GROUP_CONCAT(DISTINCT t.tagid ORDER BY t.tagid ASC) AS alltags,
               GROUP_CONCAT(DISTINCT g.groupid ORDER BY g.groupid ASC) AS `groups`
        FROM `products` m
        LEFT JOIN tags AS t ON t.productid = m.id
        LEFT JOIN `groups` AS g ON g.productid = m.id
        GROUP BY m.id
    ) s
    GROUP BY productName, brandid, origin, alltags, `groups`
    HAVING COUNT(*) > 1
) ss
  ON tt.productName = ss.productName
 AND tt.brandid = ss.brandid
 AND tt.origin = ss.origin
 AND tt.alltags = ss.alltags
 AND tt.`groups` = ss.`groups`
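The query above lists the duplicates but does not produce the dupid column the question asks for. One possible way to get it is to take `MIN(id)` per signature group. The sketch below tries that idea against an in-memory SQLite database from Python, again as a stand-in for MySQL. The `sorted_concat` aggregate is a hypothetical helper that plays the role of MySQL's `GROUP_CONCAT(DISTINCT ... ORDER BY ...)`, the alias `allgroups` is also made up for this sketch, and the data is the three rows from the question's table.

```python
import sqlite3


class SortedConcat:
    """Hypothetical stand-in for MySQL's GROUP_CONCAT(DISTINCT x ORDER BY x)."""

    def __init__(self):
        self.values = set()

    def step(self, value):
        if value is not None:  # LEFT JOIN misses arrive as NULL
            self.values.add(value)

    def finalize(self):
        # Returns '' rather than NULL when nothing was collected.
        return ",".join(str(v) for v in sorted(self.values))


conn = sqlite3.connect(":memory:")
conn.create_aggregate("sorted_concat", 1, SortedConcat)
conn.executescript("""
CREATE TABLE products (id INTEGER PRIMARY KEY, productName TEXT, brandid INTEGER, origin TEXT);
CREATE TABLE tags (productid INTEGER, tagid INTEGER);
CREATE TABLE "groups" (productid INTEGER, groupid INTEGER);
INSERT INTO products VALUES (1,'prod',1,'England'), (2,'prod',1,'England'), (3,'prod',1,'England');
INSERT INTO tags VALUES (1,1), (1,2), (2,1), (2,2), (3,1), (3,2);
INSERT INTO "groups" VALUES (1,1), (1,2), (2,1), (2,2), (3,1);
""")

QUERY = """
WITH sig AS (  -- one row per product with its tag and group signature
    SELECT m.id, m.productName, m.brandid, m.origin,
           sorted_concat(t.tagid)   AS alltags,
           sorted_concat(g.groupid) AS allgroups
    FROM products m
    LEFT JOIN tags t ON t.productid = m.id
    LEFT JOIN "groups" g ON g.productid = m.id
    GROUP BY m.id
)
SELECT s.id, s.alltags, s.allgroups, d.dupid
FROM sig s
JOIN (SELECT productName, brandid, origin, alltags, allgroups,
             MIN(id) AS dupid  -- first entry of each duplicate set
      FROM sig
      GROUP BY productName, brandid, origin, alltags, allgroups
      HAVING COUNT(*) > 1) d
  ON s.productName = d.productName AND s.brandid = d.brandid
 AND s.origin = d.origin AND s.alltags = d.alltags
 AND s.allgroups = d.allgroups
ORDER BY s.id
"""

rows = conn.execute(QUERY).fetchall()
print(rows)  # [(1, '1,2', '1,2', 1), (2, '1,2', '1,2', 1)]
```

In MySQL the same idea would amount to adding `MIN(id) AS dupid` to the `ss` subquery and selecting `ss.dupid` in the outer query. One caveat worth checking in your data: `GROUP_CONCAT` returns NULL for a product that has no tags or no groups, and NULL never compares equal with `=`, so two such products would not be reported as duplicates of each other. The helper in this sketch sidesteps that by returning an empty string.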