Given a hypothetical schema representing products in a MySQL v5.6.41 db:
------------------------------------------------
| id | name | vendor_id | vendor_sku | upc | ean |
|----|------|-----------|------------|-----|-----|
| 1 | AAAA | 2 | 5678 | 456 | 111 | [1]
| 2 | aaaa | 2 | 7878 | 789 | 222 | [1]
| 3 | bbbb | 2 | 1234 | 111 | 333 | [2]
| 4 | cccc | 2 | 1234 | 222 | 444 | [2]
| 5 | dddd | 2 | 1111 | 123 | 555 | [3]
| 6 | eeee | 2 | 2222 | 123 | 666 | [3]
| 7 | ffff | 2 | 3333 | 333 | 777 | [4]
| 8 | gggg | 2 | 4444 | 444 | 777 | [4]
| 9 | hhhh | 2 | 5555 | 555 | 888 |
| 10 | iiii | 2 | 6666 | 666 | 999 |
| 11 | jjjj | 2 | 7777 | 777 | 000 |
| 12 | kkkk | 2 | 8888 | 888 | 001 |
| 13 | llll | 2 | 9999 | 999 | 002 |
| 14 | mmmm | 2 | 0000 | 000 | 003 |
------------------------------------------------
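I'm trying to find the number of duplicate rows that meet any one of the following conditions:
1. same vendor_id and same vendor_sku
2. OR same vendor_id and same name (case-insensitive)
3. OR same vendor_id and same upc
4. OR same vendor_id and same ean
(The [n] next to each row indicates the condition under which those rows are duplicates.)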
So far I've put together this query, but it only covers condition #1:
SELECT
count(*)
FROM
my_table
GROUP BY
vendor_id, vendor_sku
HAVING
COUNT(*) > 1
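To see what this query actually returns, here is a minimal sketch that loads the sample rows into an in-memory SQLite database through Python's sqlite3 module. Using SQLite as a stand-in for MySQL 5.6 is an assumption, and the helper names ROWS and make_db are hypothetical. The query yields one count per duplicated (vendor_id, vendor_sku) group rather than a single total, and even summing those counts only accounts for condition #1:

```python
import sqlite3

# Sample rows from the question: (id, name, vendor_id, vendor_sku, upc, ean).
# Codes are stored as TEXT so values such as "0000" keep their leading zeros.
ROWS = [
    (1, "AAAA", 2, "5678", "456", "111"),
    (2, "aaaa", 2, "7878", "789", "222"),
    (3, "bbbb", 2, "1234", "111", "333"),
    (4, "cccc", 2, "1234", "222", "444"),
    (5, "dddd", 2, "1111", "123", "555"),
    (6, "eeee", 2, "2222", "123", "666"),
    (7, "ffff", 2, "3333", "333", "777"),
    (8, "gggg", 2, "4444", "444", "777"),
    (9, "hhhh", 2, "5555", "555", "888"),
    (10, "iiii", 2, "6666", "666", "999"),
    (11, "jjjj", 2, "7777", "777", "000"),
    (12, "kkkk", 2, "8888", "888", "001"),
    (13, "llll", 2, "9999", "999", "002"),
    (14, "mmmm", 2, "0000", "000", "003"),
]


def make_db():
    """Hypothetical helper: build an in-memory my_table filled with ROWS."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE my_table (id INTEGER PRIMARY KEY, name TEXT,"
        " vendor_id INTEGER, vendor_sku TEXT, upc TEXT, ean TEXT)"
    )
    conn.executemany("INSERT INTO my_table VALUES (?, ?, ?, ?, ?, ?)", ROWS)
    return conn


conn = make_db()

# The query from the question: one output row per duplicated group.
per_group = conn.execute(
    "SELECT count(*) FROM my_table"
    " GROUP BY vendor_id, vendor_sku HAVING COUNT(*) > 1"
).fetchall()

# Summing the group sizes counts only the rows covered by condition #1.
total = conn.execute(
    "SELECT SUM(c) FROM (SELECT COUNT(*) AS c FROM my_table"
    " GROUP BY vendor_id, vendor_sku HAVING COUNT(*) > 1) AS g"
).fetchone()[0]

print(per_group)  # [(2,)]
print(total)      # 2
```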
Based on this example, my expected result would be 8.
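As a sanity check of that expected result, here is a brute-force sketch in plain Python (no database involved; the names Product, is_duplicate_pair and duplicate_ids are hypothetical). It compares every pair of rows against the four conditions and collects each row that has at least one partner, so a row is counted once even if it matches several other rows or several conditions:

```python
from collections import namedtuple
from itertools import combinations

Product = namedtuple("Product", "id name vendor_id vendor_sku upc ean")

# Sample rows from the question, in table order.
ROWS = [Product(*r) for r in [
    (1, "AAAA", 2, "5678", "456", "111"),
    (2, "aaaa", 2, "7878", "789", "222"),
    (3, "bbbb", 2, "1234", "111", "333"),
    (4, "cccc", 2, "1234", "222", "444"),
    (5, "dddd", 2, "1111", "123", "555"),
    (6, "eeee", 2, "2222", "123", "666"),
    (7, "ffff", 2, "3333", "333", "777"),
    (8, "gggg", 2, "4444", "444", "777"),
    (9, "hhhh", 2, "5555", "555", "888"),
    (10, "iiii", 2, "6666", "666", "999"),
    (11, "jjjj", 2, "7777", "777", "000"),
    (12, "kkkk", 2, "8888", "888", "001"),
    (13, "llll", 2, "9999", "999", "002"),
    (14, "mmmm", 2, "0000", "000", "003"),
]]


def is_duplicate_pair(a, b):
    """True if a and b share a vendor and match on at least one condition."""
    if a.vendor_id != b.vendor_id:
        return False
    return (
        a.vendor_sku == b.vendor_sku            # condition 1
        or a.name.lower() == b.name.lower()     # condition 2 (case-insensitive)
        or a.upc == b.upc                       # condition 3
        or a.ean == b.ean                       # condition 4
    )


def duplicate_ids(rows):
    """Ids of every row that has at least one duplicate partner."""
    ids = set()
    for a, b in combinations(rows, 2):
        if is_duplicate_pair(a, b):
            ids.update((a.id, b.id))
    return ids


dupes = duplicate_ids(ROWS)
print(sorted(dupes))  # [1, 2, 3, 4, 5, 6, 7, 8]
print(len(dupes))     # 8
```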
Answer 0 (score: 2)
I think an EXISTS subquery might be useful:
select count(*)
from my_table t
where exists (select 1
              from my_table t2
              where t2.vendor_id = t.vendor_id and
                    t2.id <> t.id and
                    (t2.vendor_sku = t.vendor_sku or
                     t2.name = t.name or
                     t2.upc = t.upc or
                     t2.ean = t.ean
                    )
             );
Note that case sensitivity depends on your collation. I did not add explicit handling for case (I would just use lower()), because it is not clear whether that handling is needed.
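One way to try this query without a MySQL server is a minimal sketch using Python's sqlite3 module and the sample rows (SQLite as a stand-in is an assumption; ROWS, make_db and EXISTS_QUERY are hypothetical names). SQLite compares TEXT with its case-sensitive BINARY collation by default, whereas MySQL's usual default *_ci collations are case-insensitive, so the sketch runs the query twice: once as written and once with lower() on both sides of the name comparison:

```python
import sqlite3

# Sample rows from the question: (id, name, vendor_id, vendor_sku, upc, ean).
ROWS = [
    (1, "AAAA", 2, "5678", "456", "111"),
    (2, "aaaa", 2, "7878", "789", "222"),
    (3, "bbbb", 2, "1234", "111", "333"),
    (4, "cccc", 2, "1234", "222", "444"),
    (5, "dddd", 2, "1111", "123", "555"),
    (6, "eeee", 2, "2222", "123", "666"),
    (7, "ffff", 2, "3333", "333", "777"),
    (8, "gggg", 2, "4444", "444", "777"),
    (9, "hhhh", 2, "5555", "555", "888"),
    (10, "iiii", 2, "6666", "666", "999"),
    (11, "jjjj", 2, "7777", "777", "000"),
    (12, "kkkk", 2, "8888", "888", "001"),
    (13, "llll", 2, "9999", "999", "002"),
    (14, "mmmm", 2, "0000", "000", "003"),
]


def make_db():
    """Hypothetical helper: build an in-memory my_table filled with ROWS."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE my_table (id INTEGER PRIMARY KEY, name TEXT,"
        " vendor_id INTEGER, vendor_sku TEXT, upc TEXT, ean TEXT)"
    )
    conn.executemany("INSERT INTO my_table VALUES (?, ?, ?, ?, ?, ?)", ROWS)
    return conn


# Correlated EXISTS query from the answer; {name_match} is swapped per run.
EXISTS_QUERY = """
SELECT count(*)
FROM my_table t
WHERE EXISTS (SELECT 1
              FROM my_table t2
              WHERE t2.vendor_id = t.vendor_id
                AND t2.id <> t.id
                AND (t2.vendor_sku = t.vendor_sku
                     OR {name_match}
                     OR t2.upc = t.upc
                     OR t2.ean = t.ean))
"""

conn = make_db()
as_written = conn.execute(
    EXISTS_QUERY.format(name_match="t2.name = t.name")
).fetchone()[0]
case_folded = conn.execute(
    EXISTS_QUERY.format(name_match="lower(t2.name) = lower(t.name)")
).fetchone()[0]

print(as_written)   # 6: 'AAAA' and 'aaaa' differ under BINARY collation
print(case_folded)  # 8: rows 1 and 2 now match on name
```

Under a case-insensitive MySQL collation the query as written should already behave like the case_folded run.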
Answer 1 (score: 0)
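I still think there are other options that avoid a dependent subquery. Whenever I can get rid of a dependent subquery, the execution plan usually improves.
So: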
SELECT
COUNT(DISTINCT t1.id)
FROM
my_table AS t1
INNER JOIN my_table AS t2 ON (
t1.vendor_id = t2.vendor_id
AND t1.id != t2.id
AND (
t1.vendor_sku = t2.vendor_sku
OR t1.name = t2.name
OR t1.upc = t2.upc
OR t1.ean = t2.ean
)
)
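A minimal sketch of this self-join run against the sample rows with Python's sqlite3 module (SQLite as a stand-in for MySQL is an assumption; ROWS, make_db, ON_CLAUSE and the query names are hypothetical). COLLATE NOCASE is SQLite syntax used here to imitate a case-insensitive MySQL collation. Listing the raw join pairs shows that each duplicate row appears once per matching partner; in this sample every row has exactly one partner, but with a three-way duplicate the join would produce more rows than distinct ids, which is why the answer counts DISTINCT t1.id:

```python
import sqlite3

# Sample rows from the question: (id, name, vendor_id, vendor_sku, upc, ean).
ROWS = [
    (1, "AAAA", 2, "5678", "456", "111"),
    (2, "aaaa", 2, "7878", "789", "222"),
    (3, "bbbb", 2, "1234", "111", "333"),
    (4, "cccc", 2, "1234", "222", "444"),
    (5, "dddd", 2, "1111", "123", "555"),
    (6, "eeee", 2, "2222", "123", "666"),
    (7, "ffff", 2, "3333", "333", "777"),
    (8, "gggg", 2, "4444", "444", "777"),
    (9, "hhhh", 2, "5555", "555", "888"),
    (10, "iiii", 2, "6666", "666", "999"),
    (11, "jjjj", 2, "7777", "777", "000"),
    (12, "kkkk", 2, "8888", "888", "001"),
    (13, "llll", 2, "9999", "999", "002"),
    (14, "mmmm", 2, "0000", "000", "003"),
]


def make_db():
    """Hypothetical helper: build an in-memory my_table filled with ROWS."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE my_table (id INTEGER PRIMARY KEY, name TEXT,"
        " vendor_id INTEGER, vendor_sku TEXT, upc TEXT, ean TEXT)"
    )
    conn.executemany("INSERT INTO my_table VALUES (?, ?, ?, ?, ?, ?)", ROWS)
    return conn


# Join condition from the answer; NOCASE stands in for a *_ci collation.
ON_CLAUSE = """
    t1.vendor_id = t2.vendor_id
    AND t1.id != t2.id
    AND (
        t1.vendor_sku = t2.vendor_sku
        OR t1.name = t2.name COLLATE NOCASE
        OR t1.upc = t2.upc
        OR t1.ean = t2.ean
    )
"""
COUNT_QUERY = ("SELECT COUNT(DISTINCT t1.id) FROM my_table AS t1"
               f" INNER JOIN my_table AS t2 ON ({ON_CLAUSE})")
PAIRS_QUERY = ("SELECT t1.id, t2.id FROM my_table AS t1"
               f" INNER JOIN my_table AS t2 ON ({ON_CLAUSE})"
               " ORDER BY t1.id, t2.id")

conn = make_db()
distinct_rows = conn.execute(COUNT_QUERY).fetchone()[0]
pairs = conn.execute(PAIRS_QUERY).fetchall()

print(distinct_rows)  # 8
print(pairs)  # [(1, 2), (2, 1), (3, 4), (4, 3), (5, 6), (6, 5), (7, 8), (8, 7)]
```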
Or:
SELECT
COUNT(DISTINCT t1.id)
FROM
my_table AS t1
LEFT JOIN my_table AS t2 ON (
t1.vendor_id = t2.vendor_id
AND t1.id != t2.id
AND (
t1.vendor_sku = t2.vendor_sku
OR t1.name = t2.name
OR t1.upc = t2.upc
OR t1.ean = t2.ean
)
)
WHERE
t2.id IS NOT NULL
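The LEFT JOIN variant keeps every t1 row, filling t2's columns with NULL when no partner matches, and then discards those rows with t2.id IS NOT NULL; since id is a primary key and never NULL in a real match, it should count the same rows as the INNER JOIN. A minimal sketch with Python's sqlite3 module checks this on the sample rows and lists the ids the filter removes (SQLite as a stand-in for MySQL and COLLATE NOCASE as a substitute for a case-insensitive collation are assumptions; ROWS, make_db and ON_CLAUSE are hypothetical names):

```python
import sqlite3

# Sample rows from the question: (id, name, vendor_id, vendor_sku, upc, ean).
ROWS = [
    (1, "AAAA", 2, "5678", "456", "111"),
    (2, "aaaa", 2, "7878", "789", "222"),
    (3, "bbbb", 2, "1234", "111", "333"),
    (4, "cccc", 2, "1234", "222", "444"),
    (5, "dddd", 2, "1111", "123", "555"),
    (6, "eeee", 2, "2222", "123", "666"),
    (7, "ffff", 2, "3333", "333", "777"),
    (8, "gggg", 2, "4444", "444", "777"),
    (9, "hhhh", 2, "5555", "555", "888"),
    (10, "iiii", 2, "6666", "666", "999"),
    (11, "jjjj", 2, "7777", "777", "000"),
    (12, "kkkk", 2, "8888", "888", "001"),
    (13, "llll", 2, "9999", "999", "002"),
    (14, "mmmm", 2, "0000", "000", "003"),
]


def make_db():
    """Hypothetical helper: build an in-memory my_table filled with ROWS."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE my_table (id INTEGER PRIMARY KEY, name TEXT,"
        " vendor_id INTEGER, vendor_sku TEXT, upc TEXT, ean TEXT)"
    )
    conn.executemany("INSERT INTO my_table VALUES (?, ?, ?, ?, ?, ?)", ROWS)
    return conn


# Join condition shared by both answers; NOCASE stands in for a *_ci collation.
ON_CLAUSE = """
    t1.vendor_id = t2.vendor_id
    AND t1.id != t2.id
    AND (
        t1.vendor_sku = t2.vendor_sku
        OR t1.name = t2.name COLLATE NOCASE
        OR t1.upc = t2.upc
        OR t1.ean = t2.ean
    )
"""

conn = make_db()
left_count = conn.execute(
    "SELECT COUNT(DISTINCT t1.id) FROM my_table AS t1"
    f" LEFT JOIN my_table AS t2 ON ({ON_CLAUSE}) WHERE t2.id IS NOT NULL"
).fetchone()[0]
inner_count = conn.execute(
    "SELECT COUNT(DISTINCT t1.id) FROM my_table AS t1"
    f" INNER JOIN my_table AS t2 ON ({ON_CLAUSE})"
).fetchone()[0]
# Rows with no partner: these are what the IS NOT NULL filter throws away.
unmatched = [row[0] for row in conn.execute(
    "SELECT t1.id FROM my_table AS t1"
    f" LEFT JOIN my_table AS t2 ON ({ON_CLAUSE})"
    " WHERE t2.id IS NULL ORDER BY t1.id"
)]

print(left_count, inner_count)  # 8 8
print(unmatched)                # [9, 10, 11, 12, 13, 14]
```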
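P.S. When the error was pointed out, I did not have time to fix my previous answer, so I struck it through with the del tag instead of deleting it (sorry about that). Later I wanted to fix it, but by then a moderator had already deleted the answer.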