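I have a database with about 300,000 rows of product information. I need to retrieve the rows with duplicate UPCs (COUNT(upc) > 1) where at least one of the results' descriptions matches a given string ("Reed", for example).
For example, all of the following rows (desc, upc pairs) would be selected: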
Deer D7394 62226173
Reed R2536 62226173
Deer D7217 62226173
but not:
Deer D0173 62278389
Deer D7289 62278389
Deer D9272 62278389
Here is the query I am using:
SELECT a.desc, a.upc, a.sku, a.short_description
FROM inventory a
JOIN
(SELECT upc, `desc`
FROM inventory
GROUP BY upc
HAVING COUNT(upc) > 1) b
ON a.upc = b.upc
WHERE ((a.desc LIKE '%Reed%') OR (b.desc LIKE '%Reed%'))
AND a.upc != ''
AND a.upc != 0
ORDER BY upc;
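I'm relatively new to MySQL, but this seems like it should work. However, for some results the non-matching rows are not returned (i.e. Reed R2536 is returned, but Deer D7394 is not).
Any insight is greatly appreciated!
Answer 0 (score: 3)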
The group_concat approach will work, but when it doesn't, it will fail silently. You'll never know; you'll just be missing rows that should be there.
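For context, and as an assumption about what the silent failure refers to: MySQL truncates the result of GROUP_CONCAT at group_concat_max_len bytes (1024 by default) and raises only a warning, so a match that falls past the cutoff is simply lost. A minimal sketch of checking and raising the limit for the current session; the value 1000000 is an arbitrary illustration, not a recommendation:

```sql
-- Inspect the current limit (the MySQL default is 1024 bytes).
SHOW VARIABLES LIKE 'group_concat_max_len';

-- Raise it for the current session only.
-- 1000000 is an arbitrary illustrative value.
SET SESSION group_concat_max_len = 1000000;

-- After running a GROUP_CONCAT query, check whether any group was cut off.
SHOW WARNINGS;
```

Raising the limit only moves the cutoff, which is why the grouping approach that follows avoids concatenation entirely.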
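What you want to do is select all UPCs that have at least one matching description (and that have duplicates), then select every row whose UPC is in that list.
If you group all items by UPC, you can annotate each group with a count and a flag for whether any description matches: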
SELECT upc, COUNT(*) c, MAX(`desc` LIKE '%Reed%') desc_matches
FROM inventory
GROUP BY upc
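(This takes advantage of the fact that boolean operators such as LIKE actually return 0 for false and 1 for true. Taking the maximum of that column tells you whether any row matched.)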
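A quick way to see this behaviour in isolation, as a small sketch using literal strings taken from the question rather than the real table (the aliases reed_row, deer_row, any_match, d and t are made up for illustration):

```sql
-- LIKE evaluates to an integer in MySQL: 1 for a match, 0 otherwise.
SELECT 'Reed R2536' LIKE '%Reed%' AS reed_row,   -- 1
       'Deer D7394' LIKE '%Reed%' AS deer_row;   -- 0

-- MAX over those 0/1 values is 1 as soon as any row in the set matches.
SELECT MAX(d LIKE '%Reed%') AS any_match          -- 1
FROM (
    SELECT 'Deer D7394' AS d
    UNION ALL SELECT 'Reed R2536'
    UNION ALL SELECT 'Deer D7217'
) t;
```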
You can then filter that list by your criteria to get the UPCs you're interested in:
SELECT upc, COUNT(*) c, MAX(`desc` LIKE '%Reed%') desc_matches
FROM inventory
GROUP BY upc
HAVING desc_matches = 1 AND c > 1
Once you have that list, you want to see all products that match any of those UPCs. You can do that with a simple (not OUTER) join:
SELECT a.desc, a.upc, a.sku, a.short_description
FROM inventory a
JOIN
( SELECT upc, COUNT(*) c, MAX(`desc` LIKE '%Reed%') desc_matches
FROM inventory
GROUP BY upc
HAVING desc_matches = 1 AND c > 1
) b USING (upc)
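The query above leaves out the empty-UPC filters and the ordering from the question. A possible way to put them back, as a sketch only: it assumes upc is stored as text, so the question's `!= 0` check is written as a string comparison here, and that the filters belong inside the grouping subquery.

```sql
SELECT a.`desc`, a.upc, a.sku, a.short_description
FROM inventory a
JOIN (
    SELECT upc, COUNT(*) c, MAX(`desc` LIKE '%Reed%') desc_matches
    FROM inventory
    -- Assumption: upc is a text column; skip blank and zero placeholders
    -- before grouping so they never form a "duplicate" group.
    WHERE upc <> '' AND upc <> '0'
    GROUP BY upc
    HAVING desc_matches = 1 AND c > 1
) b USING (upc)
ORDER BY a.upc;
```

Against the sample table given in the other answer, this should return the three rows sharing UPC 62226173 and none of the 62278389 rows.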
Answer 1 (score: 1)
Another possible approach, assuming you don't have too many duplicate records, is:
select * from inventory i
join (
SELECT upc
FROM inventory
GROUP BY upc
HAVING COUNT(upc) > 1
and group_concat(`desc`) like '%reed%') as available_upc
on available_upc.upc = i.upc
This assumes your table looks like:
CREATE TABLE inventory(
sku CHAR(32) NOT NULL,
`desc` CHAR(32) NOT NULL,
upc CHAR(32) NOT NULL,
short_description CHAR(32) NOT NULL,
PRIMARY KEY (sku)
);
insert into inventory values ('D7394','Deer','62226173','Small Deer');
insert into inventory values ('R2536','Reed','62226173','Small Reed');
insert into inventory values ('D7217','Deer','62226173','Large Deer');
insert into inventory values ('D0173','Deer','62278389','Small Deer');
insert into inventory values ('D7289','Deer','62278389','Small Reed');
insert into inventory values ('D9272','Deer','62278389','Large Deer');
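To sanity-check the sample data before running the query, something like the following should list each UPC with its row count and concatenated descriptions. The aliases n and descs are my own names, not part of the schema:

```sql
-- One row per UPC: how many products share it and which descriptions they carry.
SELECT upc,
       COUNT(*) AS n,
       GROUP_CONCAT(`desc` ORDER BY sku) AS descs
FROM inventory
GROUP BY upc
ORDER BY upc;
-- 62226173 | 3 | Deer,Deer,Reed
-- 62278389 | 3 | Deer,Deer,Deer
```

Only the first group has more than one row and a description containing "reed", so the join above should return its three rows.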
Answer 2 (score: 0)
It's hard to say without testing, but try:
SELECT a.desc, a.upc, a.sku, a.short_description
FROM inventory a
RIGHT OUTER JOIN
(SELECT upc
FROM inventory
GROUP BY upc
HAVING COUNT(upc) > 1) b
ON a.upc = b.upc
-- b exposes only upc, so the description filter can reference a.desc only
WHERE a.desc LIKE '%Reed%'
AND a.upc != ''
AND a.upc != 0
ORDER BY upc;
The key is the RIGHT OUTER JOIN. See the article: http://www.codeproject.com/Articles/33052/Visual-Representation-of-SQL-Joins
Also, you only need to return upc from the inner SELECT query.