I have the following tables:
Orders
order_id
9
10
11
Order_details
order_id, product_id
9, 7
10, 5
10, 6
11, 6
11, 7
Products
product_id, product_name, price
5, potato, 4.99
6, potato *, 7.5
7, orange, 7.99
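I have already received feedback on how to find orders that contain duplicate product names, but things have become more complicated: it turns out that the duplicates carry an extra " *" suffix after the product name, as shown above.
How can I extend the query below so that it only counts orders in which one product has no extra characters and another product has the same name with the extra characters?
For example, an order with "potato" and "potato" should be ignored, as should an order with "potato *" and "potato *", but an order containing both "potato" and "potato *" should appear in the result.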
select od.order_id
from order_details od join
products p
on od.product_id = p.product_id
group by od.order_id
having count(p.product_name) > count(distinct p.product_name)
Answer 0 (score: 1)
One option might be a simple replacement that removes the " *" suffix from the product name:
SELECT
od.order_id
FROM order_details od
INNER JOIN products p
ON od.product_id = p.product_id
GROUP BY
od.order_id
HAVING
COUNT(DISTINCT p.product_name) <>
COUNT(DISTINCT REPLACE(p.product_name, ' *', ''));
The demo was run on MySQL, but the same query should work on at least several other databases.
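As a quick sanity check, the query can be run against the sample data from the question. The sketch below uses Python's built-in sqlite3 module instead of MySQL or Postgres (an assumption made so that it runs without a database server); the column types are also assumptions, since the question does not state them.

```python
import sqlite3

# In-memory database holding the sample rows from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE products (product_id INTEGER PRIMARY KEY, product_name TEXT, price REAL);
CREATE TABLE order_details (order_id INTEGER, product_id INTEGER);
INSERT INTO products VALUES (5, 'potato', 4.99), (6, 'potato *', 7.5), (7, 'orange', 7.99);
INSERT INTO order_details VALUES (9, 7), (10, 5), (10, 6), (11, 6), (11, 7);
""")

# Same HAVING condition as in the answer: an order qualifies when stripping
# ' *' makes two of its product names collapse into one.
QUERY = """
SELECT od.order_id
FROM order_details od
INNER JOIN products p ON od.product_id = p.product_id
GROUP BY od.order_id
HAVING COUNT(DISTINCT p.product_name) <>
       COUNT(DISTINCT REPLACE(p.product_name, ' *', ''))
ORDER BY od.order_id
"""

matching_orders = [row[0] for row in conn.execute(QUERY)]
print(matching_orders)  # [10]
```

Only order 10 contains both "potato" and "potato *". Order 11 holds "potato *" and "orange", which stay distinct after the replacement.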
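Ideally, a regex replacement on the product name would be better, because it can be anchored to the end of the name and so will not touch a space followed by * that is a legitimate part of the product name.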
Edit:
Since you are using Postgres, we can actually do a more targeted regex replacement:
SELECT
od.order_id
FROM order_details od
INNER JOIN products p
ON od.product_id = p.product_id
GROUP BY
od.order_id
HAVING
COUNT(DISTINCT p.product_name) <>
COUNT(DISTINCT REGEXP_REPLACE(p.product_name, ' \*$', ''));
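To illustrate why the anchored pattern is safer, here is a rough comparison of the two HAVING expressions. It is only a sketch: it runs on SQLite through Python, where REGEXP_REPLACE does not exist, so a three-argument stand-in is registered with re.sub (an assumption, not the Postgres implementation). Products 8 and 9 and order 12 are hypothetical rows added to show a name with " *" in the middle.

```python
import re
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumption: emulate a 3-argument REGEXP_REPLACE(source, pattern, replacement).
conn.create_function("REGEXP_REPLACE", 3, lambda s, pat, rep: re.sub(pat, rep, s))
conn.executescript("""
CREATE TABLE products (product_id INTEGER PRIMARY KEY, product_name TEXT);
CREATE TABLE order_details (order_id INTEGER, product_id INTEGER);
INSERT INTO products VALUES (5, 'potato'), (6, 'potato *'), (7, 'orange'),
  (8, 'lemon * lime'), (9, 'lemon lime');  -- 8 and 9 are hypothetical
INSERT INTO order_details VALUES (9, 7), (10, 5), (10, 6), (11, 6), (11, 7),
  (12, 8), (12, 9);  -- order 12 is hypothetical
""")

TEMPLATE = """
SELECT od.order_id
FROM order_details od
INNER JOIN products p ON od.product_id = p.product_id
GROUP BY od.order_id
HAVING COUNT(DISTINCT p.product_name) <> COUNT(DISTINCT {expr})
ORDER BY od.order_id
"""

def flagged(expr):
    """Return the order ids flagged when names are normalised with expr."""
    return [row[0] for row in conn.execute(TEMPLATE.format(expr=expr))]

plain_orders = flagged("REPLACE(p.product_name, ' *', '')")
anchored_orders = flagged(r"REGEXP_REPLACE(p.product_name, ' \*$', '')")
print(plain_orders)     # [10, 12]
print(anchored_orders)  # [10]
```

The plain REPLACE turns 'lemon * lime' into 'lemon lime' and therefore also flags the hypothetical order 12, while the pattern anchored with $ only strips a trailing " *".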
Answer 1 (score: 0)
You could chain products together on their longest initial substring (prefix):
CREATE TABLE products (
product_id INTEGER NOT NULL PRIMARY KEY
, product_name text
, price DECIMAL(8,2)
);
INSERT INTO products(product_id, product_name, price) VALUES
(5, 'potato', 4.99)
,(6, 'potato *', 7.5)
,(1, 'potatoes', 7.48) -- added these
,(2, 'potatoe', 7.49) --
,(7, 'orange', 7.99)
;
ALTER TABLE products
ADD COLUMN parent_id INTEGER REFERENCES products(product_id)
, ADD COLUMN canonical_id INTEGER REFERENCES products(product_id);
UPDATE products
SET canonical_id = product_id;
SELECT * FROM products;
-- Step 1: set parent_id to the product whose name is the longest proper
-- prefix of this product's name (only names longer than 4 characters qualify).
WITH xxx AS (
    SELECT product_id, product_name
         , length(product_name) AS len
    FROM products
)
UPDATE products dst
SET parent_id = src.product_id
FROM xxx src
-- WHERE position (src.product_name IN dst.product_name) = 1
WHERE dst.product_name LIKE src.product_name || '%'::text
  AND src.len > 4
  AND src.len < length(dst.product_name)
  AND NOT EXISTS (  -- no longer prefix candidate exists
      SELECT * FROM xxx nx
      WHERE dst.product_name LIKE nx.product_name || '%'::text
        AND nx.len < length(dst.product_name)
        AND nx.len > src.len
        AND nx.product_id <> dst.product_id
  )
;
SELECT * FROM products;
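-- Step 2: set canonical_id to the product whose name is the shortest
-- qualifying prefix, i.e. the root of the chain built in step 1.
WITH yyy AS (
    SELECT product_id, product_name
         , length(product_name) AS len
    FROM products
)
UPDATE products dst
SET canonical_id = src.product_id
FROM yyy src
WHERE dst.product_name LIKE src.product_name || '%'::text
  AND src.len > 4
  AND src.len < length(dst.product_name)
  AND NOT EXISTS (  -- no shorter qualifying prefix exists
      SELECT * FROM yyy nx
      WHERE dst.product_name LIKE nx.product_name || '%'::text
        AND nx.len < src.len
        AND nx.len > 4  -- added: names too short to be a src must not block the update
  )
;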
SELECT * FROM products;
Result:
DROP SCHEMA
CREATE SCHEMA
SET
CREATE TABLE
INSERT 0 5
ALTER TABLE
UPDATE 5
 product_id | product_name | price | parent_id | canonical_id
------------+--------------+-------+-----------+--------------
          5 | potato       |  4.99 |           |            5
          6 | potato *     |  7.50 |           |            6
          1 | potatoes     |  7.48 |           |            1
          2 | potatoe      |  7.49 |           |            2
          7 | orange       |  7.99 |           |            7
(5 rows)
UPDATE 3
 product_id | product_name | price | parent_id | canonical_id
------------+--------------+-------+-----------+--------------
          5 | potato       |  4.99 |           |            5
          7 | orange       |  7.99 |           |            7
          6 | potato *     |  7.50 |         5 |            6
          2 | potatoe      |  7.49 |         5 |            2
          1 | potatoes     |  7.48 |         2 |            1
(5 rows)
UPDATE 3
 product_id | product_name | price | parent_id | canonical_id
------------+--------------+-------+-----------+--------------
          5 | potato       |  4.99 |           |            5
          7 | orange       |  7.99 |           |            7
          6 | potato *     |  7.50 |         5 |            5
          2 | potatoe      |  7.49 |         5 |            5
          1 | potatoes     |  7.48 |         2 |            5
(5 rows)
Note: this will probably need some additional heuristic tuning (or even manual editing).
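For readers who want to see the prefix-chaining idea without running the UPDATE statements, the following is a minimal Python sketch of the same rule applied to the five sample names. It is an illustration under assumptions, not a translation of the SQL: the helper name prefix_candidates and the MIN_LEN constant are made up here, and str.startswith treats % and _ literally, whereas LIKE would interpret them as wildcards.

```python
# Sample rows from the answer: product_id -> product_name.
products = {5: "potato", 6: "potato *", 1: "potatoes", 2: "potatoe", 7: "orange"}

MIN_LEN = 4  # mirrors the src.len > 4 condition in the SQL

def prefix_candidates(pid):
    """Ids of products whose name is a proper prefix of this product's name."""
    name = products[pid]
    return [
        other
        for other, other_name in products.items()
        if MIN_LEN < len(other_name) < len(name) and name.startswith(other_name)
    ]

parent_id = {}
canonical_id = {}
for pid in products:
    candidates = prefix_candidates(pid)
    if candidates:
        # Longest prefix is the direct parent, shortest prefix is the chain root.
        parent_id[pid] = max(candidates, key=lambda c: len(products[c]))
        canonical_id[pid] = min(candidates, key=lambda c: len(products[c]))
    else:
        parent_id[pid] = None
        canonical_id[pid] = pid  # a product with no prefix is its own canonical row

print(parent_id)
print(canonical_id)
```

On the sample data this reproduces the final table: "potatoes" hangs under "potatoe", which hangs under "potato", and all potato variants share canonical_id 5. Grouping orders by canonical_id rather than by product_name would then treat "potato" and "potato *" as the same product.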