I have a table:
create table purchase(
    transaction_id integer,
    account_id     bigint,
    created        timestamp with time zone,
    price          numeric(5,2)
);
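I think there is a problem with the system sending me duplicate records, but I don't know how widespread it is.
I need a query that selects all records created within 1 second of each other (not necessarily within the same clock second) that have the same account_id and the same price. So, for example, I want to be able to find these two records: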
+----------------+----------------+-------------------------------+-------+
| transaction_id | account_id     | created                       | price |
+----------------+----------------+-------------------------------+-------+
|          85239 | 80012340116730 | 2014-05-07 15:46:03.361959+00 |  8.47 |
|          85240 | 80012340116730 | 2014-05-07 15:46:04.118911+00 |  8.47 |
+----------------+----------------+-------------------------------+-------+
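How can I do this in a single query?
I'm using PostgreSQL 9.3.
Answer 0 (score: 5)
You need to check for rows within one second in both directions, and you need to exclude the row itself from the test: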
SELECT *
FROM   purchase p
WHERE  EXISTS (
   SELECT 1
   FROM   purchase p1
   WHERE  p1.created > p.created - interval '1 sec'  -- "less than a second"
   AND    p1.created < p.created + interval '1 sec'
   AND    p1.account_id = p.account_id
   AND    p1.price = p.price
   AND    p1.transaction_id <> p.transaction_id      -- assuming that's the pk
   )
ORDER  BY account_id, price, created;  -- optional, for handy output
These WHERE conditions are sargable, which allows an index on created to be used:
WHERE p1.created > p.created - interval '1 sec'
AND p1.created < p.created + interval '1 sec'
As opposed to:
p1.created - p.created < interval '1 sec'
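The latter cannot use an index on created, which can slow the query down considerably on big tables.
Postgres is forced to test every remaining combination (after applying the other conditions). Depending on the selectivity of the other conditions and the size of the table, this can be negligible or a moderate to huge drain on performance.
With small to medium tables, my tests showed two sequential scans and a hash semi-join for either query.
The perfect index for this case is a multicolumn index of the form:
CREATE INDEX purchase_foo_idx ON purchase (account_id, price, created);
But a combination of indexes on the individual columns works well, too (and may serve more use cases).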
Answer 1 (score: 2)
I think you're looking for something like this:
select *
from purchase p1
where exists (
    select transaction_id
    from purchase p2
    where p2.created > p1.created
      and p2.created - p1.created < interval '1 second'
      and p2.account_id = p1.account_id
      and p2.price = p1.price);
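Because of the strict condition p2.created > p1.created, the query above returns only the earlier row of each suspected pair, and two rows with exactly the same created value never match each other. If you would rather see both members of each pair side by side, a self-join is one option. The sketch below is an illustration, not part of either answer: it builds a temporary copy of the question's table with the two sample rows plus one invented row (85241) that should not match, and it assumes transaction_id is unique so it can be used as a tie-breaker.

```sql
-- Scratch setup (hypothetical): temporary table with the question's schema,
-- the two sample rows, and one invented row several seconds later.
CREATE TEMP TABLE purchase (
    transaction_id integer,
    account_id     bigint,
    created        timestamp with time zone,
    price          numeric(5,2)
);
INSERT INTO purchase VALUES
    (85239, 80012340116730, '2014-05-07 15:46:03.361959+00', 8.47),
    (85240, 80012340116730, '2014-05-07 15:46:04.118911+00', 8.47),
    (85241, 80012340116730, '2014-05-07 15:46:09+00',        8.47);  -- invented, not a duplicate

-- One output row per suspected duplicate pair, earlier row first.
SELECT p1.transaction_id     AS first_id,
       p2.transaction_id     AS second_id,
       p1.account_id,
       p1.price,
       p2.created - p1.created AS gap
FROM   purchase p1
JOIN   purchase p2
  ON   p2.account_id = p1.account_id
  AND  p2.price      = p1.price
  -- order each pair by (created, transaction_id) so it is reported once,
  -- even when both rows carry the identical timestamp (assumes a unique transaction_id)
  AND  (p2.created, p2.transaction_id) > (p1.created, p1.transaction_id)
  -- sargable upper bound: less than one second after the earlier row
  AND  p2.created < p1.created + interval '1 second'
ORDER  BY p1.account_id, p1.created;
```

With this data it should return a single pair, 85239 and 85240, about 0.76 seconds apart; 85241 is too far from both to match.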
Edit: On big tables this query can be quite heavy. Consider restricting it, e.g. to a single day:
select *
from purchase p1
where
    p1.created::date = '2014-05-08'
    and exists (
        select transaction_id
        from purchase p2
        where p2.created::date = '2014-05-08'
          and p2.created > p1.created
          and p2.created - p1.created < interval '1 second'
          and p2.account_id = p1.account_id
          and p2.price = p1.price);
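Two side effects of the day filter may be worth knowing. Casting a timestamp with time zone to date depends on the session's TimeZone setting and, like the subtraction discussed in the first answer, cannot use a plain index on created. And because p1 and p2 are both pinned to the same date, a pair that straddles midnight is missed. A possible variant, sketched under the assumption that "one day" means a UTC day, bounds only p1 with a half-open timestamp range and lets the one-second condition bound p2. The scratch data is hypothetical: the question's two rows plus an invented pair (90001/90002) half a second apart across midnight UTC. The date 2014-05-07 is used only so that the sample rows fall inside the range.

```sql
-- Scratch setup (hypothetical): temporary table with the question's schema.
CREATE TEMP TABLE purchase (
    transaction_id integer,
    account_id     bigint,
    created        timestamp with time zone,
    price          numeric(5,2)
);
INSERT INTO purchase VALUES
    (85239, 80012340116730, '2014-05-07 15:46:03.361959+00', 8.47),
    (85240, 80012340116730, '2014-05-07 15:46:04.118911+00', 8.47),
    -- invented pair straddling midnight UTC; a same-date filter on both sides misses it
    (90001, 80012340116731, '2014-05-07 23:59:59.8+00', 3.00),
    (90002, 80012340116731, '2014-05-08 00:00:00.3+00', 3.00);

SELECT *
FROM   purchase p1
WHERE  p1.created >= '2014-05-07 00:00:00+00'  -- half-open UTC day range:
AND    p1.created <  '2014-05-08 00:00:00+00'  -- sargable for an index on created
AND    EXISTS (
   SELECT 1
   FROM   purchase p2
   WHERE  p2.created > p1.created
   AND    p2.created < p1.created + interval '1 second'  -- p2 may fall on the next day
   AND    p2.account_id = p1.account_id
   AND    p2.price = p1.price
   )
ORDER  BY p1.created;
```

With this data it should return 85239 and 90001, i.e. the earlier row of each pair, including the one whose partner is dated the next day.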