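I am trying to compare two queries to understand which one would be faster. The basic idea is that there is a table with no duplicates (say test1). You then try to insert only the deltas from a second table (say test2), and if the second table has duplicates, only one copy of each record is inserted.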
create table test1 (id varchar(10), a bigint, b bigint);
create table test2 (id varchar(10), a bigint, b bigint);
insert into test1 values ('aaa', 1, 1), ('aa2', 1, 2), ('aa3', 1, 3);
insert into test1 values ('bbb', 2, 1), ('bb2', 2, 2);
insert into test1 values ('bbb', 2, 1), ('bb2', 2, 2);
insert into test2 values ('aaa', 1, 1), ('aa2', 1, 2), ('aa3', 1, 3);
INSERT INTO test2
SELECT DISTINCT id,
                a,
                b
FROM   test1
WHERE  NOT EXISTS (SELECT *
                   FROM   test2
                   WHERE  test2.id = test1.id);
INSERT INTO test2
SELECT id,
       a,
       b
FROM   (SELECT t2.*
        FROM   (SELECT Row_number() OVER(partition BY id) AS dup_id,
                       *
                FROM   test1) t2
        WHERE  t2.dup_id = 1) t1
WHERE  t1.id NOT IN (SELECT test2.id
                     FROM   test2);
Can someone help me understand which one would be faster and more efficient?
Explain for the first query:
db=# explain insert into test2 select distinct id, a, b from test1 where not exists (select * from test2 where test2.id=test1.id);
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------
 XN Subquery Scan "*SELECT*" (cost=3613333.97..4213334.30 rows=7 width=49)
   -> XN Unique (cost=3613333.97..4213334.23 rows=7 width=49)
         -> XN Hash Left Join DS_BCAST_INNER (cost=3613333.97..4213334.18 rows=7 width=49)
               Hash Cond: ("outer".oid = "inner".oid)
               Filter: ("inner".oid IS NULL)
               -> XN Seq Scan on test1 (cost=0.00..0.07 rows=7 width=53)
               -> XN Hash (cost=3613333.96..3613333.96 rows=5 width=4)
                     -> XN Subquery Scan volt_dt_1 (cost=1760000.36..3613333.96 rows=5 width=4)
                           -> XN Unique (cost=1760000.36..3613333.91 rows=5 width=4)
                                 -> XN Hash Join DS_DIST_BOTH (cost=1760000.36..3613333.90 rows=5 width=4)
                                       Outer Dist Key: test1.id
                                       Inner Dist Key: volt_dt_2.id
                                       Hash Cond: (("outer".id)::text = ("inner".id)::text)
                                       -> XN Seq Scan on test1 (cost=0.00..0.07 rows=7 width=37)
                                       -> XN Hash (cost=1760000.34..1760000.34 rows=5 width=33)
                                             -> XN Subquery Scan volt_dt_2 (cost=1760000.29..1760000.34 rows=5 width=33)
                                                   -> XN HashAggregate (cost=1760000.29..1760000.29 rows=5 width=33)
                                                         -> XN Hash Join DS_DIST_BOTH (cost=0.06..1760000.27 rows=5 width=33)
                                                               Outer Dist Key: test1.id
                                                               Inner Dist Key: test2.id
                                                               Hash Cond: (("outer".id)::text = ("inner".id)::text)
                                                               -> XN Seq Scan on test1 (cost=0.00..0.07 rows=7 width=33)
                                                               -> XN Hash (cost=0.05..0.05 rows=5 width=33)
                                                                     -> XN Seq Scan on test2 (cost=0.00..0.05 rows=5 width=33)
 ----- Tables missing statistics: test2, test1 -----
 ----- Update statistics by running the ANALYZE command on these tables -----
(26 rows)
Explain for the second query:
db=# explain insert into test2 select id, a, b from (select t2.* from ( select row_number() over(partition by id order by id) as dup_id, * from test1 ) t2 where t2.dup_id = 1 ) t1 where t1.id not in (select test2.id from test2);
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 XN Hash NOT IN Join DS_DIST_INNER (cost=1000000000000.23..999999999999999967336168804116691273849533185806555472917961779471295845921727862608739868455469056.00 rows=1 width=49)
   Inner Dist Key: db.test2.id
   Hash Cond: (("outer".id)::text = ("inner".id)::text)
   -> XN Subquery Scan t2 (cost=1000000000000.17..1000000000000.36 rows=1 width=49)
         Filter: (dup_id = 1)
         -> XN Window (cost=1000000000000.17..1000000000000.27 rows=7 width=49)
               Partition: id
               Order: id
               -> XN Sort (cost=1000000000000.17..1000000000000.19 rows=7 width=49)
                     Sort Key: id
                     -> XN Network (cost=0.00..0.07 rows=7 width=49)
                           Distribute
                           -> XN Seq Scan on test1 (cost=0.00..0.07 rows=7 width=49)
   -> XN Hash (cost=0.05..0.05 rows=5 width=33)
         -> XN Seq Scan on test2 (cost=0.00..0.05 rows=5 width=33)
 ----- Tables missing statistics: test2, test1 -----
 ----- Update statistics by running the ANALYZE command on these tables -----
(17 rows)
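Answer 0 (score: 2)
I would think the first should be faster, although it needs an index on test2(id).
Normally, the answer to questions like this is "try it on your data and your system . . . and let us know". However, row_number() requires a full scan of test1. You might as well do the index lookup in test2 during that same scan, and that is what the first version does.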
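As a starting point for "try it on your data", here is a minimal sketch that loads the sample rows from the question into an in-memory SQLite database, runs each INSERT against a fresh copy, and checks that both leave test2 with the same rows. SQLite is only a stand-in chosen so the sketch is self-contained (it assumes SQLite 3.25 or newer for window functions). The helper name run is hypothetical, and the timings it returns say nothing about how the real system will behave; for a meaningful comparison, point the same two statements at your own database with realistic data volumes.

```python
import sqlite3
import time

# Sample schema and rows copied from the question.
SETUP = """
create table test1 (id varchar(10), a bigint, b bigint);
create table test2 (id varchar(10), a bigint, b bigint);
insert into test1 values ('aaa', 1, 1), ('aa2', 1, 2), ('aa3', 1, 3);
insert into test1 values ('bbb', 2, 1), ('bb2', 2, 2);
insert into test1 values ('bbb', 2, 1), ('bb2', 2, 2);
insert into test2 values ('aaa', 1, 1), ('aa2', 1, 2), ('aa3', 1, 3);
"""

# First version: DISTINCT plus NOT EXISTS.
QUERY_1 = """
INSERT INTO test2
SELECT DISTINCT id, a, b
FROM   test1
WHERE  NOT EXISTS (SELECT * FROM test2 WHERE test2.id = test1.id)
"""

# Second version: row_number() de-duplication plus NOT IN.
QUERY_2 = """
INSERT INTO test2
SELECT id, a, b
FROM   (SELECT t2.*
        FROM   (SELECT row_number() OVER (PARTITION BY id) AS dup_id, *
                FROM   test1) t2
        WHERE  t2.dup_id = 1) t1
WHERE  t1.id NOT IN (SELECT test2.id FROM test2)
"""


def run(query):
    """Run one INSERT on a fresh database; return (rows of test2, seconds)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SETUP)
    start = time.perf_counter()
    conn.execute(query)
    conn.commit()
    elapsed = time.perf_counter() - start
    rows = conn.execute("select id, a, b from test2 order by id").fetchall()
    conn.close()
    return rows, elapsed


rows_1, secs_1 = run(QUERY_1)
rows_2, secs_2 = run(QUERY_2)

# Both versions should leave test2 with the same five distinct ids.
assert rows_1 == rows_2
print(len(rows_1), "rows in test2 after either query")
print(f"query 1: {secs_1:.6f}s, query 2: {secs_2:.6f}s (toy data, not a benchmark)")
```

Checking that the two statements agree on the result first is worthwhile, because a speed comparison only means something if both versions insert the same rows.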
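Answer 1 (score: 1)
As everyone knows, INSERT is always expensive compared with SELECT, UPDATE, and DELETE, because INSERT has no WHERE clause, and it becomes more expensive the more indexes the table has.
Both queries perform an INSERT either way, so the point above does not help choose between them. A simpler alternative: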
INSERT INTO test2
SELECT DISTINCT id,
                a,
                b
FROM   test1
WHERE  id NOT IN (SELECT id FROM test2);
As everyone also knows, SELECT * FROM Table is expensive; it is better to select only the columns you need.
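One thing to check before swapping NOT EXISTS for NOT IN: the two filters select the same rows only as long as test2.id contains no NULLs. The sketch below illustrates this under stated assumptions: it uses an in-memory SQLite database as a stand-in, a trimmed version of the sample data, and a hypothetical helper named candidate_ids that runs only the SELECT part of the statement.

```python
import sqlite3

# The two anti-join filters discussed in this thread.
NOT_IN = "id not in (select id from test2)"
NOT_EXISTS = "not exists (select * from test2 where test2.id = test1.id)"


def candidate_ids(condition, test2_ids):
    """Return the distinct test1 ids that the given filter would insert."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table test1 (id varchar(10), a bigint, b bigint)")
    conn.execute("create table test2 (id varchar(10), a bigint, b bigint)")
    conn.executemany(
        "insert into test1 values (?, ?, ?)",
        [("aaa", 1, 1), ("bbb", 2, 1), ("bbb", 2, 1)],
    )
    conn.executemany(
        "insert into test2 values (?, 0, 0)", [(i,) for i in test2_ids]
    )
    rows = conn.execute(
        "select distinct id from test1 where " + condition + " order by id"
    ).fetchall()
    conn.close()
    return [r[0] for r in rows]


# Without NULLs in test2.id, both filters pick the same delta.
print(candidate_ids(NOT_IN, ["aaa"]))
print(candidate_ids(NOT_EXISTS, ["aaa"]))

# With a NULL in test2.id, NOT IN evaluates to unknown for every
# non-matching row, so it selects nothing; NOT EXISTS is unaffected.
print(candidate_ids(NOT_IN, ["aaa", None]))
print(candidate_ids(NOT_EXISTS, ["aaa", None]))
```

If test2.id can be NULL, either keep the NOT EXISTS form or add a filter for non-NULL ids inside the NOT IN subquery.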