How to generate ~1 billion VIN numbers very fast (< 1 hour)?
Right now I'm using this query:
INSERT INTO vins (number)
SELECT p.number FROM (
SELECT generate_series(1,10000000) AS num,
upper(substring(md5(random()::text) from 0 for 16)) AS number
) p
ON CONFLICT DO NOTHING;
This approach is slow. On top of that, each run generates only a small fraction of the data, so it has to be re-run manually.
Any ideas?
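For reference, each key in the query above is an upper-cased prefix of an MD5 hex digest. One detail: in PostgreSQL, `substring(s from 0 for 16)` starts one position before the first character, so it returns 15 characters rather than 16. Below is a minimal Python sketch of that key format. It assumes Python's `random.random()` as a stand-in for PostgreSQL's `random()`. Because the text form of the float differs, the actual keys will not match. `question_style_key` is a hypothetical name.

```python
import hashlib
import random


def question_style_key(rng: random.Random) -> str:
    # Stand-in for random()::text; Python's float repr differs from PostgreSQL's text output.
    seed_text = repr(rng.random())
    digest = hashlib.md5(seed_text.encode("ascii")).hexdigest()  # 32 lowercase hex digits
    # substring(x from 0 for 16) covers positions 0..15; position 0 lies before the
    # string, so only characters 1..15 are returned.
    return digest[:15].upper()


rng = random.Random(42)
keys = [question_style_key(rng) for _ in range(5)]
for k in keys:
    print(k)
```

If 16 characters were intended, `substring(... from 1 for 16)` or `left(..., 16)` would give them.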
Answer 0 (score: 1)
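In addition to dropping the indexes and constraints, as @Laurenz Albe suggested, set the table to unlogged while inserting.
Storing the VIN number as bytea reduces its size by more than half, which makes both inserts and searches faster: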
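The saving comes from storing raw bytes instead of hex digits. Each byte holds two hex digits, so a 16-digit key shrinks from 16 bytes of payload to 8. PostgreSQL adds a small per-value length header on top of both. The following Python sketch only illustrates the payload arithmetic, not PostgreSQL's on-disk format:

```python
# A 64-bit key written as 16 hex digits.
hex_key = "987ce0e63614afbb"

as_text = hex_key.encode("ascii")  # text-style storage: one byte per hex digit
as_bytes = bytes.fromhex(hex_key)  # bytea-style storage: one byte per two hex digits

print(len(as_text), len(as_bytes))  # 16 8
```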
create table vins (number bytea primary key);
alter table vins set unlogged;
explain analyze
insert into vins (number)
select decode(left(md5(random()::text),16), 'hex')
from generate_series(1,10000000) s
on conflict do nothing;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------
Insert on vins (cost=0.00..37.50 rows=1000 width=32) (actual time=246576.200..246576.200 rows=0 loops=1)
Conflict Resolution: NOTHING
Tuples Inserted: 9976941
Conflicting Tuples: 23059
-> Function Scan on generate_series s (cost=0.00..27.50 rows=1000 width=32) (actual time=1633.756..56257.494 rows=10000000 loops=1)
Planning time: 0.097 ms
Execution time: 246661.084 ms
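The plan reports 23,059 conflicting tuples out of 10,000,000. That is far more than truly random 64-bit keys would produce. A birthday-problem estimate points at the input to md5 instead. Suppose `random()` can only return about 2^31 distinct values. This is an assumption here, though older PostgreSQL releases did build `random()` on a 31-bit generator. In that case, `md5(random()::text)` has at most that many distinct outputs. Below is a small Python sketch of the estimate; `expected_conflicts` is a hypothetical helper name:

```python
import math


def expected_conflicts(n: int, space: int) -> float:
    # Expected number of distinct values after n uniform draws from `space` values
    # is space * (1 - (1 - 1/space)**n); log1p/expm1 keep this accurate for huge spaces.
    distinct = -space * math.expm1(n * math.log1p(-1.0 / space))
    return n - distinct


small_run = expected_conflicts(10_000_000, 2**31)     # a little over 23,000
full_64bit = expected_conflicts(10_000_000, 2**64)    # essentially zero
billion_run = expected_conflicts(1_000_000_000, 2**31)  # roughly 200 million

print(round(small_run), round(full_64bit), round(billion_run))
```

Under the same assumption, a single 1-billion-row run would lose roughly a fifth of its rows to conflicts. So it is worth checking how random the key source really is on your PostgreSQL version before scaling this up.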
Turn logging back on:
alter table vins set logged;
To search:
select
encode(number, 'hex') hex_representation,
pg_column_size(number) bytea_storage_size,
pg_column_size(encode(number, 'hex')) text_storage_size
from vins
where number = decode('987ce0e63614afbb', 'hex')
;
hex_representation | bytea_storage_size | text_storage_size
--------------------+--------------------+-------------------
987ce0e63614afbb | 9 | 20
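`decode(..., 'hex')` and `encode(..., 'hex')` convert between hex text and raw bytes in opposite directions. `decode` accepts upper- or lowercase digits, but `encode` always returns lowercase. The uppercase keys from the question would therefore come back lowercase unless wrapped in `upper()`. The same round trip in Python, using the value from the search above:

```python
stored = bytes.fromhex("987CE0E63614AFBB")  # like decode('987CE0E63614AFBB', 'hex'): case-insensitive input
shown = stored.hex()                        # like encode(number, 'hex'): lowercase output

print(shown)          # 987ce0e63614afbb
print(shown.upper())  # 987CE0E63614AFBB, like upper(encode(number, 'hex'))
```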