Best (fastest) way to generate ~1 billion VINs in PostgreSQL

Posted: 2017-01-23 11:13:31

Tags: sql postgresql

How can I generate ~1 billion VIN numbers very quickly (in under an hour)?

Right now I use this query:

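INSERT INTO vins (number)
SELECT p.number FROM (
    -- num is only a row counter; it is never inserted
    SELECT generate_series(1,10000000) AS num,
           -- number: an upper-cased prefix of an MD5 hex digest of a random value
           upper(substring(md5(random()::text) from 0 for 16)) AS number
    ) p
ON CONFLICT DO NOTHING;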
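The query has a subtle detail: in PostgreSQL, `substring(s from 0 for 16)` returns only 15 characters. The window covers positions 0 through 15, and position 0 lies before the first character. The Python sketch below mimics that rule.

Two things in it are assumptions. The helper `pg_substring` is hypothetical, written only for illustration. `str(random.random())` is just a stand-in for PostgreSQL's `random()::text`.

```python
import hashlib
import random


def pg_substring(text: str, start: int, count: int) -> str:
    """Mimic PostgreSQL's substring(text FROM start FOR count) for count >= 0.

    Positions are 1-based. The requested window covers positions
    start .. start + count - 1, and any part of it before position 1
    is cut off rather than shifted.
    """
    end = start + count          # first position *after* the window
    start = max(start, 1)        # positions < 1 do not exist
    if end <= start:
        return ""
    return text[start - 1:end - 1]


def asker_style_value() -> str:
    # Rough analogue of upper(substring(md5(random()::text) from 0 for 16)).
    # str(random.random()) is only a stand-in for random()::text (assumption).
    digest = hashlib.md5(str(random.random()).encode()).hexdigest()
    return pg_substring(digest, 0, 16).upper()


value = asker_style_value()
print(value, len(value))  # the length printed is 15, not 16
```

To get 16 characters, the SQL would need `from 1 for 16` (or `left(..., 16)`).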

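This approach is slow. Moreover, each run produces only a small fraction of the data (10 million rows), so it has to be re-run by hand.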

Any ideas?

1 answer:

Answer (score: 1)

Besides dropping the indexes and constraints as @Laurenz Albe suggested, set the table unlogged while inserting.

Storing the VIN numbers as bytea cuts their size by more than half, which makes both inserts and searches faster.

create table vins (number bytea primary key);

alter table vins set unlogged;

-- EXPLAIN ANALYZE really executes the INSERT and reports the timings
explain analyze
insert into vins (number)
select decode(left(md5(random()::text),16), 'hex')  -- 16 hex digits -> 8 bytes
from generate_series(1,10000000) s
on conflict do nothing;
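This is why the bytea value is smaller. The query keeps the first 16 hex digits of the MD5 digest, and `decode(..., 'hex')` packs each pair of hex digits into one byte. The result is an 8-byte value (2^64 possible values) instead of a 16-character string.

A rough Python analogue follows. As before, `str(random.random())` is an assumed stand-in for `random()::text`.

```python
import hashlib
import random

# Analogue of decode(left(md5(random()::text), 16), 'hex').
# str(random.random()) stands in for PostgreSQL's random()::text (assumption).
hex_prefix = hashlib.md5(str(random.random()).encode()).hexdigest()[:16]
raw = bytes.fromhex(hex_prefix)  # two hex digits -> one byte

print(len(hex_prefix), len(raw))  # 16 characters as text, 8 bytes as binary
```

The conversion is lossless: `raw.hex()` gives back exactly the 16-digit string. This is what `encode(number, 'hex')` relies on when searching.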
                                                               QUERY PLAN                                                                
-----------------------------------------------------------------------------------------------------------------------------------------
 Insert on vins  (cost=0.00..37.50 rows=1000 width=32) (actual time=246576.200..246576.200 rows=0 loops=1)
   Conflict Resolution: NOTHING
   Tuples Inserted: 9976941
   Conflicting Tuples: 23059
   ->  Function Scan on generate_series s  (cost=0.00..27.50 rows=1000 width=32) (actual time=1633.756..56257.494 rows=10000000 loops=1)
 Planning time: 0.097 ms
 Execution time: 246661.084 ms
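Two numbers in this plan can be sanity-checked with a little arithmetic.

**The conflicts.** 64-bit MD5 prefixes almost never collide among 10 million rows: the expected number of colliding pairs is about n^2 / 2^65, roughly 3e-6. So the 23059 conflicting tuples most likely come from `random()` itself. Suppose `random()` could return only 2^31 distinct values. We believe this was true of PostgreSQL releases of that time, but treat it as an assumption. The birthday estimate then predicts roughly 23 thousand duplicates, which is in line with the plan.

**The projected runtime.** The measured rate is about 40 thousand inserted rows per second. A linear extrapolation gives roughly 7 hours for a billion rows. That figure is likely optimistic, since inserts into a growing primary-key index tend to slow down once the index no longer fits in memory.

The sketch below works through both calculations under those assumptions.

```python
import math

rows_generated = 10_000_000
distinct_random_values = 2 ** 31  # assumption: random() had 2^31 possible outputs

# Expected distinct values among n uniform draws from N possibilities:
# N * (1 - (1 - 1/N)^n) ~= N * (1 - exp(-n/N)); expm1 keeps precision for small n/N.
expected_distinct = -distinct_random_values * math.expm1(
    -rows_generated / distinct_random_values
)
expected_conflicts = rows_generated - expected_distinct

# Throughput taken from the EXPLAIN ANALYZE output (Tuples Inserted, Execution time).
inserted = 9_976_941
seconds = 246_661.084 / 1000
rows_per_second = inserted / seconds

# Naive linear extrapolation; real inserts likely slow down as the index grows.
hours_for_a_billion = 1_000_000_000 / rows_per_second / 3600

print(round(expected_conflicts))  # same order as the 23059 conflicting tuples reported
print(round(rows_per_second))
print(round(hours_for_a_billion, 1))
```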

Turn logging back on:

alter table vins set logged;

To search:

select
    encode(number, 'hex') hex_representation,
    pg_column_size(number) bytea_storage_size,
    pg_column_size(encode(number, 'hex')) text_storage_size
from vins
where number = decode('987ce0e63614afbb', 'hex')
;
 hex_representation | bytea_storage_size | text_storage_size 
--------------------+--------------------+-------------------
 987ce0e63614afbb   |                  9 |                20
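The 9 and 20 in this output follow from PostgreSQL's variable-length (varlena) header rules, as we understand them:

- A short value stored in a table row gets a 1-byte header.
- A freshly computed value, such as the result of `encode()`, carries a 4-byte header.

So the stored bytea takes 8 + 1 = 9 bytes, and the computed text takes 16 + 4 = 20 bytes. By the same rule, a 16-character text column stored in the table would take 17 bytes, so the on-disk comparison is 9 versus 17.

The helper `varlena_size` below is hypothetical. It ignores compression and TOAST.

```python
def varlena_size(payload_bytes: int, on_disk: bool) -> int:
    """Approximate pg_column_size() for an uncompressed variable-length value.

    Assumption: a value stored in a table row uses the 1-byte "short" header
    when header + payload fit in 127 bytes; otherwise, and for freshly
    computed values, the header is 4 bytes.
    """
    if on_disk and payload_bytes + 1 <= 127:
        return payload_bytes + 1
    return payload_bytes + 4


print(varlena_size(8, on_disk=True))    # stored bytea, as in bytea_storage_size
print(varlena_size(16, on_disk=False))  # computed encode() result, as in text_storage_size
print(varlena_size(16, on_disk=True))   # a 16-character text column stored in the table
```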