Question

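I have a requirement to store records in a database at a rate of 10,000 records/sec, with indexes on a few fields. Each record has 25 columns. I am doing a batch insert of 100,000 records in one transaction block. To improve the insert rate, I moved the tablespace from disk to RAM. Even with that, I only achieve 5,000 inserts per second.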

I have also made the following changes to the postgres configuration:

Indexes: none
fsync: false
Logging: disabled

Other information:

Tablespace: RAM
Number of columns per row: 25 (mostly integers)
CPU: 4 cores, 2.5 GHz
RAM: 48 GB

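I am wondering why a single insert query takes about 0.2 ms on average when the database is not writing anything to disk (since I am using a RAM-based tablespace). Is there something I am doing wrong?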
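
For reference, 0.2 ms per insert works out to the 5,000 inserts per second observed, so the 10,000 per second target needs roughly 0.1 ms per row. A quick check of that arithmetic (the variable names are only illustrative):

```python
# Per-row latency reported above, in microseconds (0.2 ms).
per_insert_us = 200

# Rows per second that this latency allows when inserts run one after another.
rows_per_second = 1_000_000 // per_insert_us

# Latency budget per row needed to reach the 10,000 rows/s target.
target_rows_per_second = 10_000
budget_us = 1_000_000 // target_rows_per_second

print(rows_per_second, budget_us)
```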

Help appreciated.

PRASHANT

Answer 1

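Fast data loading

Translate your data to CSV.
Create a temporary table (as you noted, without indexes).
Execute a COPY command: \copy schema.temp_table FROM '/tmp/data.csv' WITH CSV
Insert the data into the non-temporary table.
Create indexes.
Set appropriate statistics.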
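
As a rough sketch of the first step, the rows could be serialized to CSV in memory with Python's standard csv module. The sample rows and the helper name rows_to_csv are made up for illustration; only the CSV shape matters here.

```python
import csv
import io


def rows_to_csv(rows):
    """Serialize an iterable of row tuples to CSV text in an in-memory buffer."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for row in rows:
        # None becomes an empty unquoted field, which COPY ... WITH CSV reads as NULL.
        writer.writerow(["" if value is None else value for value in row])
    buffer.seek(0)
    return buffer


# Hypothetical sample rows: (taken, station_id, amount, flag, category_id).
sample = [("2010-01-01", 7, 12.5, "A", 1), ("2010-01-02", 7, None, "B", 1)]
csv_buffer = rows_to_csv(sample)
print(csv_buffer.getvalue())
```

Assuming a driver such as psycopg2 is in use, a buffer like this could probably be handed to cursor.copy_expert("COPY schema.temp_table FROM STDIN WITH CSV", csv_buffer) instead of going through a file; that call is not exercised in the sketch.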

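Further recommendations

For large volumes of data:

Split the data into child tables.
Insert the data in the order of the column that most SELECT statements will use. In other words, try to align the physical model with the logical model.
Adjust your configuration settings.
Create a CLUSTER index (most important column on the left). For example: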

    CREATE UNIQUE INDEX measurement_001_stc_index
      ON climate.measurement_001
      USING btree
      (station_id, taken, category_id);
    ALTER TABLE climate.measurement_001 CLUSTER ON measurement_001_stc_index;
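
One way to follow the "insert in order" advice is to sort the rows on the same columns, in the same order, as the clustered index before loading them. A minimal sketch, assuming the rows are held as dicts (the sample data is invented):

```python
from operator import itemgetter

# Hypothetical rows as dicts; field names follow the index columns above.
rows = [
    {"station_id": 2, "taken": "2010-01-02", "category_id": 1},
    {"station_id": 1, "taken": "2010-01-03", "category_id": 1},
    {"station_id": 1, "taken": "2010-01-01", "category_id": 1},
]

# Sort by the same key, in the same column order, as the clustered index.
# ISO-formatted date strings sort chronologically.
cluster_key = itemgetter("station_id", "taken", "category_id")
ordered = sorted(rows, key=cluster_key)

print([(r["station_id"], r["taken"]) for r in ordered])
```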

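Configuration settings

On a machine with 4GB of RAM, I did the following...

Kernel configuration

Tell the kernel that it is okay for programs to use large amounts of shared memory: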

sysctl -w kernel.shmmax=536870912
sysctl -p /etc/sysctl.conf
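
kernel.shmmax is given in bytes, and the value above is 512 MiB. A small helper (the name is hypothetical) makes the conversion explicit when picking a different size:

```python
def mib_to_bytes(mib):
    """Convert mebibytes to bytes, the unit kernel.shmmax expects."""
    return mib * 1024 * 1024


shmmax = mib_to_bytes(512)
print(shmmax)
```

The same helper gives 1073741824 for 1024 MiB, which may be useful when checking that the limit is at least as large as the shared memory PostgreSQL will request.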

PostgreSQL configuration

Modify /etc/postgresql/8.4/main/postgresql.conf and set:

shared_buffers = 1GB
temp_buffers = 32MB
work_mem = 32MB
maintenance_work_mem = 64MB
seq_page_cost = 1.0
random_page_cost = 2.0
cpu_index_tuple_cost = 0.001
effective_cache_size = 512MB
checkpoint_segments = 10

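Adjust the values as necessary to suit your environment. You will probably have to change them later for suitable read/write optimization.
Restart PostgreSQL.

Child tables

For example, suppose you have weather-based data divided into different categories. Rather than having a single monolithic table, divide it into several tables (one per category).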

Master table

CREATE TABLE climate.measurement
(
  id bigserial NOT NULL,
  taken date NOT NULL,
  station_id integer NOT NULL,
  amount numeric(8,2) NOT NULL,
  flag character varying(1) NOT NULL,
  category_id smallint NOT NULL,
  CONSTRAINT measurement_pkey PRIMARY KEY (id)
)
WITH (
  OIDS=FALSE
);

Child table

CREATE TABLE climate.measurement_001
(
-- Inherited from table climate.measurement:  id bigint NOT NULL DEFAULT nextval('climate.measurement_id_seq'::regclass),
-- Inherited from table climate.measurement:  taken date NOT NULL,
-- Inherited from table climate.measurement:  station_id integer NOT NULL,
-- Inherited from table climate.measurement:  amount numeric(8,2) NOT NULL,
-- Inherited from table climate.measurement:  flag character varying(1) NOT NULL,
-- Inherited from table climate.measurement:  category_id smallint NOT NULL,
  CONSTRAINT measurement_001_pkey PRIMARY KEY (id),
  CONSTRAINT measurement_001_category_id_ck CHECK (category_id = 1)
)
INHERITS (climate.measurement)
WITH (
  OIDS=FALSE
);
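
When loading, each row then has to go to the child table for its category. A minimal sketch of that routing, assuming every category follows the same three-digit suffix as measurement_001 (the function names and sample rows are hypothetical):

```python
def child_table_for(category_id, schema="climate", parent="measurement"):
    """Return the child table name for a category, e.g. climate.measurement_001."""
    if not 0 < category_id < 1000:
        raise ValueError("category_id must fit the three-digit suffix")
    return f"{schema}.{parent}_{category_id:03d}"


def group_by_child_table(rows):
    """Bucket rows (dicts with a 'category_id' key) by their target child table."""
    buckets = {}
    for row in rows:
        buckets.setdefault(child_table_for(row["category_id"]), []).append(row)
    return buckets


# Invented sample rows, two categories.
rows = [
    {"category_id": 1, "amount": 1.5},
    {"category_id": 2, "amount": 0.0},
    {"category_id": 1, "amount": 3.0},
]
buckets = group_by_child_table(rows)
print(sorted(buckets))
```

Each bucket can then be loaded into its own table in one batch, which also satisfies the CHECK constraint on category_id.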

Table statistics

Raise the statistics target for the important columns:

ALTER TABLE climate.measurement_001 ALTER COLUMN taken SET STATISTICS 1000;
ALTER TABLE climate.measurement_001 ALTER COLUMN station_id SET STATISTICS 1000;

Don't forget to VACUUM and ANALYSE afterwards.

Answer 2

Are you doing your inserts as a series of single-row statements:

INSERT INTO tablename (...) VALUES (...);
INSERT INTO tablename (...) VALUES (...);
...

or as one multi-row insert:

INSERT INTO tablename (...) VALUES (...),(...),(...);

The second will be significantly faster on 100k rows.
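
If the statements are generated from application code, the multi-row form can be built with a small helper. This is only a sketch: the function name is made up, the %s placeholder style assumes a driver such as psycopg2, and the table and column names are interpolated directly, so they must come from trusted code rather than user input.

```python
def build_multirow_insert(table, columns, row_count):
    """Build one INSERT with `row_count` placeholder groups (%s style)."""
    if row_count < 1:
        raise ValueError("row_count must be at least 1")
    # One "(%s, %s, ...)" group per row, one placeholder per column.
    group = "(" + ", ".join(["%s"] * len(columns)) + ")"
    values = ", ".join([group] * row_count)
    return f"INSERT INTO {table} ({', '.join(columns)}) VALUES {values};"


sql = build_multirow_insert("tablename", ["a", "b"], 3)
print(sql)
```

The flattened parameter list (row 1 values, then row 2 values, and so on) is passed alongside the statement when it is executed.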

Source: http://kaiv.wordpress.com/2007/07/19/faster-insert-for-multiple-rows/

Answer 3

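Did you also place the xlog (the WAL segments) on your RAM drive? If not, you are still writing to disk. And what about the settings for wal_buffers, checkpoint_segments, etc.? You should try to fit all 100,000 records (your single transaction) in wal_buffers. Note that increasing this parameter might cause PostgreSQL to request more System V shared memory than your operating system's default configuration allows.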
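
To get a feel for the size involved, here is a back-of-the-envelope lower bound on the data in one such transaction. It assumes all 25 columns are 4-byte integers (the question says "mostly integers") and ignores tuple headers and WAL record overhead, so the real figure will be larger.

```python
ROWS = 100_000          # rows per transaction, from the question
COLUMNS = 25            # columns per row, from the question
BYTES_PER_INTEGER = 4   # size of a PostgreSQL integer column

# Raw column data only; headers and WAL overhead are deliberately left out.
payload_bytes = ROWS * COLUMNS * BYTES_PER_INTEGER
payload_mib = payload_bytes / (1024 * 1024)

print(payload_bytes, round(payload_mib, 2))
```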

Answer 4

I suggest you use COPY instead of INSERT.

You should also fine-tune your postgresql.conf file.

Read about it at http://wiki.postgresql.org/wiki/Performance_Optimization

Slow insert speed in PostgreSQL memory tablespace
