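I am redesigning a database schema to improve query performance. In the new design there are 5 tables per year/month partition (3 of them are used in the example below). For the test case that comes to 860 tables in 172 partitions. The relevant columns are indexed with appropriate index types and operator classes. The database is loaded with simulated data that is realistic for what can occur in production. The data is almost never updated: once stored, it is only read.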
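To make the layout concrete, here is a small Python sketch, for illustration only, that lists monthly range-partition bounds and checks the table count. 172 monthly partitions x 5 tables = 860 tables, which is roughly 14 years and 4 months of data. It assumes one partition per calendar month with half-open bounds, as in PostgreSQL's FOR VALUES FROM (...) TO (...). The start month (January 2005) and the helper name monthly_partitions are hypothetical.

```python
from datetime import date

TABLES_PER_MONTH = 5  # from the post: 5 tables per year/month partition

def monthly_partitions(first_year, first_month, n_months):
    """Yield (suffix, lower, upper) for n_months consecutive monthly partitions.

    Bounds are half-open: lower is included and upper is excluded, like
    PostgreSQL's FOR VALUES FROM (lower) TO (upper).
    """
    year, month = first_year, first_month
    for _ in range(n_months):
        lower = date(year, month, 1)
        # Advance to the first day of the next month, rolling over the year.
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
        upper = date(year, month, 1)
        yield f"{lower.year}_{lower.month:02d}", lower, upper

# Hypothetical start month; the post does not say where the data begins.
parts = list(monthly_partitions(2005, 1, 172))
print(len(parts), len(parts) * TABLES_PER_MONTH)  # 172 860
```

The suffix format matches the partition names used in the post (for example partition_2018_06_measurements).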
Hardware and software configuration:
Windows 10 Professional 64bit
Intel Core i7-4790CPU
1 TB SATA HDD
16 GB RAM
PostgreSQL 11beta 1
Postgres configuration (postgresql.conf):
shared_buffers = 512MB
temp_buffers = 32MB
work_mem = 32MB
maintenance_work_mem = 1GB
max_worker_processes = 8
max_parallel_workers = 8
max_parallel_workers_per_gather = 2
enable_partition_pruning = on
enable_parallel_append = on
constraint_exclusion = partition
default_statistics_target = 500
effective_cache_size = 12GB
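For orientation, the memory settings above can be related to the 16 GB of RAM. The Python sketch below converts each value to bytes, using PostgreSQL's rule that memory units use a multiplier of 1024. It is only arithmetic on the values listed in the post, not a tuning recommendation.

```python
# Memory-related settings copied from the postgresql.conf excerpt above.
SETTINGS = {
    "shared_buffers": "512MB",
    "temp_buffers": "32MB",
    "work_mem": "32MB",
    "maintenance_work_mem": "1GB",
    "effective_cache_size": "12GB",
}
RAM = "16GB"  # installed memory from the hardware list

# PostgreSQL memory units use a multiplier of 1024, not 1000.
UNITS = {"kB": 1024, "MB": 1024**2, "GB": 1024**3, "TB": 1024**4}

def to_bytes(value):
    """Convert a setting such as '512MB' to a number of bytes."""
    for suffix, factor in UNITS.items():
        if value.endswith(suffix):
            return int(value[: -len(suffix)]) * factor
    raise ValueError(f"unsupported unit in {value!r}")

ram_bytes = to_bytes(RAM)
for name, value in SETTINGS.items():
    share = to_bytes(value) / ram_bytes
    print(f"{name:22} {value:>6}  {share:6.1%} of RAM")
```

With these values, shared_buffers is about 3% of RAM and effective_cache_size is 75%.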
Database schema:
table measurements (10M records total):
    id serial
    guid TEXT NOT NULL (index: btree, text_pattern_ops)
    start TIMESTAMP(0) WITHOUT TIME ZONE NOT NULL (index: btree)
    stop TIMESTAMP(0) WITHOUT TIME ZONE NOT NULL
    mount_point_id SMALLINT NOT NULL (index: btree)
    name TEXT NOT NULL
    comment TEXT NOT NULL
    PARTITION BY RANGE (start)

table process_data (40M records total):
    id serial
    mount_point_id SMALLINT NOT NULL (index: btree)
    measurement_id INTEGER NOT NULL (index: btree)
    measurement_start TIMESTAMP WITHOUT TIME ZONE NOT NULL (index: btree)
    item_id SMALLINT NOT NULL (index: btree( item_id, item_value) )
    item_value REAL NOT NULL
    PARTITION BY RANGE (measurement_start)

table material_data (160M records total):
    id serial
    mount_point_id SMALLINT NOT NULL (index: btree)
    measurement_id INTEGER NOT NULL (index: btree)
    measurement_start TIMESTAMP WITHOUT TIME ZONE NOT NULL (index: btree)
    material_index SMALLINT NOT NULL (index: btree)
    material_data TEXT NOT NULL (index: btree, text_pattern_ops)
    PARTITION BY RANGE (measurement_start)
Table relations:
measurements 1 ---+--- 1..N process_data
                  +--- 1..N material_data
                  +--- 1..N ...
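These are the base tables. I listed the index information on them for clarity; in reality, the indexes are created on the individual partition tables.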
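Because the indexes live on each partition, every new month needs its own set of index definitions. The Python sketch below generates them from the schema listing above. The index columns and operator classes come from the post, and the partition names follow the post's partition_YYYY_MM_table pattern. The helper partition_index_ddl is hypothetical, and index names are left out because PostgreSQL generates one when CREATE INDEX is given no name.

```python
# Index definitions per base table, taken from the schema listing in the post.
INDEXES = {
    "measurements": ["guid text_pattern_ops", "start", "mount_point_id"],
    "process_data": [
        "mount_point_id",
        "measurement_id",
        "measurement_start",
        "item_id, item_value",
    ],
    "material_data": [
        "mount_point_id",
        "measurement_id",
        "measurement_start",
        "material_index",
        "material_data text_pattern_ops",
    ],
}

def partition_index_ddl(suffix):
    """Return CREATE INDEX statements for one month's partitions, e.g. '2018_06'."""
    statements = []
    for table, column_specs in INDEXES.items():
        partition = f"partition_{suffix}_{table}"
        for cols in column_specs:
            statements.append(f"CREATE INDEX ON {partition} USING btree ({cols});")
    return statements

for stmt in partition_index_ddl("2018_06"):
    print(stmt)
```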
partition tables (data given for one partition):
partition_2018_06_measurements: 60K records
partition_2018_06_process_data: 240K records
partition_2018_06_material_data: 950K records
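As a quick consistency check, the per-partition figures can be compared with the table totals spread evenly over 172 monthly partitions. The Python sketch below does that arithmetic using only numbers from the post.

```python
# Totals and June 2018 partition sizes as given in the post.
TOTALS = {
    "measurements": 10_000_000,
    "process_data": 40_000_000,
    "material_data": 160_000_000,
}
JUNE_2018 = {
    "measurements": 60_000,
    "process_data": 240_000,
    "material_data": 950_000,
}
N_PARTITIONS = 172  # monthly partitions in the test database

def average_per_partition(table):
    """Average rows per monthly partition if rows were spread evenly."""
    return TOTALS[table] / N_PARTITIONS

def children_per_measurement(table, counts=JUNE_2018):
    """Child rows per measurement row within one partition."""
    return counts[table] / counts["measurements"]

for table in TOTALS:
    print(f"{table}: ~{average_per_partition(table):,.0f} average "
          f"vs {JUNE_2018[table]:,} in June 2018")
print(children_per_measurement("process_data"))   # 4.0
print(children_per_measurement("material_data"))  # about 15.8
```

The June 2018 partition is close to the average (about 58K, 233K and 930K rows). Each measurement has about 4 process rows and about 16 material rows.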
Typical queries are:
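I ran several tests with different numbers of measurement records and different statistics targets: 10K to 10M records in the measurements table, and statistics targets of 100, 250, 500, 750 and 1000. That is 20 different scenarios in total, and the results of each scenario are comparable. In every case, a higher statistics target gave better results.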
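The 20 scenarios form a grid of table sizes x statistics targets. The Python sketch below enumerates that grid. The post gives only the range 10K to 10M, so the four decade-step sizes are an assumption that matches the stated total of 20 scenarios.

```python
from itertools import product

# Statistics targets from the post.
STATISTICS_TARGETS = [100, 250, 500, 750, 1000]
# Assumption: decade steps from 10K to 10M rows in measurements (4 sizes),
# which together with 5 targets gives the 20 scenarios mentioned.
RECORD_COUNTS = [10_000, 100_000, 1_000_000, 10_000_000]

scenarios = list(product(RECORD_COUNTS, STATISTICS_TARGETS))
print(len(scenarios))  # 20
for records, target in scenarios[:3]:
    print(f"{records:>10,} rows, default_statistics_target = {target}")
```

A changed statistics target only affects statistics collected by a subsequent ANALYZE, so each scenario needs a fresh ANALYZE before the query is timed.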
SQL query used for testing:
DROP VIEW IF EXISTS view_measurements;
DROP VIEW IF EXISTS view_material;
DROP VIEW IF EXISTS view_process;

CREATE TEMPORARY VIEW view_measurements AS
(
    SELECT * FROM
        measurements m
    WHERE
        m.start BETWEEN '2018-06-01 00:00:00' AND '2018-07-01 00:00:00'
        AND m.mount_point_id IN( 1,3,5,7,9,11,13,15,17,19 )
);

CREATE TEMPORARY VIEW view_material AS
(
    SELECT
        md.measurement_id,
        md.material_index,
        md.material_data
    FROM
        material_data md
    WHERE
        -- exclude as many rows as possible
        md.measurement_start BETWEEN '2018-06-01 00:00:00' AND '2018-07-01 00:00:00'
        AND md.mount_point_id IN( 1,3,5,7,9,11,13,15,17,19 )
        AND (md.material_data LIKE 'SHX%' OR md.material_data LIKE 'CU23%')
);

CREATE TEMPORARY VIEW view_process AS
(
    SELECT
        pd.measurement_id,
        pd.item_id,
        pd.item_value
    FROM
        process_data pd
    WHERE
        -- exclude as many rows as possible
        pd.measurement_start BETWEEN '2018-06-01 00:00:00' AND '2018-07-01 00:00:00'
        AND pd.mount_point_id IN( 1,3,5,7,9,11,13,15,17,19 )
        AND pd.item_id IN ( 110, 111 )
);

--EXPLAIN ANALYZE VERBOSE
SELECT
    *
FROM
    view_measurements vm
WHERE
    (
        (
            EXISTS( SELECT 1 FROM view_material md WHERE vm.id = md.measurement_id AND md.material_data LIKE 'SHX%' ) OR
            EXISTS( SELECT 1 FROM view_material md WHERE vm.id = md.measurement_id AND md.material_data LIKE 'CU23%' )
        )
        AND
        (
            EXISTS( SELECT 1 FROM view_process pd WHERE vm.id = pd.measurement_id AND pd.item_id = 110 AND pd.item_value > 1700 ) AND
            EXISTS( SELECT 1 FROM view_process pd WHERE vm.id = pd.measurement_id AND pd.item_id = 111 AND pd.item_value > 2.2 )
        )
    );
The query above selects all measurements from 01.06.2018 to 01.07.2018 for which the measurement row has:
- a material item starting with 'SHX' or a material item starting with 'CU23' AND
- a process data item with id 110 and value > 1700 AND
- a process data item with id 111 and value > 2.2
The query returned 18 rows.
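The three conditions are semi-joins: a measurement qualifies if at least one matching child row exists in each group. The Python sketch below models that logic on made-up rows. It assumes the date and mount point filters of the views have already been applied. It is meant to clarify the semantics, not to suggest an execution strategy. The two EXISTS clauses on material_data can be read as one EXISTS with both LIKE prefixes combined by OR, which is what the model does.

```python
def matching_measurements(measurement_ids, material_rows, process_rows):
    """Return measurement ids that satisfy all three EXISTS conditions.

    material_rows: iterable of (measurement_id, material_data)
    process_rows:  iterable of (measurement_id, item_id, item_value)
    """
    process_rows = list(process_rows)  # iterated twice below
    # EXISTS(... LIKE 'SHX%') OR EXISTS(... LIKE 'CU23%')
    has_material = {mid for mid, data in material_rows
                    if data.startswith("SHX") or data.startswith("CU23")}
    # EXISTS(... item_id = 110 AND item_value > 1700)
    has_110 = {mid for mid, item, value in process_rows
               if item == 110 and value > 1700}
    # EXISTS(... item_id = 111 AND item_value > 2.2)
    has_111 = {mid for mid, item, value in process_rows
               if item == 111 and value > 2.2}
    return [mid for mid in measurement_ids
            if mid in has_material and mid in has_110 and mid in has_111]

# Made-up sample rows, for illustration only.
material = [(1, "SHX-01"), (2, "CU23A"), (3, "XYZ"), (4, "SHX9")]
process = [
    (1, 110, 1800.0), (1, 111, 2.5),
    (2, 110, 1750.0), (2, 111, 2.3),
    (3, 110, 1900.0), (3, 111, 3.0),
    (4, 110, 1600.0), (4, 111, 5.0),
]
print(matching_measurements([1, 2, 3, 4], material, process))  # [1, 2]
```

Measurement 3 fails the material condition and measurement 4 fails the item 110 condition.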
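On a "cold" database (data not yet in the cache), the query above sometimes takes 1 minute. That seems far too slow, especially since all the data comes from exactly 3 tables (the interval fits exactly into partition 2018_06). Once the data has been loaded into the database cache, queries with similar parameters return within a few hundred milliseconds. I ran the same query with larger partitions (quarterly instead of monthly), and the initial query took even longer (2 minutes instead of 1). The query plan shows that the planner's row estimates are 200x / 400x lower than the actual row counts (items 10 and 11).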
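One detail worth checking against the claim that the interval fits exactly into one partition. BETWEEN is inclusive at both ends, while a range partition's TO bound is exclusive. Assuming the monthly partitions are bounded as FROM ('2018-07-01') TO ('2018-08-01'), the value '2018-07-01 00:00:00' belongs to the July partitions. Partition pruning would therefore be expected to keep the 2018_07 partitions in the plan as well. A half-open predicate such as start >= '2018-06-01' AND start < '2018-07-01' would match June only. The Python sketch below is a simplified model of which monthly partitions each form can hit, not the planner's actual pruning code. The helper name months_touched is hypothetical.

```python
from datetime import datetime, timedelta

def months_touched(lo, hi, inclusive_hi):
    """Return 'YYYY_MM' suffixes of monthly partitions a range predicate can hit.

    Monthly partitions are assumed to be FROM (first of month) TO (first of
    next month), i.e. with an exclusive upper bound. BETWEEN lo AND hi
    corresponds to inclusive_hi=True; lo <= x AND x < hi to inclusive_hi=False.
    """
    # Latest timestamp that can still match (microsecond resolution, as in PostgreSQL).
    last = hi if inclusive_hi else hi - timedelta(microseconds=1)
    year, month = lo.year, lo.month
    suffixes = []
    while (year, month) <= (last.year, last.month):
        suffixes.append(f"{year}_{month:02d}")
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
    return suffixes

june = datetime(2018, 6, 1)
july = datetime(2018, 7, 1)
print(months_touched(june, july, inclusive_hi=True))   # ['2018_06', '2018_07']
print(months_touched(june, july, inclusive_hi=False))  # ['2018_06']
```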
I tried using CTEs instead of views, but the timings got worse.
Thanks in advance, Guido