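My system has many devices that can take measurements. These measurements are stored in the table "sample_data". Each device may produce 10M measurements a year. Most of the time, the user is only interested in 100 min-max pairs taken over equal intervals within some period, for example the last 24 hours or the last 53 weeks. To get these 100 minimums and maximums, the period is divided into 100 equal intervals, and the minimum and maximum values are extracted from each interval. What would you recommend as the most efficient way to query the data? So far I have tried the following query: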
WITH periods AS (
SELECT time.start AS st, time.start + (interval '1 year' / 100) AS en
FROM generate_series(now() - interval '1 year', now(), interval '1 year' / 100) AS time(start)
)
SELECT s.* FROM sample_data s
JOIN periods ON s.time BETWEEN periods.st AND periods.en
JOIN devices d ON d.customer_id = 23
WHERE
s.id = (SELECT id FROM sample_data WHERE device_id = d.id and time BETWEEN periods.st AND periods.en ORDER BY sample ASC LIMIT 1) OR
s.id = (SELECT id FROM sample_data WHERE device_id = d.id and time BETWEEN periods.st AND periods.en ORDER BY sample DESC LIMIT 1)
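This query takes about 4 seconds. That is not good enough, because the sample_data table can contain up to 10M rows per device. I can see that it does not run in a very optimal way, but I don't understand why. I thought I had indexed all the key fields used in this query.
Can you suggest a faster way to fetch this kind of statistics?
Table "devices":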
Column | Type | Modifiers
--------------------+-----------------------------+------------------------------------------------------
id | integer | not null default nextval('devices_id_seq'::regclass)
customer_id | integer |
<Other fields skipped as they are not involved in the query>
Indexes:
"devices_pkey" PRIMARY KEY, btree (id)
"index_devices_on_iccid" UNIQUE, btree (iccid)
It has 12 devices, of which only 4 belong to customer_id = 23, the customer specified in the query.
Table "sample_data":
Column | Type | Modifiers
----------------+-----------------------------+----------------------------------------------------------
id | integer | not null default nextval('sample_data_id_seq'::regclass)
sample | numeric | not null
time | timestamp without time zone | not null
device_id | integer | not null
customer_id | integer | not null
Indexes:
"sample_data_pkey" PRIMARY KEY, btree (id)
"sample_data_device_id_time_sample_idx" btree (device_id, "time", sample)
It has about 1.7M rows. About 720K rows for each of the 4 devices belong to customer_id = 23. The table is currently filled with test data.
"select version()" result:
PostgreSQL 9.3.5 on x86_64-apple-darwin13.3.0, compiled by Apple LLVM version 5.0 (clang-500.2.79) (based on LLVM 3.3svn), 64-bit
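track_io_timing is set to "on".
The EXPLAIN (ANALYZE, BUFFERS) result is here: http://explain.depesz.com/s/kA12
Answer 0 (score: 1)
My guess is that the performance is driven by the subqueries in the WHERE clause. Let's look at one of them: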
WHERE s.id = (SELECT sd.id
FROM sample_data sd
WHERE sd.device_id = d.id and
sd.time BETWEEN periods.st AND periods.en
ORDER BY sd.sample ASC
LIMIT 1
)
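You have an index on sample_data(device_id, time, sample), and you expect the database engine to use it. Unfortunately, it can only make full use of the index for the WHERE clause. Because of the BETWEEN range condition on time, it probably will not use the index for the ORDER BY on sample.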
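One way to check this guess is to run the subquery on its own under EXPLAIN and look at how the ordering is produced. The following is only a sketch: the literal device id 1 and the one-day window are placeholder values chosen for illustration, not values from the question.

```sql
-- Placeholder values (assumptions): device_id = 1 and a fixed one-day window.
EXPLAIN (ANALYZE, BUFFERS)
SELECT sd.id
FROM sample_data sd
WHERE sd.device_id = 1
  AND sd."time" BETWEEN now() - interval '1 day' AND now()
ORDER BY sd.sample ASC   -- ordering column comes after the range column in the index
LIMIT 1;
```

If the plan shows a Sort step on top of the scan of sample_data, the index is probably serving the filter on device_id and time but not the ordering by sample.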
Is it possible to phrase the ORDER BY using time?
WHERE s.id = (SELECT id
FROM sample_data
WHERE device_id = d.id and
time BETWEEN periods.st AND periods.en
ORDER BY time ASC
LIMIT 1
)
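Note that ordering by time returns the earliest row in the interval rather than the row with the smallest sample, so this rewrite only helps if that result is acceptable. If only the minimum and maximum values per interval are needed, rather than the full rows, a possible alternative is a single grouped pass over the period. This is a rough sketch, not a tested solution for this data: it assumes a one-year window and customer_id = 23 as in the question, and the names bounds, bucket, min_sample and max_sample are made up for illustration.

```sql
-- Sketch only: one pass over the period, 100 equal buckets per device.
-- Returns the min/max sample values per bucket, not the full sample_data rows.
WITH bounds AS (
    SELECT (now() - interval '1 year')::timestamp AS st,
           now()::timestamp                       AS en
)
SELECT s.device_id,
       -- width_bucket maps each timestamp to a bucket number from 1 to 100
       width_bucket(extract(epoch FROM s."time"),
                    extract(epoch FROM b.st),
                    extract(epoch FROM b.en),
                    100) AS bucket,
       min(s.sample) AS min_sample,
       max(s.sample) AS max_sample
FROM bounds b
JOIN devices d ON d.customer_id = 23
JOIN sample_data s
  ON s.device_id = d.id
 AND s."time" >= b.st
 AND s."time" <  b.en      -- half-open range keeps every row inside buckets 1..100
GROUP BY s.device_id, bucket
ORDER BY s.device_id, bucket;
```

The design trade-off is that this reads every row in the period once instead of running 200 correlated subqueries per device. Whether that is faster on the real data would need to be confirmed with EXPLAIN (ANALYZE, BUFFERS).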