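Understanding EXPLAIN statements in similar MySQL and PostgreSQL databases

Time: 2015-07-29 22:46:42

Tags: mysql database postgresql query-optimization explain

I'm currently working on a web service that supports multiple databases. I'm trying to optimize the tables and fix missing indexes. The following is the MySQL query: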

SELECT 'UTC' AS timezone, pak.id AS package_id, rel.unique_id AS relay, sns.unique_id AS sensor, pak.rtime AS time,
       sns.units AS sensor_units, typ.name AS sensor_type, dat.data AS sensor_data,
       loc.altitude AS altitude, Y(loc.location) AS latitude, X(loc.location) as longitude,
       loc.speed as speed, loc.climb as climb, loc.track as track,
       loc.longitude_error as longitude_error, loc.latitude_error as latitude_error, loc.altitude_error as altitude_error,
       loc.speed_error as speed_error, loc.climb_error as climb_error, loc.track_error as track_error
FROM sensor_data dat
LEFT OUTER JOIN package_location loc on dat.package_id = loc.package_id
LEFT OUTER JOIN data_package pak ON dat.package_id = pak.id
LEFT OUTER JOIN relays rel ON pak.relay_id = rel.id
LEFT OUTER JOIN sensors sns ON dat.sensor_id = sns.id
LEFT OUTER JOIN sensor_types typ ON sns.sensor_type = typ.id
WHERE typ.name='Temperature'
  AND rel.unique_id='OneWireTester'
  AND pak.rtime > '2015-01-01'
  AND pak.rtime < '2016-01-01'

and the EXPLAIN output:

+----+-------------+-------+--------+------------------------------------------+----------------------+---------+------------------------+------+----------------------------------------------------+
| id | select_type | table | type   | possible_keys                            | key                  | key_len | ref                    | rows | Extra                                              |
+----+-------------+-------+--------+------------------------------------------+----------------------+---------+------------------------+------+----------------------------------------------------+
|  1 | SIMPLE      | rel   | ALL    | PRIMARY                                  | NULL                 | NULL    | NULL                   |    5 | Using where                                        |
|  1 | SIMPLE      | pak   | ref    | PRIMARY,fk_package_relay_id              | fk_package_relay_id  | 9       | BigSense.rel.id        |    1 | Using index condition; Using where                 |
|  1 | SIMPLE      | dat   | ref    | fk_sensor_package_id,fk_sensor_sensor_id | fk_sensor_package_id | 9       | BigSense.pak.id        |    1 | NULL                                               |
|  1 | SIMPLE      | sns   | eq_ref | PRIMARY,fk_sensors_type_id               | PRIMARY              | 8       | BigSense.dat.sensor_id |    1 | NULL                                               |
|  1 | SIMPLE      | loc   | eq_ref | PRIMARY                                  | PRIMARY              | 8       | BigSense.pak.id        |    1 | NULL                                               |
|  1 | SIMPLE      | typ   | ALL    | PRIMARY                                  | NULL                 | NULL    | NULL                   |    5 | Using where; Using join buffer (Block Nested Loop) |
+----+-------------+-------+--------+------------------------------------------+----------------------+---------+------------------------+------+----------------------------------------------------+

...which seems pretty straightforward. I need to add indexes on the relays and sensor_types tables to optimize the query.

The PostgreSQL version of the tables is nearly identical. However, when I use the following query:

SELECT 'UTC' AS timezone, pak.id AS package_id, rel.unique_id AS relay, sns.unique_id AS sensor, pak.rtime AS time,
       sns.units AS sensor_units, typ.name AS sensor_type, dat.data AS sensor_data,
       loc.altitude AS altitude, ST_Y(loc.location::geometry) AS latitude, ST_X(loc.location::geometry) as longitude,
       loc.speed as speed, loc.climb as climb, loc.track as track,
       loc.longitude_error as longitude_error, loc.latitude_error as latitude_error, loc.altitude_error as altitude_error,
       loc.speed_error as speed_error, loc.climb_error as climb_error, loc.track_error as track_error
FROM sensor_data dat
LEFT OUTER JOIN package_location loc on dat.package_id = loc.package_id
LEFT OUTER JOIN data_package pak ON dat.package_id = pak.id
LEFT OUTER JOIN relays rel ON pak.relay_id = rel.id
LEFT OUTER JOIN sensors sns ON dat.sensor_id = sns.id
LEFT OUTER JOIN sensor_types typ ON sns.sensor_type = typ.id
WHERE typ.name='Temperature'
  AND rel.unique_id='OneWireTester'
  AND pak.rtime > '2015-01-01'
  AND pak.rtime < '2016-01-01';

and run EXPLAIN ANALYZE on it, I get the following:

                                                                         QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop Left Join  (cost=36.23..131.80 rows=1 width=477) (actual time=0.074..3.933 rows=76 loops=1)
   ->  Nested Loop  (cost=36.09..131.60 rows=1 width=349) (actual time=0.068..3.782 rows=76 loops=1)
         ->  Nested Loop  (cost=35.94..130.58 rows=4 width=267) (actual time=0.062..2.472 rows=620 loops=1)
               ->  Hash Join  (cost=35.67..128.73 rows=4 width=247) (actual time=0.053..0.611 rows=620 loops=1)
                     Hash Cond: (dat.sensor_id = sns.id)
                     ->  Seq Scan on sensor_data dat  (cost=0.00..89.46 rows=946 width=21) (actual time=0.007..0.178 rows=1006 loops=1)
                     ->  Hash  (cost=35.64..35.64 rows=2 width=238) (actual time=0.037..0.037 rows=11 loops=1)
                           Buckets: 1024  Batches: 1  Memory Usage: 1kB
                           ->  Hash Join  (cost=20.68..35.64 rows=2 width=238) (actual time=0.019..0.035 rows=11 loops=1)
                                 Hash Cond: (sns.sensor_type = typ.id)
                                 ->  Seq Scan on sensors sns  (cost=0.00..13.60 rows=360 width=188) (actual time=0.002..0.005 rows=31 loops=1)
                                 ->  Hash  (cost=20.62..20.62 rows=4 width=66) (actual time=0.010..0.010 rows=1 loops=1)
                                       Buckets: 1024  Batches: 1  Memory Usage: 1kB
                                       ->  Seq Scan on sensor_types typ  (cost=0.00..20.62 rows=4 width=66) (actual time=0.006..0.008 rows=1 loops=1)
                                             Filter: ((name)::text = 'Temperature'::text)
                                             Rows Removed by Filter: 4
               ->  Index Scan using data_package_pkey on data_package pak  (cost=0.28..0.45 rows=1 width=20) (actual time=0.002..0.002 rows=1 loops=620)
                     Index Cond: (id = dat.package_id)
                     Filter: ((rtime > '2015-01-01 00:00:00'::timestamp without time zone) AND (rtime < '2016-01-01 00:00:00'::timestamp without time zone))
         ->  Index Scan using relays_pkey on relays rel  (cost=0.14..0.24 rows=1 width=94) (actual time=0.002..0.002 rows=0 loops=620)
               Index Cond: (id = pak.relay_id)
               Filter: ((unique_id)::text = 'OneWireTester'::text)
               Rows Removed by Filter: 1
   ->  Index Scan using package_location_pkey on package_location loc  (cost=0.14..0.18 rows=1 width=140) (actual time=0.001..0.001 rows=0 loops=76)
         Index Cond: (dat.package_id = package_id)
 Planning time: 0.959 ms
 Execution time: 4.030 ms
(27 rows)

The table schemas have the same foreign keys and general structure, so I'd expect to see the same indexes needed. However, I've been going over several guides to PostgreSQL's EXPLAIN statement, and from what I've gathered, the Seq Scan nodes are the indicators of missing indexes, which would mean I'm missing indexes on sensors, sensor_types, and sensor_data.

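Am I interpreting the results of these EXPLAIN statements correctly? What should I be looking for in order to optimize both databases?

1 answer:

Answer 0 (score: 0)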

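In PostgreSQL (and probably MySQL too), indexes are not used merely because they have been defined; they are used only when the planner estimates that they will speed up the query.

In the EXPLAIN ANALYZE output you will see a section in parentheses about cost, followed by a similar section about actual time. The query planner looks at cost, which is determined by a number of parameters listed in the configuration file. These costs cover things like I/O and CPU time, the former typically having a much higher value than the latter (often a 100-fold difference). This means the planner tries to minimize the amount of data that has to be read from disk, and that amount is counted in pages of a predetermined size (8 kB by default in PostgreSQL) rather than in individual rows, because reading whole pages suits the physical characteristics of hard drives. Both the table itself and its indexes are stored on disk. If the table is small, it will fit in a few pages, possibly even just one. Since CPU time is cheap compared to I/O time, sequentially scanning a few pages is much faster than paying the extra I/O of reading index pages as well.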
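To make the cost figures concrete, here is a minimal sketch of how the upper cost bound of an unfiltered sequential scan can be reproduced by hand. It uses the formula from the PostgreSQL documentation (pages read times seq_page_cost, plus rows scanned times cpu_tuple_cost) with the default parameter values. The page counts (80 for sensor_data, 10 for sensors) are assumptions worked backward from the plan in the question; they are not stated there, and the function name is purely illustrative.

```python
# PostgreSQL default planner parameters (normally set in postgresql.conf).
SEQ_PAGE_COST = 1.0    # cost of reading one page sequentially
CPU_TUPLE_COST = 0.01  # cost of processing one row


def seq_scan_cost(pages: int, rows: int) -> float:
    """Estimated total cost of a sequential scan with no filter condition."""
    return pages * SEQ_PAGE_COST + rows * CPU_TUPLE_COST


# Row estimates come from the plan (rows=946, rows=360).
# Page counts are ASSUMED: inferred from the plan, not given in the question.
sensor_data_cost = seq_scan_cost(pages=80, rows=946)
sensors_cost = seq_scan_cost(pages=10, rows=360)

print(f"sensor_data: {sensor_data_cost:.2f}")
print(f"sensors: {sensors_cost:.2f}")
```

Under those assumed page counts the results line up with the cost=0.00..89.46 and cost=0.00..13.60 figures on the two unfiltered Seq Scan nodes, and the page term dominates the row term in both cases, which is the answer's point about I/O outweighing CPU.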

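As you can tell from the EXPLAIN ANALYZE output, most of your tables are small and fit in a handful of pages. If you really want to test what the indexes can do, load your tables with a million or so rows of random data and then test again.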