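I'm currently developing a web service that supports multiple databases, and I'm trying to optimize the tables and fix missing indexes. Here is the MySQL query: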
SELECT 'UTC' AS timezone, pak.id AS package_id, rel.unique_id AS relay, sns.unique_id AS sensor, pak.rtime AS time,
sns.units AS sensor_units, typ.name AS sensor_type, dat.data AS sensor_data,
loc.altitude AS altitude, Y(loc.location) AS latitude, X(loc.location) as longitude,
loc.speed as speed, loc.climb as climb, loc.track as track,
loc.longitude_error as longitude_error, loc.latitude_error as latitude_error, loc.altitude_error as altitude_error,
loc.speed_error as speed_error, loc.climb_error as climb_error, loc.track_error as track_error
FROM sensor_data dat
LEFT OUTER JOIN package_location loc on dat.package_id = loc.package_id
LEFT OUTER JOIN data_package pak ON dat.package_id = pak.id
LEFT OUTER JOIN relays rel ON pak.relay_id = rel.id
LEFT OUTER JOIN sensors sns ON dat.sensor_id = sns.id
LEFT OUTER JOIN sensor_types typ ON sns.sensor_type = typ.id
WHERE typ.name='Temperature'
AND rel.unique_id='OneWireTester'
AND pak.rtime > '2015-01-01'
AND pak.rtime < '2016-01-01';
and its EXPLAIN output:
+----+-------------+-------+--------+------------------------------------------+----------------------+---------+------------------------+------+----------------------------------------------------+
| id | select_type | table | type   | possible_keys                            | key                  | key_len | ref                    | rows | Extra                                              |
+----+-------------+-------+--------+------------------------------------------+----------------------+---------+------------------------+------+----------------------------------------------------+
|  1 | SIMPLE      | rel   | ALL    | PRIMARY                                  | NULL                 | NULL    | NULL                   |    5 | Using where                                        |
|  1 | SIMPLE      | pak   | ref    | PRIMARY,fk_package_relay_id              | fk_package_relay_id  | 9       | BigSense.rel.id        |    1 | Using index condition; Using where                 |
|  1 | SIMPLE      | dat   | ref    | fk_sensor_package_id,fk_sensor_sensor_id | fk_sensor_package_id | 9       | BigSense.pak.id        |    1 | NULL                                               |
|  1 | SIMPLE      | sns   | eq_ref | PRIMARY,fk_sensors_type_id               | PRIMARY              | 8       | BigSense.dat.sensor_id |    1 | NULL                                               |
|  1 | SIMPLE      | loc   | eq_ref | PRIMARY                                  | PRIMARY              | 8       | BigSense.pak.id        |    1 | NULL                                               |
|  1 | SIMPLE      | typ   | ALL    | PRIMARY                                  | NULL                 | NULL    | NULL                   |    5 | Using where; Using join buffer (Block Nested Loop) |
+----+-------------+-------+--------+------------------------------------------+----------------------+---------+------------------------+------+----------------------------------------------------+
...which looks pretty simple: the rel and typ rows have join type ALL and no key, i.e. full table scans, so I need to add indexes on the relays and sensor_types tables to optimize the query.
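The reasoning behind that conclusion can be sketched as a small check over the EXPLAIN rows: a join type of ALL with a NULL key means MySQL reads every row of that table. This is only an illustrative sketch with the rows transcribed by hand; it deliberately ignores the rows column, which here estimates just 5 rows for both tables, so those full scans may cost very little.

```python
# Rows transcribed by hand from the MySQL EXPLAIN output above:
# (table alias, join type, key used). key is None where EXPLAIN shows NULL.
EXPLAIN_ROWS = [
    ("rel", "ALL", None),
    ("pak", "ref", "fk_package_relay_id"),
    ("dat", "ref", "fk_sensor_package_id"),
    ("sns", "eq_ref", "PRIMARY"),
    ("loc", "eq_ref", "PRIMARY"),
    ("typ", "ALL", None),
]

# Table names behind the aliases used in the query.
ALIASES = {
    "rel": "relays", "pak": "data_package", "dat": "sensor_data",
    "sns": "sensors", "loc": "package_location", "typ": "sensor_types",
}


def full_table_scans(rows):
    """Return the tables MySQL reads in full (join type ALL, no key chosen)."""
    return [ALIASES[alias] for alias, join_type, key in rows
            if join_type == "ALL" and key is None]


if __name__ == "__main__":
    print(full_table_scans(EXPLAIN_ROWS))  # ['relays', 'sensor_types']
```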
The PostgreSQL version of the tables is almost identical. However, when I run EXPLAIN ANALYZE on the following query:
SELECT 'UTC' AS timezone, pak.id AS package_id, rel.unique_id AS relay, sns.unique_id AS sensor, pak.rtime AS time,
sns.units AS sensor_units, typ.name AS sensor_type, dat.data AS sensor_data,
loc.altitude AS altitude, ST_Y(loc.location::geometry) AS latitude, ST_X(loc.location::geometry) as longitude,
loc.speed as speed, loc.climb as climb, loc.track as track,
loc.longitude_error as longitude_error, loc.latitude_error as latitude_error, loc.altitude_error as altitude_error,
loc.speed_error as speed_error, loc.climb_error as climb_error, loc.track_error as track_error
FROM sensor_data dat
LEFT OUTER JOIN package_location loc on dat.package_id = loc.package_id
LEFT OUTER JOIN data_package pak ON dat.package_id = pak.id
LEFT OUTER JOIN relays rel ON pak.relay_id = rel.id
LEFT OUTER JOIN sensors sns ON dat.sensor_id = sns.id
LEFT OUTER JOIN sensor_types typ ON sns.sensor_type = typ.id
WHERE typ.name='Temperature'
AND rel.unique_id='OneWireTester'
AND pak.rtime > '2015-01-01'
AND pak.rtime < '2016-01-01';
I get the following result:
                                                                  QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop Left Join (cost=36.23..131.80 rows=1 width=477) (actual time=0.074..3.933 rows=76 loops=1)
   ->  Nested Loop (cost=36.09..131.60 rows=1 width=349) (actual time=0.068..3.782 rows=76 loops=1)
         ->  Nested Loop (cost=35.94..130.58 rows=4 width=267) (actual time=0.062..2.472 rows=620 loops=1)
               ->  Hash Join (cost=35.67..128.73 rows=4 width=247) (actual time=0.053..0.611 rows=620 loops=1)
                     Hash Cond: (dat.sensor_id = sns.id)
                     ->  Seq Scan on sensor_data dat (cost=0.00..89.46 rows=946 width=21) (actual time=0.007..0.178 rows=1006 loops=1)
                     ->  Hash (cost=35.64..35.64 rows=2 width=238) (actual time=0.037..0.037 rows=11 loops=1)
                           Buckets: 1024 Batches: 1 Memory Usage: 1kB
                           ->  Hash Join (cost=20.68..35.64 rows=2 width=238) (actual time=0.019..0.035 rows=11 loops=1)
                                 Hash Cond: (sns.sensor_type = typ.id)
                                 ->  Seq Scan on sensors sns (cost=0.00..13.60 rows=360 width=188) (actual time=0.002..0.005 rows=31 loops=1)
                                 ->  Hash (cost=20.62..20.62 rows=4 width=66) (actual time=0.010..0.010 rows=1 loops=1)
                                       Buckets: 1024 Batches: 1 Memory Usage: 1kB
                                       ->  Seq Scan on sensor_types typ (cost=0.00..20.62 rows=4 width=66) (actual time=0.006..0.008 rows=1 loops=1)
                                             Filter: ((name)::text = 'Temperature'::text)
                                             Rows Removed by Filter: 4
               ->  Index Scan using data_package_pkey on data_package pak (cost=0.28..0.45 rows=1 width=20) (actual time=0.002..0.002 rows=1 loops=620)
                     Index Cond: (id = dat.package_id)
                     Filter: ((rtime > '2015-01-01 00:00:00'::timestamp without time zone) AND (rtime < '2016-01-01 00:00:00'::timestamp without time zone))
         ->  Index Scan using relays_pkey on relays rel (cost=0.14..0.24 rows=1 width=94) (actual time=0.002..0.002 rows=0 loops=620)
               Index Cond: (id = pak.relay_id)
               Filter: ((unique_id)::text = 'OneWireTester'::text)
               Rows Removed by Filter: 1
   ->  Index Scan using package_location_pkey on package_location loc (cost=0.14..0.18 rows=1 width=140) (actual time=0.001..0.001 rows=0 loops=76)
         Index Cond: (dat.package_id = package_id)
 Planning time: 0.959 ms
 Execution time: 4.030 ms
(27 rows)
The table schemas have the same foreign keys and general structure, so I expected the same indexes to be needed. However, I've been reading several guides on PostgreSQL's EXPLAIN statement, and from what I gather, a Seq Scan is an indicator of a missing index, which would mean I'm missing indexes on sensor_types, sensors, and sensor_data.
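Before reading every Seq Scan as a missing index, it can help to pull out how much work each one actually did. Below is a minimal sketch that extracts the Seq Scan nodes from the plan text with a regular expression. The pattern assumes the node layout shown above (a table name followed by an alias) and is not a general EXPLAIN parser; for machine processing, PostgreSQL can also emit plans as JSON via EXPLAIN (ANALYZE, FORMAT JSON).

```python
import re

# The three Seq Scan nodes, copied from the EXPLAIN ANALYZE output above.
PLAN_TEXT = """
->  Seq Scan on sensor_data dat (cost=0.00..89.46 rows=946 width=21) (actual time=0.007..0.178 rows=1006 loops=1)
->  Seq Scan on sensors sns (cost=0.00..13.60 rows=360 width=188) (actual time=0.002..0.005 rows=31 loops=1)
->  Seq Scan on sensor_types typ (cost=0.00..20.62 rows=4 width=66) (actual time=0.006..0.008 rows=1 loops=1)
"""

# Assumes the "Seq Scan on <table> <alias> (cost=...) (actual time=...)" layout shown above.
SEQ_SCAN_RE = re.compile(
    r"Seq Scan on (?P<table>\w+) \w+\s+"
    r"\(cost=\d+\.\d+\.\.\d+\.\d+ rows=(?P<est_rows>\d+) width=\d+\)\s+"
    r"\(actual time=\d+\.\d+\.\.(?P<ms>\d+\.\d+) rows=(?P<actual_rows>\d+) loops=\d+\)"
)


def seq_scans(plan_text):
    """Return (table, estimated rows, actual rows, total ms) for each Seq Scan node."""
    return [
        (m["table"], int(m["est_rows"]), int(m["actual_rows"]), float(m["ms"]))
        for m in SEQ_SCAN_RE.finditer(plan_text)
    ]


if __name__ == "__main__":
    for table, est, actual, ms in seq_scans(PLAN_TEXT):
        print(f"{table}: estimated {est} rows, actual {actual} rows, {ms} ms")
```

On this plan it reports that the three scans returned 1006, 31 and 1 rows, and each finished within 0.178 ms of actual time, out of a 4.030 ms total execution time.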
Am I interpreting the results of these EXPLAIN statements correctly? What should I be looking for in order to optimize both databases?
Answer 0 (score: 0)
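In PostgreSQL (and probably in MySQL as well), indexes are not used merely because they have been defined; they are used only when the planner estimates that using them will speed up the query.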
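In the EXPLAIN ANALYZE output, each node shows a section in parentheses with the planner's estimate, followed by a similar section with what actually happened; for example, the scan of sensor_data shows (cost=0.00..89.46 rows=946 width=21) followed by (actual time=0.007..0.178 rows=1006 loops=1). The query planner works with the cost, which is computed from a number of parameters listed in the configuration file (postgresql.conf). These parameters price work such as I/O and CPU time, and I/O is normally priced much higher: by default seq_page_cost = 1.0 and random_page_cost = 4.0, while cpu_tuple_cost = 0.01, a difference of 100x or more. This means the query planner tries to minimize the amount of data that has to be read from disk. That data is read in pages of a fixed size (8 kB by default in PostgreSQL) rather than as individual rows, because reading whole blocks is far more efficient given the physical characteristics of hard drives. Both the table itself and its indexes are stored on disk. A small table fits in a few pages, perhaps even a single page. Since CPU time is cheap compared with I/O, sequentially scanning those few pages is much faster than doing the extra I/O of reading index pages and then following them to the table pages.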
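The cost arithmetic can be checked against the plan in the question. As a simplified sketch (ignoring parallel scans and per-column output costs), PostgreSQL prices a sequential scan as pages times seq_page_cost plus rows times cpu_tuple_cost, plus the cost of any filter. Inverting that with the default constants suggests the planner assumed about 80 pages for sensor_data and 10 pages for sensors; these page counts are inferred from the cost figures, not read from the catalog.

```python
# PostgreSQL's default planner cost constants (set in postgresql.conf).
SEQ_PAGE_COST = 1.0    # reading one page sequentially
CPU_TUPLE_COST = 0.01  # processing one row


def seq_scan_cost(pages, rows, filter_cost_per_row=0.0):
    """Simplified total cost of a non-parallel sequential scan (startup cost is 0)."""
    return pages * SEQ_PAGE_COST + rows * (CPU_TUPLE_COST + filter_cost_per_row)


def implied_pages(total_cost, rows):
    """Invert seq_scan_cost for an unfiltered scan: how many pages did the planner assume?"""
    return round((total_cost - rows * CPU_TUPLE_COST) / SEQ_PAGE_COST, 2)


if __name__ == "__main__":
    # Total costs and estimated rows copied from the unfiltered Seq Scan nodes in the plan.
    for table, cost, rows in [("sensor_data", 89.46, 946), ("sensors", 13.60, 360)]:
        pages = implied_pages(cost, rows)
        print(f"{table}: ~{pages:g} pages, page cost {pages * SEQ_PAGE_COST:g} "
              f"vs row cost {rows * CPU_TUPLE_COST:g}")
```

For sensor_data, the page reads account for 80 of the 89.46 cost units, which is the I/O-dominated pricing described above.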
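As you can see from the EXPLAIN ANALYZE output, most of your tables are small and fit in a handful of pages (the largest scan, on sensor_data, returns only 1006 rows). If you really want to test how well your indexes work, load your tables with a million or so rows of random data and then run the tests again.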
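A minimal sketch of generating such test data, assuming (hypothetically) that sensor_data has the columns package_id, sensor_id and data seen in the query, and that rows with ids 1..n_packages in data_package and 1..n_sensors in sensors already exist so the foreign keys are satisfied. The function names and the value range for data are illustrative choices, not part of the original schema.

```python
import csv
import random


def random_sensor_rows(n_rows, n_packages, n_sensors, seed=42):
    """Yield hypothetical (package_id, sensor_id, data) rows for sensor_data.

    package_id and sensor_id are drawn from 1..n_packages and 1..n_sensors,
    which are assumed to already exist in data_package and sensors.
    """
    rng = random.Random(seed)  # fixed seed so repeated test runs load the same data
    for _ in range(n_rows):
        yield (
            rng.randint(1, n_packages),
            rng.randint(1, n_sensors),
            round(rng.uniform(-40.0, 60.0), 2),  # arbitrary reading range
        )


def write_csv(path, rows):
    """Write a header line plus the rows as a plain CSV file with '\\n' line endings."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f, lineterminator="\n")
        writer.writerow(["package_id", "sensor_id", "data"])
        writer.writerows(rows)
```

For example, write_csv("sensor_data.csv", random_sensor_rows(1_000_000, n_packages, n_sensors)) would produce a file that could be loaded with psql's \copy sensor_data (package_id, sensor_id, data) FROM 'sensor_data.csv' WITH (FORMAT csv, HEADER), or with MySQL's LOAD DATA LOCAL INFILE 'sensor_data.csv' INTO TABLE sensor_data FIELDS TERMINATED BY ',' IGNORE 1 LINES (package_id, sensor_id, data), after which both EXPLAIN outputs can be compared again.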