I have the following table:
mysql> describe as_rilevazioni;
+----------------------------+----------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+----------------------------+----------+------+-----+---------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| id_sistema_di_monitoraggio | longtext | NO | MUL | NULL | |
| id_unita | longtext | NO | | NULL | |
| id_sensore | longtext | NO | | NULL | |
| data | datetime | NO | | NULL | |
| timestamp | longtext | NO | | NULL | |
| unita_di_misura | longtext | NO | | NULL | |
| misura | longtext | NO | | NULL | |
+----------------------------+----------+------+-----+---------+----------------+
8 rows in set (0.00 sec)
The table has the following indexes:
mysql> show indexes from as_rilevazioni;
+----------------+------------+----------+--------------+----------------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+----------------+------------+----------+--------------+----------------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| as_rilevazioni | 0 | PRIMARY | 1 | id | A | 315865898 | NULL | NULL | | BTREE | | |
| as_rilevazioni | 0 | UNIQUE | 1 | id_sistema_di_monitoraggio | A | 17 | 5 | NULL | | BTREE | | |
| as_rilevazioni | 0 | UNIQUE | 2 | id_unita | A | 17 | 10 | NULL | | BTREE | | |
| as_rilevazioni | 0 | UNIQUE | 3 | id_sensore | A | 145225 | 30 | NULL | | BTREE | | |
| as_rilevazioni | 0 | UNIQUE | 4 | data | A | 315865898 | NULL | NULL | | BTREE | | |
+----------------+------------+----------+--------------+----------------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
5 rows in set (0.02 sec)
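I am worried that these indexes are not efficient, because the cardinality of the index on the column "data" is as large as the number of rows! Do these indexes speed up my queries, or do they just take up a lot of space without any benefit?
This is the table definition: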
CREATE TABLE `as_rilevazioni` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`id_sistema_di_monitoraggio` longtext NOT NULL,
`id_unita` longtext NOT NULL,
`id_sensore` longtext NOT NULL,
`data` datetime NOT NULL,
`timestamp` longtext NOT NULL,
`unita_di_misura` longtext NOT NULL,
`misura` longtext NOT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `UNIQUE` (`id_sistema_di_monitoraggio`(5),`id_unita`(10),`id_sensore`(30),`data`)
) ENGINE=InnoDB AUTO_INCREMENT=437497044 DEFAULT CHARSET=latin1
The main query I use is:
select * from as_rilevazioni where id_sistema_di_monitoraggio="<value>" and id_unita="<value>" and id_sensore="<value>" and data>="<date_1>" and data<="<date2>"
This is the EXPLAIN output for the query:
mysql> explain select * from as_rilevazioni where id_sistema_di_monitoraggio="235" and id_unita="17" and id_sensore="15" and data >= "2015-01-01 00:00:00" order by data;
+----+-------------+----------------+-------+---------------+--------+---------+------+--------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+----------------+-------+---------------+--------+---------+------+--------+-------------+
| 1 | SIMPLE | as_rilevazioni | range | UNIQUE | UNIQUE | 59 | NULL | 285522 | Using where |
+----+-------------+----------------+-------+---------------+--------+---------+------+--------+-------------+
1 row in set (0.00 sec)
These are the sizes of the data and the indexes:
mysql> SELECT concat(table_schema,'.',table_name) tables,
-> concat(round(table_rows/1000000,2),'M') rows,
-> concat(round(data_length/(1024*1024*1024),2),'G') data_size,
-> concat(round(index_length/(1024*1024*1024),2),'G') index_size,
-> concat(round((data_length+index_length)/(1024*1024*1024),2),'G') total_size,
-> round(index_length/data_length,2) index_data_ratio
-> FROM information_schema.TABLES
-> WHERE table_name="as_rilevazioni"
-> ORDER BY total_size DESC;
+------------------------------------+---------+-----------+------------+------------+------------------+
| tables | rows | data_size | index_size | total_size | index_data_ratio |
+------------------------------------+---------+-----------+------------+------------+------------------+
| agriculturalsupport.as_rilevazioni | 317.12M | 19.06G | 10.25G | 29.31G | 0.54 |
+------------------------------------+---------+-----------+------------+------------+------------------+
1 row in set (0.02 sec)
Any suggestions? Thanks, everyone!
Answer 0 (score: 0)
UNIQUE a(5), b(10)
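is terrible. It checks the uniqueness of the first 5 bytes of a combined with the first 10 bytes of b. You probably want to check that the combination of the complete a and b is unique.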
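A minimal sketch of the problem, using a throwaway table. The names t_prefix_demo, a, b and uq_prefix are hypothetical and only serve the illustration; the two values of a differ as full strings but share their first 5 bytes.

```sql
-- Hypothetical demo table, not part of the schema in the question.
CREATE TABLE t_prefix_demo (
  a TEXT NOT NULL,
  b TEXT NOT NULL,
  UNIQUE KEY uq_prefix (a(5), b(10))
) ENGINE=InnoDB DEFAULT CHARSET=latin1;

INSERT INTO t_prefix_demo (a, b) VALUES ('sensor-north', 'unit-1');

-- The full value of a is different, but its first 5 bytes ('senso')
-- and the first 10 bytes of b match the row above, so this insert is
-- expected to be rejected with a duplicate-key error (1062).
INSERT INTO t_prefix_demo (a, b) VALUES ('sensor-south', 'unit-1');

DROP TABLE t_prefix_demo;
```

For the table in the question this only bites when an ID is longer than its prefix, but nothing in the schema prevents that.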
INDEX a(5), b(10)
is nearly useless: the optimizer will not get past a to even consider b.
INDEX a(5)
is sometimes useless.
UNIQUE a, data -- where `data` is `DATETIME` or `TIMESTAMP`
is usually "wrong". Are you really sure that a will never occur twice within the same second?
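When looking at a multi-column index, "cardinality" is usually not important. A cardinality equal to the estimated number of rows in the table means that the optimizer thinks the column is unique, but it will not count on it.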
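By "efficient", do you mean "not taking too much space"? Each "row" of the UNIQUE index takes about 1 + 5 + 1 + 10 + 1 + 30 + 5 = 53 bytes. Multiplied by 317M rows, that is about 17GB. Add roughly 40% overhead to get 23GB. That is a lot more than the 10GB reported by information_schema. (The discrepancy comes from the many approximations involved, probably mostly the row count; the prefix lengths are also maxima, so shorter values take less space.)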
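The estimate above can be reproduced as a quick back-of-the-envelope query. This is only a sketch under the same assumptions: every prefix is completely filled, each of the three string columns carries a 1-byte length, data takes 5 bytes, and the row count is the 317M estimate from information_schema.

```sql
SELECT
  1 + 5 + 1 + 10 + 1 + 30 + 5          AS bytes_per_entry,   -- 53
  ROUND(53 * 317e6 / 1e9, 1)           AS raw_gb,            -- about 16.8
  ROUND(53 * 317e6 * 1.4 / 1e9, 1)     AS gb_with_overhead;  -- about 23.5, with the ~40% overhead
```

The figures are in decimal gigabytes, while the query in the question divides by 1024*1024*1024, so the two are not exactly comparable.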
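Or do you mean "can this index speed up some queries"? To discuss that, we need to see the queries. (Meanwhile, I have pointed out several reasons why the index is not good.)
If the IDs are numbers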
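If they really are numbers, switch to SMALLINT UNSIGNED (2 bytes) or some other integer size. Then an index on these 4 columns (with data last) is very likely to speed up the query significantly. Yes, the index will cost some disk space, but it is probably worth it. TEXT with a "prefix" provides no efficiency at all.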
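A hedged sketch of what that change could look like. It assumes that all three ID columns contain only digits and that every value fits in SMALLINT UNSIGNED (0 to 65535); the first two queries check those assumptions. The index name idx_sensore_data is hypothetical, and a plain INDEX is used instead of UNIQUE because of the concern about two readings in the same second.

```sql
-- 1. Count rows whose IDs are not plain digits (should return 0).
--    This scans the whole table, so expect it to take a while.
SELECT COUNT(*) AS non_numeric_rows
FROM as_rilevazioni
WHERE id_sistema_di_monitoraggio NOT REGEXP '^[0-9]+$'
   OR id_unita   NOT REGEXP '^[0-9]+$'
   OR id_sensore NOT REGEXP '^[0-9]+$';

-- 2. Check the largest values to pick a suitable integer size.
SELECT MAX(CAST(id_sistema_di_monitoraggio AS UNSIGNED)) AS max_sistema,
       MAX(CAST(id_unita   AS UNSIGNED))                 AS max_unita,
       MAX(CAST(id_sensore AS UNSIGNED))                 AS max_sensore
FROM as_rilevazioni;

-- 3. Convert the columns and replace the prefix index.
--    The old index must be dropped because prefix lengths are not
--    allowed on integer columns.
ALTER TABLE as_rilevazioni
  DROP INDEX `UNIQUE`,
  MODIFY id_sistema_di_monitoraggio SMALLINT UNSIGNED NOT NULL,
  MODIFY id_unita   SMALLINT UNSIGNED NOT NULL,
  MODIFY id_sensore SMALLINT UNSIGNED NOT NULL,
  ADD INDEX idx_sensore_data
    (id_sistema_di_monitoraggio, id_unita, id_sensore, data);
```

On a table with roughly 317M rows the ALTER rebuilds everything and can run for a long time, so it is worth trying on a copy first. If step 2 shows values above 65535, a wider type such as MEDIUMINT UNSIGNED or INT UNSIGNED would be needed instead.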
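Indexing numbers is also cheaper than indexing strings. Your id_unita(10) takes up to 11 bytes in each row of the index; MEDIUMINT UNSIGNED takes a fixed 3 bytes. In other words, the index will be both smaller and more useful.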