Question

I'm writing an application that uses MySQL to store file-hash data in a simple database with a single table. I create it as follows:

CREATE DATABASE IF NOT EXISTS hash_db;

CREATE TABLE IF NOT EXISTS hash_db.main_tbl  
(                                       
  sha256       CHAR(64) PRIMARY KEY    ,
  sha1         CHAR(40) UNIQUE KEY     ,
  md5          CHAR(32) UNIQUE KEY     ,
  created      DATETIME                ,
  modified     DATETIME                ,
  size         BIGINT                  ,
  ext          VARCHAR(260)            ,
  path         TEXT(32768)             ,
  new_record   BOOL                     
 )                                      
ENGINE = MyISAM;

CREATE UNIQUE INDEX sha256_idx ON hash_db.main_tbl (sha256);
CREATE UNIQUE INDEX sha1_idx   ON hash_db.main_tbl (sha1);
CREATE UNIQUE INDEX md5_idx    ON hash_db.main_tbl (md5);

Then I run simple SELECTs and INSERTs of this form:

SELECT * FROM hash_db.main_tbl WHERE
      sha256 = '...'   OR
      sha1   = '...'   OR
      md5    = '...';


INSERT INTO hash_db.main_tbl
  (sha256, sha1, md5, created, modified, size, ext, path, new_record) VALUES
  (
    '...'                    ,
    '...'                    ,
    '...'                    ,
    FROM_UNIXTIME(...)       ,
    FROM_UNIXTIME(...)       ,
    ...                      ,
    '...'                    ,
    '...'                    ,
    TRUE                                  
  );

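The data is almost random, with a very high probability of uniqueness (not that it should matter, or should it?). First question: is it normal for InnoDB to be much slower than MyISAM for this kind of usage (about 7 times slower)? I read that it should be the other way around. I tried an innodb_buffer_pool_size of 512M, and it made no difference.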

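Second: I tested with and without indexes (MyISAM), and the version with indexes is actually slower. These are the actual performance figures measured by my application (using performance counters in C):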

With indexes:
Selects per second: 393.7
Inserts per second: 1056.1

Without indexes:
Selects per second: 585.3
Inserts per second: 1480.9

The figures I get are reproducible. I have also tested with a larger key_buffer_size (32M; the default is 8M).
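A rate such as "Selects per second" is the number of completed operations divided by the elapsed wall-clock time. The question doesn't show how its counters are read. The following is a minimal sketch that assumes POSIX clock_gettime with CLOCK_MONOTONIC as the timer; on Windows, QueryPerformanceCounter would play the same role. The helper names elapsed_seconds, ops_per_second and time_loop are hypothetical.

```c
#define _POSIX_C_SOURCE 199309L
#include <time.h>

/* Hypothetical helper: elapsed seconds between two CLOCK_MONOTONIC samples.
 * A negative nanosecond difference is handled by the floating-point sum. */
static double elapsed_seconds(struct timespec start, struct timespec end)
{
    return (double)(end.tv_sec - start.tv_sec)
         + (double)(end.tv_nsec - start.tv_nsec) / 1e9;
}

/* Hypothetical helper: operations per second; returns 0.0 for an empty interval. */
static double ops_per_second(unsigned long ops, struct timespec start,
                             struct timespec end)
{
    double secs = elapsed_seconds(start, end);
    return secs > 0.0 ? (double)ops / secs : 0.0;
}

/* Usage sketch: time n calls of op (e.g. one SELECT or INSERT each). */
static double time_loop(unsigned long n, void (*op)(void *), void *ctx)
{
    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (unsigned long i = 0; i < n; i++)
        op(ctx);
    clock_gettime(CLOCK_MONOTONIC, &t1);
    return ops_per_second(n, t0, t1);
}
```

Timing a whole loop and dividing once avoids the relative overhead of reading the clock around every single query.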

What am I doing wrong, or what am I missing?

================================================================================

Edit, following Gordon Linoff's suggestion:

I tried UNION ALL, but performance actually got worse, dropping to 70 per second. The EXPLAIN output is as follows:

EXPLAIN EXTENDED SELECT * FROM main_hash_db.main_tbl WHERE md5 = '...'

+----+-------------+----------+-------+---------------+------+---------+-------+------+----------+-------+
| id | select_type | table    | type  | possible_keys | key  | key_len | ref   | rows | filtered | Extra |
+----+-------------+----------+-------+---------------+------+---------+-------+------+----------+-------+
|  1 | SIMPLE      | main_tbl | const | md5           | md5  | 97      | const |    1 |   100.00 | NULL  |
+----+-------------+----------+-------+---------------+------+---------+-------+------+----------+-------+
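The key_len values in this and the following EXPLAIN outputs can be reproduced by hand. This assumes the columns use a character set with at most 3 bytes per character (utf8/utf8mb3), which is consistent with the numbers shown. Under that assumption, a CHAR(n) key part occupies 3n bytes, plus 1 byte for the NULL flag when the column is nullable. The PRIMARY KEY column sha256 is implicitly NOT NULL, whereas sha1 and md5 are nullable UNIQUE columns. The helper char_key_len is a hypothetical illustration of that arithmetic, not a MySQL API.

```c
#include <stdbool.h>

/* Hypothetical helper mirroring EXPLAIN's key_len for one CHAR(n) key part:
 * n characters times the charset's maximum bytes per character, plus one
 * byte for the NULL flag if the column is nullable. */
static unsigned char_key_len(unsigned n_chars, unsigned max_bytes_per_char,
                             bool nullable)
{
    return n_chars * max_bytes_per_char + (nullable ? 1u : 0u);
}
```

This gives 97 for md5 (32 × 3 + 1), 121 for sha1 (40 × 3 + 1) and 192 for the sha256 primary key (64 × 3). With a 4-byte character set (utf8mb4), the md5 key would be 129 bytes instead of 97.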


EXPLAIN EXTENDED SELECT * FROM main_hash_db.main_tbl WHERE md5 = '...' UNION ALL SELECT * FROM main_hash_db.main_tbl WHERE sha1 = '...'

+------+--------------+------------+-------+-----------------------+------+---------+-------+------+----------+-----------------+
| id   | select_type  | table      | type  | possible_keys         | key  | key_len | ref   | rows | filtered | Extra           |
+------+--------------+------------+-------+-----------------------+------+---------+-------+------+----------+-----------------+
|    1 | PRIMARY      | main_tbl   | const | md5                   | md5  | 97      | const |    1 |   100.00 | NULL            |
|    2 | UNION        | main_tbl   | const | sha1,sha1_idx,md5_idx | sha1 | 121     | const |    1 |   100.00 | NULL            |
| NULL | UNION RESULT | <union1,2> | ALL   | NULL                  | NULL | NULL    | NULL  | NULL |     NULL | Using temporary |
+------+--------------+------------+-------+-----------------------+------+---------+-------+------+----------+-----------------+


EXPLAIN EXTENDED SELECT * FROM main_hash_db.main_tbl WHERE md5 = '...' UNION ALL SELECT * FROM main_hash_db.main_tbl WHERE sha1 = '...' UNION ALL SELECT * FROM main_hash_db.main_tbl WHERE sha256 = '...'

+------+--------------+--------------+-------+-----------------------+---------+---------+-------+------+----------+-----------------+
| id   | select_type  | table        | type  | possible_keys         | key     | key_len | ref   | rows | filtered | Extra           |
+------+--------------+--------------+-------+-----------------------+---------+---------+-------+------+----------+-----------------+
|    1 | PRIMARY      | main_tbl     | const | md5                   | md5     | 97      | const |    1 |   100.00 | NULL            |
|    2 | UNION        | main_tbl     | const | sha1,sha1_idx,md5_idx | sha1    | 121     | const |    1 |   100.00 | NULL            |
|    3 | UNION        | main_tbl     | const | PRIMARY,sha256_idx    | PRIMARY | 192     | const |    1 |   100.00 | NULL            |
| NULL | UNION RESULT | <union1,2,3> | ALL   | NULL                  | NULL    | NULL    | NULL  | NULL |     NULL | Using temporary |
+------+--------------+--------------+-------+-----------------------+---------+---------+-------+------+----------+-----------------+

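This showed me that I had made a mistake when creating the indexes: I had created two separate indexes on the 'sha1' column. But even after fixing that, things are still slow (about 70 per second). Here is the EXPLAIN output: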

+------+--------------+--------------+-------+--------------------+---------+---------+-------+------+----------+-----------------+
| id   | select_type  | table        | type  | possible_keys      | key     | key_len | ref   | rows | filtered | Extra           |
+------+--------------+--------------+-------+--------------------+---------+---------+-------+------+----------+-----------------+
|    1 | PRIMARY      | main_tbl     | const | md5,md5_idx        | md5     | 97      | const |    1 |   100.00 | NULL            |
|    2 | UNION        | main_tbl     | const | sha1,sha1_idx      | sha1    | 121     | const |    1 |   100.00 | NULL            |
|    3 | UNION        | main_tbl     | const | PRIMARY,sha256_idx | PRIMARY | 192     | const |    1 |   100.00 | NULL            |
| NULL | UNION RESULT | <union1,2,3> | ALL   | NULL               | NULL    | NULL    | NULL  | NULL |     NULL | Using temporary |
+------+--------------+--------------+-------+--------------------+---------+---------+-------+------+----------+-----------------+

================================================================================

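Third edit, after further discussion (see below). Here is the EXPLAIN output for the original query (no additional indexes defined; database created as above):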

explain extended select path from main_hash_db.main_tbl where sha256 = '...' or md5 = '...' or sha1 = '...' ;

+----+-------------+----------+-------------+------------------+------------------+------------+------+------+----------+--------------------------------------------+
| id | select_type | table    | type        | possible_keys    | key              | key_len    | ref  | rows | filtered | Extra                                      |
+----+-------------+----------+-------------+------------------+------------------+------------+------+------+----------+--------------------------------------------+
|  1 | SIMPLE      | main_tbl | index_merge | PRIMARY,sha1,md5 | PRIMARY,md5,sha1 | 192,97,121 | NULL |    3 |   100.00 | Using union(PRIMARY,md5,sha1); Using where |
+----+-------------+----------+-------------+------------------+------------------+------------+------+------+----------+--------------------------------------------+

Performance as measured by my application:

Selects per second: 500.6
Inserts per second: 1394.8

And this is the result of issuing the 3 SELECTs separately (rather than as a UNION):

Selects per second: 2525.1
Inserts per second: 1584.3
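The last measurement corresponds to sending three single-column point lookups instead of one OR or UNION query. Below is a minimal sketch of how the application might build those three statements. It assumes the hashes arrive as hex strings. The function names are hypothetical, and each resulting string would be sent on its own through the client library (for example mysql_query() in the MySQL C API). Checking that each argument is pure hex of the expected length keeps arbitrary text out of the SQL.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Returns 1 if s consists of exactly len hexadecimal digits, else 0. */
static int is_hex_of_len(const char *s, size_t len)
{
    if (s == NULL || strlen(s) != len)
        return 0;
    for (size_t i = 0; i < len; i++)
        if (!isxdigit((unsigned char)s[i]))
            return 0;
    return 1;
}

/* Hypothetical helper: writes one point-lookup SELECT on a single hash column.
 * Returns the snprintf result, or -1 if the hash fails validation. */
static int build_lookup(char *buf, size_t size, const char *column,
                        const char *hash, size_t hex_len)
{
    if (!is_hex_of_len(hash, hex_len))
        return -1;
    return snprintf(buf, size,
                    "SELECT * FROM hash_db.main_tbl WHERE %s = '%s'",
                    column, hash);
}

/* Builds the three lookups in the order sha256, sha1, md5.
 * Returns 0 on success, -1 if a hash is invalid or a query would be truncated. */
static int build_three_lookups(char out[3][256], const char *sha256,
                               const char *sha1, const char *md5)
{
    static const char *const cols[3] = { "sha256", "sha1", "md5" };
    static const size_t lens[3] = { 64, 40, 32 };
    const char *hashes[3] = { sha256, sha1, md5 };

    for (int i = 0; i < 3; i++) {
        int n = build_lookup(out[i], 256, cols[i], hashes[i], lens[i]);
        if (n < 0 || n >= 256)
            return -1;
    }
    return 0;
}
```

Each statement names a single column, so the optimizer can resolve it as a const lookup on that column's index, as in the single-query EXPLAIN output above.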

Answer 1

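First, you would expect inserts to be faster without indexes. There's no mystery there: the indexes don't have to be maintained. In fact, when doing large inserts, a good strategy is often to drop the indexes first, do the inserts, and then rebuild them.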
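Here is a sketch of the drop-then-rebuild pattern applied to the question's schema. The three explicitly created indexes (sha256_idx, sha1_idx, md5_idx) would be dropped before the bulk load and recreated afterwards. The hypothetical helpers below only format the statements the application would send. Because these are UNIQUE indexes, recreating them fails if duplicate hashes were loaded in the meantime. The pattern therefore only suits loads that are known to be duplicate-free.

```c
#include <stdio.h>

/* Index names and columns taken from the question's CREATE UNIQUE INDEX statements. */
static const char *const idx_names[3] = { "sha256_idx", "sha1_idx", "md5_idx" };
static const char *const idx_cols[3]  = { "sha256",     "sha1",     "md5"     };

/* Hypothetical helper: formats the i-th DROP INDEX statement (sent before the load). */
static int format_drop(char *buf, size_t size, int i)
{
    return snprintf(buf, size, "DROP INDEX %s ON hash_db.main_tbl", idx_names[i]);
}

/* Hypothetical helper: formats the i-th CREATE UNIQUE INDEX statement
 * (sent after the load; fails server-side if duplicates were inserted). */
static int format_create(char *buf, size_t size, int i)
{
    return snprintf(buf, size, "CREATE UNIQUE INDEX %s ON hash_db.main_tbl (%s)",
                    idx_names[i], idx_cols[i]);
}
```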

The SELECT is more troubling. After all, that is where you want the indexes to be used. Your query is:

SELECT *
FROM hash_db.main_tbl
WHERE sha256 = '...'   OR
      sha1   = '...'   OR
      md5    = '...';

This happens to be the worst case for index usage. You need to look at the EXPLAIN output to see how the indexes are being used.

My suggestion is to write the query like this:

SELECT *
FROM hash_db.main_tbl
WHERE sha256 = '...'
UNION ALL
SELECT *
FROM hash_db.main_tbl
WHERE sha1   = '...'
UNION ALL
SELECT *
FROM hash_db.main_tbl
WHERE md5    = '...';

(Use UNION if you really do want to eliminate duplicates.)

This should let each subquery use its own index, and should give you the performance you need.
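UNION ALL, like three separately issued lookups, can return the same row more than once, for example when a file matches on both md5 and sha1. The application may prefer to drop such duplicates itself rather than switch to UNION. In that case the sha256 primary key is a natural identity. Below is a minimal sketch, assuming each returned row's sha256 has been copied into a fixed-size, NUL-terminated buffer. The function name is hypothetical.

```c
#include <string.h>

#define SHA256_HEX_LEN 64

/* Removes repeated sha256 keys in place, keeping the first occurrence of each.
 * With at most one row per lookup (n <= 3 here), a quadratic scan is fine.
 * Returns the number of distinct keys now stored at the front of the array. */
static size_t dedupe_by_sha256(char keys[][SHA256_HEX_LEN + 1], size_t n)
{
    size_t kept = 0;
    for (size_t i = 0; i < n; i++) {
        int seen = 0;
        for (size_t j = 0; j < kept; j++) {
            if (strcmp(keys[i], keys[j]) == 0) {
                seen = 1;
                break;
            }
        }
        if (!seen) {
            if (kept != i)
                memcpy(keys[kept], keys[i], sizeof keys[i]);
            kept++;
        }
    }
    return kept;
}
```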

Answer 2

You will degrade performance and slow down traffic to the database because you are creating a large number of indexes.

Every time you insert a tuple, you directly increase the number of index entries in the DBMS.

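A SELECT is like a search. Having a large number of indexes in the system raises some challenges, such as priority: you end up with a large number of priorities, for example 3000 index entries for 1000 tuples in your database.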
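The 3000-for-1000 figure is rows multiplied by indexes. Every indexed row contributes one entry to each index, so each INSERT updates every index once. The DDL in the question, as written, declares three indexes inline (the PRIMARY KEY and two UNIQUE KEYs) and three more with CREATE UNIQUE INDEX. The count for that schema is:

```latex
E = n_{\text{rows}} \times n_{\text{indexes}}, \qquad
1000 \times 3 = 3000, \qquad
1000 \times 6 = 6000
```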

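Every DBMS has its own way of managing indexes, and you need to master how yours does it. Then you can push your system to its full potential. You can, for example, use triggers together with indexes to strike a good balance.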

MySQL: performance degraded by using indexes?

2 answers: