MySQL performance - string vs. integer

Time: 2016-10-19 13:47:23

Tags: mysql sql database performance mysql-workbench

I have noticed a very strange MySQL behavior that I cannot explain.

This is not an overly complex query:

SELECT 
  COUNT(*) 
FROM 
  incidents.incidents AS inc, 
  incidents.enrichment AS enr 
WHERE 
  inc.Id <= 606734 
  AND inc.Id >= 1 
  AND inc.Id = enr.ParentTableId 
  AND (
    enr.Enricher3State = 2 
    OR enr.Enricher4State = 2 
    OR enr.Enricher5State = 2 
    OR enr.Enricher9State = 2
  );

The columns Enricher3State, Enricher4State, Enricher5State and Enricher9State each have an index, and their data type is int(11).

Now I tried changing the constants compared against Enricher[x]State to strings:

SELECT 
  COUNT(*) 
FROM 
  incidents.incidents AS inc, 
  incidents.enrichment AS enr 
WHERE 
  inc.Id <= 606734 
  AND inc.Id >= 1 
  AND inc.Id = enr.ParentTableId 
  AND (
    enr.Enricher3State = '2' 
    OR enr.Enricher4State = '2' 
    OR enr.Enricher5State = '2' 
    OR enr.Enricher9State = '2'
  ); 

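Common sense says the string variant should perform the same or slower, because the columns' data type is integer!

But apparently that is not the case!

Query with integer notation (the first one): 7.23048825s

Query with string notation (the last one): 5.22188450s

As you can see, there is a large performance difference, even though the query cost is the same in both cases.

I have no idea how this difference arises, or whether it means I should change every query in my project to use string notation...

I am using MySQL version 5.7.10.

Following your comments, I stopped all services that write to or read from the database and repeated the experiment.

A) Integer notation: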

SET profiling=0;
SET profiling=1;

SELECT 
  COUNT(*) 
FROM 
  incidents.incidents AS inc, 
  incidents.enrichment AS enr 
WHERE 
  inc.Id <= 606734 
  AND inc.Id >= 1 
  AND inc.Id = enr.ParentTableId 
  AND (
    enr.Enricher3State = 2 
    OR enr.Enricher4State = 2 
    OR enr.Enricher5State = 2 
    OR enr.Enricher9State = 2
  );

SELECT 
  COUNT(*) 
FROM 
  incidents.incidents AS inc, 
  incidents.enrichment AS enr 
WHERE 
  inc.Id <= 606734 
  AND inc.Id >= 1 
  AND inc.Id = enr.ParentTableId 
  AND (
    enr.Enricher3State = 2 
    OR enr.Enricher4State = 2 
    OR enr.Enricher5State = 2 
    OR enr.Enricher9State = 2
  );

SELECT 
  COUNT(*) 
FROM 
  incidents.incidents AS inc, 
  incidents.enrichment AS enr 
WHERE 
  inc.Id <= 606734 
  AND inc.Id >= 1 
  AND inc.Id = enr.ParentTableId 
  AND (
    enr.Enricher3State = 2 
    OR enr.Enricher4State = 2 
    OR enr.Enricher5State = 2 
    OR enr.Enricher9State = 2
  );

SELECT 
  COUNT(*) 
FROM 
  incidents.incidents AS inc, 
  incidents.enrichment AS enr 
WHERE 
  inc.Id <= 606734 
  AND inc.Id >= 1 
  AND inc.Id = enr.ParentTableId 
  AND (
    enr.Enricher3State = 2 
    OR enr.Enricher4State = 2 
    OR enr.Enricher5State = 2 
    OR enr.Enricher9State = 2
  );

SELECT 
  COUNT(*) 
FROM 
  incidents.incidents AS inc, 
  incidents.enrichment AS enr 
WHERE 
  inc.Id <= 606734 
  AND inc.Id >= 1 
  AND inc.Id = enr.ParentTableId 
  AND (
    enr.Enricher3State = 2 
    OR enr.Enricher4State = 2 
    OR enr.Enricher5State = 2 
    OR enr.Enricher9State = 2
  );

  SHOW PROFILES;

Execution time of each query (in seconds):

  • 6.42429325
  • 5.95059900
  • 6.34392825
  • 6.53041775
  • 6.69593450

B) String notation:

SET profiling=0;
SET profiling=1;

SELECT 
  COUNT(*) 
FROM 
  incidents.incidents AS inc, 
  incidents.enrichment AS enr 
WHERE 
  inc.Id <= 606734 
  AND inc.Id >= 1 
  AND inc.Id = enr.ParentTableId 
  AND (
    enr.Enricher3State = '2' 
    OR enr.Enricher4State = '2' 
    OR enr.Enricher5State = '2' 
    OR enr.Enricher9State = '2'
  );

SELECT 
  COUNT(*) 
FROM 
  incidents.incidents AS inc, 
  incidents.enrichment AS enr 
WHERE 
  inc.Id <= 606734 
  AND inc.Id >= 1 
  AND inc.Id = enr.ParentTableId 
  AND (
    enr.Enricher3State = '2' 
    OR enr.Enricher4State = '2' 
    OR enr.Enricher5State = '2' 
    OR enr.Enricher9State = '2'
  );

SELECT 
  COUNT(*) 
FROM 
  incidents.incidents AS inc, 
  incidents.enrichment AS enr 
WHERE 
  inc.Id <= 606734 
  AND inc.Id >= 1 
  AND inc.Id = enr.ParentTableId 
  AND (
    enr.Enricher3State = '2' 
    OR enr.Enricher4State = '2' 
    OR enr.Enricher5State = '2' 
    OR enr.Enricher9State = '2'
  );

SELECT 
  COUNT(*) 
FROM 
  incidents.incidents AS inc, 
  incidents.enrichment AS enr 
WHERE 
  inc.Id <= 606734 
  AND inc.Id >= 1 
  AND inc.Id = enr.ParentTableId 
  AND (
    enr.Enricher3State = '2' 
    OR enr.Enricher4State = '2' 
    OR enr.Enricher5State = '2' 
    OR enr.Enricher9State = '2'
  );

SELECT 
  COUNT(*) 
FROM 
  incidents.incidents AS inc, 
  incidents.enrichment AS enr 
WHERE 
  inc.Id <= 606734 
  AND inc.Id >= 1 
  AND inc.Id = enr.ParentTableId 
  AND (
    enr.Enricher3State = '2' 
    OR enr.Enricher4State = '2' 
    OR enr.Enricher5State = '2' 
    OR enr.Enricher9State = '2'
  );

  SHOW PROFILES;

Execution times (in seconds):

  • 5.07188875
  • 4.90356250
  • 4.86164300
  • 4.48403375
  • 5.06533725

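As you can clearly see, the string notation is still faster!

Other developers on my team have observed the same behavior, so I can rule out a temporary lapse on my own part...

4 answers:

Answer 0: (score: 2)

Because the fields are indexed, the conditions are combined with OR, and the query uses integer constants, MySQL may spend time evaluating combinations of those indexes before performing a table scan anyway. With string constants, MySQL skips the index considerations and goes straight to the table scan.

Having indexes on multiple fields that are combined with OR is not an advantage; it is extra work for MySQL.

An OR condition does not justify indexes on the participating fields, and an index on fields that only hold values like "1,2,3,4" is usually not good for the table. Those fields should be listed separately.

Added: run EXPLAIN. If you see the indexes on the "1,2,3,4" fields listed among the keys being considered, that is where MySQL is spending the time.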
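A minimal sketch of that check, using the two query variants from the question. Any difference in index consideration should be visible in the possible_keys and key columns of the output; what they actually contain depends on the table definitions, which the question does not show.

```sql
-- Integer constants: note which indexes appear under possible_keys / key.
EXPLAIN
SELECT COUNT(*)
FROM incidents.incidents AS inc,
     incidents.enrichment AS enr
WHERE inc.Id <= 606734
  AND inc.Id >= 1
  AND inc.Id = enr.ParentTableId
  AND (enr.Enricher3State = 2
       OR enr.Enricher4State = 2
       OR enr.Enricher5State = 2
       OR enr.Enricher9State = 2);

-- String constants: same query, compare the plan row by row with the one above.
EXPLAIN
SELECT COUNT(*)
FROM incidents.incidents AS inc,
     incidents.enrichment AS enr
WHERE inc.Id <= 606734
  AND inc.Id >= 1
  AND inc.Id = enr.ParentTableId
  AND (enr.Enricher3State = '2'
       OR enr.Enricher4State = '2'
       OR enr.Enricher5State = '2'
       OR enr.Enricher9State = '2');
```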

Answer 1: (score: 2)

Taking Sergiy Tytarenko's answer into account, I dropped the indexes on the Enricher[x]State columns.
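For reference, dropping such indexes could look roughly like the sketch below. The index names are hypothetical, since the thread never shows them; the real names are whatever SHOW INDEX reports in its Key_name column.

```sql
-- List the existing indexes on the table to find their real names.
SHOW INDEX FROM incidents.enrichment;

-- Hypothetical index names; substitute the Key_name values reported above.
ALTER TABLE incidents.enrichment
  DROP INDEX idx_enricher3state,
  DROP INDEX idx_enricher4state,
  DROP INDEX idx_enricher5state,
  DROP INDEX idx_enricher9state;
```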

Execution times for integer notation:

  • 4.93739900
  • 5.01461550
  • 5.05932075
  • 5.02891175
  • 5.02525075

Execution times for string notation:

  • 5.04365650
  • 5.07545950
  • 5.12358825
  • 5.14665200
  • 5.15426525

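Now the execution times are roughly the same.

Indeed, one should be careful when there are multiple indexes on columns that are combined with OR.

It seems I accidentally found a handy workaround (other than dropping the indexes): writing the integer as a string...

Answer 2: (score: 0)

Assuming you have executed each query multiple times and discarded the result of the first execution, what we are really looking at is the average execution time.

The performance difference is probably due to a difference in the execution plans.

I would take a close look at the output of EXPLAIN EXTENDED for both queries. Most likely the execution plans differ in some way (which indexes are being used, the order of operations, etc.).

My observation: for queries with OR conditions, the MySQL query optimizer often produces a plan that is not optimal. To get better performance, I usually break the query up using the UNION ALL set operation.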
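As a rough sketch of the UNION ALL idea applied to the question's query: each branch handles one column and excludes rows already counted by an earlier branch, so no row is counted twice. The NULL-safe operator <=> is used so that the exclusion still works if the state columns are nullable, which is an assumption, since the table definition is not shown.

```sql
SELECT SUM(part.cnt) AS total
FROM (
    -- Rows where Enricher3State matches.
    SELECT COUNT(*) AS cnt
      FROM incidents.incidents AS inc
      JOIN incidents.enrichment AS enr ON enr.ParentTableId = inc.Id
     WHERE inc.Id BETWEEN 1 AND 606734
       AND enr.Enricher3State = 2
    UNION ALL
    -- Rows where Enricher4State matches and the branch above did not.
    SELECT COUNT(*)
      FROM incidents.incidents AS inc
      JOIN incidents.enrichment AS enr ON enr.ParentTableId = inc.Id
     WHERE inc.Id BETWEEN 1 AND 606734
       AND enr.Enricher4State = 2
       AND NOT (enr.Enricher3State <=> 2)
    UNION ALL
    SELECT COUNT(*)
      FROM incidents.incidents AS inc
      JOIN incidents.enrichment AS enr ON enr.ParentTableId = inc.Id
     WHERE inc.Id BETWEEN 1 AND 606734
       AND enr.Enricher5State = 2
       AND NOT (enr.Enricher3State <=> 2)
       AND NOT (enr.Enricher4State <=> 2)
    UNION ALL
    SELECT COUNT(*)
      FROM incidents.incidents AS inc
      JOIN incidents.enrichment AS enr ON enr.ParentTableId = inc.Id
     WHERE inc.Id BETWEEN 1 AND 606734
       AND enr.Enricher9State = 2
       AND NOT (enr.Enricher3State <=> 2)
       AND NOT (enr.Enricher4State <=> 2)
       AND NOT (enr.Enricher5State <=> 2)
) AS part;
```

Whether this actually beats the OR form depends on the available indexes and the data distribution, so it is worth timing both.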

To get a "count", I would be inclined to write the query like this:

  SELECT SUM(2 IN (enr.enricher3state,
                   enr.enricher4state,
                   enr.enricher5state,
                   enr.enricher9state))
    FROM incidents.incidents inc
    JOIN incidents.enrichment enr 
      ON enr.parenttableid = inc.id
   WHERE inc.id <= 606734 
     AND inc.id >= 1

I would make sure a covering index is available, e.g.

ON enrichment (parenttableid, enricher3state, enricher4state,
                              enricher5state, enricher9state)

(or any index with parenttableid as the leading column that also includes the other four columns)
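Spelled out as a complete statement, such an index might be created as sketched here. The index name enrichment_cover_ix is made up for illustration; the column list is the one given above.

```sql
-- enrichment_cover_ix is a hypothetical index name.
CREATE INDEX enrichment_cover_ix
    ON incidents.enrichment (ParentTableId,
                             Enricher3State, Enricher4State,
                             Enricher5State, Enricher9State);
```

If the optimizer picks this index for the query, the Extra column of EXPLAIN for the enr row would be expected to mention "Using index", meaning the query is answered from the index without reading the table rows.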

Then I would check the EXPLAIN EXTENDED output and the performance.

Answer 3: (score: 0)

Comparing with numbers

char = 123   -- slow because it converts the char to numeric; can't use index
char = '123' -- fine
int = 123    -- fine
int = '123'  -- fine - because '123' is converted to numeric up front

Bottom line: it is always safe to quote constants.
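One way to see these four cases for yourself is a small throwaway table, sketched below. The table and column names are hypothetical and not part of the question's schema, and the table should be filled with a representative number of rows before the plans are compared, because the optimizer behaves differently on near-empty tables.

```sql
-- Hypothetical demo table with one indexed string column and one indexed integer column.
CREATE TABLE cmp_demo (
  id       INT AUTO_INCREMENT PRIMARY KEY,
  code_chr VARCHAR(10) NOT NULL,
  code_int INT NOT NULL,
  INDEX ix_chr (code_chr),
  INDEX ix_int (code_int)
);

-- String column vs numeric constant: every row's value must be converted,
-- so ix_chr cannot be used for a direct lookup.
EXPLAIN SELECT id FROM cmp_demo WHERE code_chr = 123;

-- String column vs string constant.
EXPLAIN SELECT id FROM cmp_demo WHERE code_chr = '123';

-- Integer column vs integer constant.
EXPLAIN SELECT id FROM cmp_demo WHERE code_int = 123;

-- Integer column vs quoted constant: the constant is converted once, up front.
EXPLAIN SELECT id FROM cmp_demo WHERE code_int = '123';
```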

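OR is essentially un-optimizable. However, the following might achieve the same effect, but faster...

A general rule in schema design: "Don't splay an array of things across columns." Instead, create another table and set up a 1:many relationship between them. This is probably the best solution for performance.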
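A sketch of what that 1:many layout could look like for the enricher states. The table name enricher_state, its column names and its types are all assumptions made for illustration, and the count query assumes every ParentTableId refers to an existing incident with one enrichment row each.

```sql
-- Hypothetical child table: one row per (incident, enricher)
-- instead of one EnricherNState column per enricher.
CREATE TABLE incidents.enricher_state (
  ParentTableId INT NOT NULL,               -- assumed to match enrichment.ParentTableId
  EnricherNo    TINYINT UNSIGNED NOT NULL,  -- 3, 4, 5, 9, ...
  State         INT NOT NULL,
  PRIMARY KEY (ParentTableId, EnricherNo),
  INDEX ix_state (State, EnricherNo, ParentTableId)
);

-- Count incidents for which any of the four enrichers is in state 2.
SELECT COUNT(DISTINCT es.ParentTableId)
  FROM incidents.enricher_state AS es
 WHERE es.State = 2
   AND es.EnricherNo IN (3, 4, 5, 9)
   AND es.ParentTableId BETWEEN 1 AND 606734;
```

With this layout the OR across four columns becomes a single equality on State plus an IN list, which an ordinary composite index can serve.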

Please use the JOIN ... ON ... syntax instead of the 'commajoin'.
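For the question's query, the explicit-join form would read as follows. The join condition moves into ON and the filters stay in WHERE; the result is the same as with the comma join.

```sql
SELECT COUNT(*)
  FROM incidents.incidents AS inc
  JOIN incidents.enrichment AS enr
    ON enr.ParentTableId = inc.Id          -- join condition
 WHERE inc.Id >= 1                         -- filters
   AND inc.Id <= 606734
   AND (enr.Enricher3State = 2
        OR enr.Enricher4State = 2
        OR enr.Enricher5State = 2
        OR enr.Enricher9State = 2);
```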

Profiling

The 5.6.7 documentation says: "The SHOW PROFILE and SHOW PROFILES statements are deprecated. Use the Performance Schema instead; see MySQL Performance Schema."
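A minimal sketch of collecting per-statement timings from the Performance Schema instead, assuming it is enabled on the server and the account has the privileges to update its setup tables. The LIKE filter on SQL_TEXT is only an illustrative way to pick out the test queries.

```sql
-- Enable the statement history consumers.
UPDATE performance_schema.setup_consumers
   SET ENABLED = 'YES'
 WHERE NAME LIKE 'events_statements_history%';

-- ... run the queries to be timed here ...

-- TIMER_WAIT is reported in picoseconds; divide by 10^12 to get seconds.
SELECT EVENT_ID,
       TRUNCATE(TIMER_WAIT / 1000000000000, 6) AS duration_s,
       SQL_TEXT
  FROM performance_schema.events_statements_history_long
 WHERE SQL_TEXT LIKE '%incidents.enrichment%'
 ORDER BY EVENT_ID;
```

Note that this lookup query matches its own filter, so repeated runs will also list earlier lookups.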

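Indexes

Indexes are rarely useful on low-cardinality columns, which is what I would expect Enricher3State to be.

IN vs OR

2 IN (...) vs ..=2 OR ..=2 OR ...: these probably make no difference. No index can be used; both involve some complexity.

More information

Need to see SHOW CREATE TABLE for both tables. Need to see EXPLAIN SELECT ...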
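The statements that would produce that information look like this. EXPLAIN FORMAT=JSON is used here as one option because it also reports cost estimates; plain EXPLAIN works as well.

```sql
-- Full table definitions, including indexes and storage engine.
SHOW CREATE TABLE incidents.incidents;
SHOW CREATE TABLE incidents.enrichment;

-- Execution plan with cost details for the integer variant;
-- repeat with the quoted constants to compare.
EXPLAIN FORMAT=JSON
SELECT COUNT(*)
  FROM incidents.incidents AS inc,
       incidents.enrichment AS enr
 WHERE inc.Id <= 606734
   AND inc.Id >= 1
   AND inc.Id = enr.ParentTableId
   AND (enr.Enricher3State = 2
        OR enr.Enricher4State = 2
        OR enr.Enricher5State = 2
        OR enr.Enricher9State = 2);
```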