Question

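I have a table with 3 indexes, defined as:

UNIQUE unique_ID (refDate, instrument)

refDate (refDate)

instrument (instrument)

The table now has about 10 million rows, but for each refDate there are only about 5,000 distinct instruments.

I have a query that self-joins this table to produce output like: refDate | rate instrument = X | rate instrument = Y | rate instrument = Z | ....

Basically it returns time-series data that I can then analyze myself.

Here is the problem. My original query is as follows: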

Select distinct AUDSpot1yFq.refDate,AUDSpot1yFq.rate as 'AUDSpot1yFq',
AUD1y1yFq.rate as AUD1y1yFq
from audratedb AUDSpot1yFq inner join audratedb AUD1y1yFq on
AUDSpot1yFq.refDate=AUD1y1yFq.refDate 
where AUDSpot1yFq.instrument = 'AUDSpot1yFq' and 
AUD1y1yFq.instrument = 'AUD1y1yFq' 
order by AUDSpot1yFq.refDate

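Note that in the specific timed query below I am actually fetching 10 different instruments, so the query is much longer, but it follows the same pattern of naming, inner joins, and where clauses.

This is slow. In Workbench I timed it at 7-8 seconds of duration (but close to 0 fetch time, since Workbench runs on the same machine as the server). When I strip the distinct, the duration drops to 0.25-0.5 seconds (much more manageable), and when I strip the "order by" it gets faster still (<0.1 seconds, at which point I don't care). But then my fetch time explodes to about 7 seconds. So overall I gain nothing; it has simply become a fetch-time problem. When I run this query from the Python script that will do the heavy lifting, I get roughly the same total time whether or not distinct is included.

When I run EXPLAIN on my reduced query (the one with the terrible fetch time), I get: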

1   SIMPLE  AUDSpot1yFq     ref unique_ID,refDate,instrument    instrument  39  const   1432    100.00  Using where
1   SIMPLE  AUD1y1yFq       ref unique_ID,refDate,instrument    unique_ID   42  historicalratesdb.AUDSpot1yFq.refDate,const 1   100.00  Using where
1   SIMPLE  AUD2y1yFq       ref unique_ID,refDate,instrument    unique_ID   42  historicalratesdb.AUDSpot1yFq.refDate,const 1   100.00  Using where
1   SIMPLE  AUD3y1yFq       ref unique_ID,refDate,instrument    unique_ID   42  historicalratesdb.AUDSpot1yFq.refDate,const 1   100.00  Using where
1   SIMPLE  AUD4y1yFq       ref unique_ID,refDate,instrument    unique_ID   42  historicalratesdb.AUDSpot1yFq.refDate,const 1   100.00  Using where
1   SIMPLE  AUD5y1yFq       ref unique_ID,refDate,instrument    unique_ID   42  historicalratesdb.AUDSpot1yFq.refDate,const 1   100.00  Using where
1   SIMPLE  AUD6y1yFq       ref unique_ID,refDate,instrument    unique_ID   42  historicalratesdb.AUDSpot1yFq.refDate,const 1   100.00  Using where
1   SIMPLE  AUD7y1yFq       ref unique_ID,refDate,instrument    unique_ID   42  historicalratesdb.AUDSpot1yFq.refDate,const 1   100.00  Using where
1   SIMPLE  AUD8y1yFq       ref unique_ID,refDate,instrument    unique_ID   42  historicalratesdb.AUDSpot1yFq.refDate,const 1   100.00  Using where
1   SIMPLE  AUD9y1yFq       ref unique_ID,refDate,instrument    unique_ID   42  historicalratesdb.AUDSpot1yFq.refDate,const 1   100.00  Using where

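I now realize that distinct is not necessary, and since I am loading the output into a dataframe, the order by is something I can drop and do in pandas instead. That is great. But I don't know how to bring the fetch time down. I am not going to win any competency contests on this site, but I have searched as much as I can and cannot find a solution to this problem. Any help is much appreciated.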

〜可可

Answer 1

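The question doesn't mention the existing indexes, or show the EXPLAIN output for any of the queries.

The quick answer for improving performance is to add an index: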

   ... ON audratedb (instrument,refdate,rate)

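To answer why we would want to add that index, we need to understand how MySQL processes SQL statements, which operations are possible, and which are required. To see how MySQL actually processes a statement, use EXPLAIN to look at the query plan.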

Answer 2

(I had to simplify the table aliases in order to read it:)

Select  distinct
           s.refDate,
           s.rate as AUDSpot1yFq,
           y.rate as AUD1y1yFq
    from  audratedb AS s
    join  audratedb AS y  on s.refDate = y.refDate
    where  s.instrument = 'AUDSpot1yFq'
      and  y.instrument = 'AUD1y1yFq'
    order by  s.refDate

You need one of these indexes:

INDEX(instrument, refDate)  -- To filter and sort, or
INDEX(instrument, refDate, rate)  -- to also "cover" the query.
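As a rough illustration of the three-column index, here is a minimal Python sketch. It uses the standard-library sqlite3 module as a stand-in for MySQL, purely so the example is self-contained; SQLite's planner is not MySQL's, so the printed plan is only indicative. The table and column names come from the question, while the sample rows and the index name inst_date_rate are made up.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory stand-in for audratedb with made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE audratedb (refDate TEXT, instrument TEXT, rate REAL)"
    )
    rows = [
        ("2020-01-02", "AUDSpot1yFq", 1.10),
        ("2020-01-02", "AUD1y1yFq", 1.20),
        ("2020-01-01", "AUDSpot1yFq", 1.00),
        ("2020-01-01", "AUD1y1yFq", 1.15),
        ("2020-01-01", "Other", 9.99),
    ]
    conn.executemany("INSERT INTO audratedb VALUES (?, ?, ?)", rows)
    # Hypothetical index name; the column order is the one suggested above.
    conn.execute(
        "CREATE INDEX inst_date_rate ON audratedb (instrument, refDate, rate)"
    )
    return conn


# Same shape as the simplified query, without DISTINCT.
QUERY = """
    SELECT s.refDate, s.rate AS AUDSpot1yFq, y.rate AS AUD1y1yFq
    FROM audratedb AS s
    JOIN audratedb AS y ON s.refDate = y.refDate
    WHERE s.instrument = 'AUDSpot1yFq'
      AND y.instrument = 'AUD1y1yFq'
    ORDER BY s.refDate
"""

conn = build_demo_db()
result = conn.execute(QUERY).fetchall()
plan = conn.execute("EXPLAIN QUERY PLAN " + QUERY).fetchall()

for row in result:
    print(row)
for step in plan:
    print(step)  # SQLite's plan, shown only as an analogue of MySQL's EXPLAIN
```

On the real server, the thing to check is MySQL's own EXPLAIN output after adding the index.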

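That assumes the query is no more complex than what you showed. I see that the EXPLAIN already has more tables. Please provide SHOW CREATE TABLE audratedb and the entire SELECT.

Back to your questions...

DISTINCT is done in one of two ways: (1) sort the rows, then dedup, or (2) dedup in a hash in memory. Keep in mind that you are dedupping on all 3 columns (refDate, s.rate, y.rate).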
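The two strategies can be sketched in a few lines of Python. This is only a conceptual illustration of sort-then-dedup versus hash-based dedup, not MySQL's actual implementation; the function names and sample rows are made up.

```python
def dedup_by_sort(rows):
    """Strategy 1: sort, then drop any row equal to its predecessor."""
    out = []
    for row in sorted(rows):
        if not out or out[-1] != row:
            out.append(row)
    return out


def dedup_by_hash(rows):
    """Strategy 2: remember rows already seen in an in-memory hash set."""
    seen = set()
    out = []
    for row in rows:
        if row not in seen:
            seen.add(row)
            out.append(row)
    return out


# Each row is (refDate, s.rate, y.rate); all 3 columns take part in the comparison.
rows = [
    ("2020-01-02", 1.1, 1.2),
    ("2020-01-01", 1.0, 1.15),
    ("2020-01-02", 1.1, 1.2),  # exact duplicate on all 3 columns: dropped
    ("2020-01-02", 1.1, 1.3),  # same refDate, different y.rate: kept
]

by_sort = dedup_by_sort(rows)
by_hash = dedup_by_hash(rows)
print(by_sort)
print(by_hash)
```

Both approaches keep the same set of rows; the sort-based one also leaves them ordered, while the hash-based one keeps first-seen order.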

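ORDER BY is a sort performed after all the data has been gathered. However, with the suggested index (not the indexes you have), the sort is not needed, since the index delivers the rows in the desired order.

But... having both DISTINCT and ORDER BY may confuse the Optimizer to the point where it does something...

You say that (refDate,instrument) is UNIQUE, but you do not mention a PRIMARY KEY, nor which Engine you are using. If you are using InnoDB, then PRIMARY KEY(instrument, refDate), in that order, would speed things up further and avoid the need for any new index.

Furthermore, it is redundant to have both (a,b) and (a). That is, your current schema does not need INDEX(refDate); after changing the PK, it is INDEX(instrument) that you would not need, instead.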

Bottom line: have only

PRIMARY KEY(instrument, refDate), INDEX(refDate)

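and no other indexes (unless you can show some query that needs one).

More on the EXPLAIN. Notice how the Rows column says 1432, 1, 1, ... That means it scanned an estimated 1432 rows of the first table. Because of the lack of a proper index, this is probably far more than necessary. Then it needed to look at only 1 row in each of the other tables. (Can't get better than that.)

How many rows does the SELECT return without the DISTINCT or ORDER BY? That tells you how much work was needed after the fetching and JOINing. I suspect it is only a few. For a "few" rows, DISTINCT and ORDER BY are really cheap; hence I think you are barking up the wrong tree. Even 1432 rows would be very fast to process.

As for the buffer_pool... How big is the table? Do SHOW TABLE STATUS. I suspect the table is more than 1GB, hence it cannot fit in the buffer_pool. Raising that cache size would then let the query run in RAM instead of hitting the disk (at least once the data is cached). Keep in mind that running a query on a cold cache involves a lot of I/O. As the cache warms up, queries run faster. But if the cache is too small, you will continue to need I/O. I/O is the slowest part of the processing.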
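The size comparison is simple arithmetic, sketched below in Python under the assumption that the table is InnoDB. The helper name and every number are hypothetical; the real inputs would be the Data_length and Index_length values reported by SHOW TABLE STATUS and the server's innodb_buffer_pool_size setting.

```python
MIB = 1024 ** 2
GIB = 1024 ** 3


def fits_in_buffer_pool(data_length, index_length, buffer_pool_size):
    """Return True if the table's data plus indexes fit in the buffer pool."""
    return data_length + index_length <= buffer_pool_size


# Hypothetical sizes in bytes, standing in for SHOW TABLE STATUS output.
data_length = 1 * GIB
index_length = 512 * MIB

small_pool = fits_in_buffer_pool(data_length, index_length, 128 * MIB)
large_pool = fits_in_buffer_pool(data_length, index_length, 2 * GIB)
print(small_pool, large_pool)
```

This ignores everything else that competes for the buffer pool, so treat a "fits" result as a lower bound on the memory needed rather than a guarantee.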

I hope you have at least 6GB of RAM; otherwise, 2G could be dangerously large. Swapping is really bad for performance.

MySQL: slow duration or slow fetch time depending on the "distinct" command