Joining a table to itself is very slow

Date: 2014-11-09 11:01:44

Tags: mysql performance join

I have a table that I want to join to itself, twice. Here is the schema:

CREATE TABLE `route_connections` (
    `id` int(11) NOT NULL AUTO_INCREMENT,
    `from_route_iid` int(11) NOT NULL,
    `from_service_id` varchar(100) NOT NULL,
    `to_route_iid` int(11) NOT NULL,
    `to_service_id` varchar(100) NOT NULL,
    PRIMARY KEY (`id`),
    KEY `to_route` (`to_route_iid`),
    KEY `from_route` (`from_route_iid`),
    KEY `to_service` (`to_service_id`),
    KEY `from_service` (`from_service_id`),
    KEY `from_to_route` (`from_route_iid`,`to_route_iid`)
) ENGINE=InnoDB AUTO_INCREMENT=6798783 DEFAULT CHARSET=utf8

It has about 3.7M rows.

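My main goal is to find a path that uses 3 routes (linked by 2 route connections), given the lists of allowed departure and arrival routes (the connecting route has to be determined by the query).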

Path: route A → route B → route C:

  • Departure route (known list, A)
  • route_connection c1 (A → B)
  • Connecting route (unknown, B)
  • route_connection c2 (B → C)
  • Arrival route (known list, C)

So I need to select three route_iid values: c1.from, c1.to (which is the same as c2.from) and c2.to.
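As a minimal sketch of what the two-hop query has to compute (Python, in memory, with hypothetical toy route ids rather than the real data; `find_two_hop_paths` is an illustrative helper, not part of any library), outgoing connections are indexed by `from_route_iid`, the role the `from_route` index plays in MySQL, and every A → B → C chain ending in an allowed arrival route is collected:

```python
from collections import defaultdict

# Hypothetical toy data: (from_route_iid, to_route_iid) pairs standing in
# for rows of route_connections. Real route ids and row counts differ.
CONNECTIONS = [
    (864, 100), (864, 101), (865, 100),  # departure -> candidate connecting route
    (100, 745), (101, 999), (102, 744),  # connecting route -> arrival or elsewhere
]

def find_two_hop_paths(connections, departures, arrivals):
    """Return sorted, de-duplicated (A, B, C) triples where A is a departure
    route, C is an arrival route, and both A->B and B->C connections exist."""
    # Index outgoing connections by from_route_iid.
    outgoing = defaultdict(set)
    for src, dst in connections:
        outgoing[src].add(dst)

    paths = set()
    for a in departures:
        for b in outgoing.get(a, ()):      # first hop: c1
            for c in outgoing.get(b, ()):  # second hop: c2, from b
                if c in arrivals:
                    paths.add((a, b, c))
    return sorted(paths)

print(find_two_hop_paths(CONNECTIONS, {864, 865}, {745, 744}))
# [(864, 100, 745), (865, 100, 745)]
```

Collecting into a set reflects that many rows in route_connections can share the same (from_route_iid, to_route_iid) pair.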

In addition, I need to filter every service_id with the following filter:

service_id in (
    select service_id from (
        select service_id from calendar c
            where c.start_date <= 20141109 and end_date >= 20141109 

        union

        select service_id from calendar_dates cd 
            where cd.date = 20141109 and exception_type = 1 
    ) x 
    where x.service_id not in (
        select service_id from calendar_dates cd 
        where cd.date = 20141109 and exception_type = 2
    )
)
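The filter combines three sets: services whose calendar range covers the day, plus services added for that day (exception_type = 1), minus services removed for that day (exception_type = 2). A small sketch of that logic, run on SQLite from Python's standard library instead of MySQL, with hypothetical minimal calendar and calendar_dates tables that contain only the columns the filter reads (`active_service_ids` is an illustrative name):

```python
import sqlite3

def active_service_ids(conn, day):
    """Service ids running on `day` (an integer YYYYMMDD): calendar ranges that
    cover the day, plus exception_type = 1 additions, minus exception_type = 2."""
    sql = """
        SELECT service_id FROM (
            SELECT service_id FROM calendar
             WHERE start_date <= :day AND end_date >= :day
            UNION
            SELECT service_id FROM calendar_dates
             WHERE date = :day AND exception_type = 1
        ) x
        WHERE x.service_id NOT IN (
            SELECT service_id FROM calendar_dates
             WHERE date = :day AND exception_type = 2
        )
    """
    return sorted(row[0] for row in conn.execute(sql, {"day": day}))

# Hypothetical minimal tables: only the columns the filter reads.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE calendar (service_id TEXT, start_date INTEGER, end_date INTEGER);
    CREATE TABLE calendar_dates (service_id TEXT, date INTEGER, exception_type INTEGER);
    INSERT INTO calendar VALUES ('weekday', 20141101, 20141130),
                                ('summer', 20140601, 20140831),
                                ('cancelled', 20141101, 20141130);
    INSERT INTO calendar_dates VALUES ('extra', 20141109, 1),
                                      ('cancelled', 20141109, 2);
""")
print(active_service_ids(conn, 20141109))  # ['extra', 'weekday']
```

Like the original filter, this ignores the day-of-week columns (monday through sunday) that a GTFS calendar table normally carries, so a range covering the date counts as active even if the service does not run on that weekday.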

To begin with, I'm working on connecting the routes without the service_id filtering.

When searching for a single connection, the query takes < 1 ms (zero results):

select c.*
from route_connections c
where c.from_route_iid in (864, 865, 495, 494, 459, 54, 458)
    and c.to_route_iid in (745, 744, 1096, 1093, 743, 317, 742, 13, 316)

But my goal is to find 2 connections, so I use this query, which takes a very long time (zero results):

select c1.*, c2.*
from
route_connections c1
inner join route_connections c2 on c2.from_route_iid = c1.to_route_iid
    and c2.to_route_iid in (745, 744, 1096, 1093, 743, 317, 742, 13, 316)
where c1.from_route_iid in (864, 865, 495, 494, 459, 54, 458)

It used to take 50 s, but adding the from_to_route index sped the query up to 18-20 s.

I also tried without the JOIN keyword:

SELECT ...
FROM route_connections c1, route_connections c2
WHERE ...

But it gives exactly the same performance (I guess internally it is exactly the same as the join).

I tried changing the inner join to a left join plus a HAVING clause, but it was much worse (as expected).

I tried removing all indexes except these two:

  • PRIMARY KEY (`id`),
  • KEY `from_route_iid` (`from_route_iid`,`to_route_iid`)

The result is the same, about 18-20 s.

Here is the EXPLAIN output:

+----+-------------+-------+-------+------------------------------------+----------------+---------+----------------------------------+-------+----------------------------------+
| id | select_type | table | type  | possible_keys                      | key            | key_len | ref                              | rows  | Extra                            |
+----+-------------+-------+-------+------------------------------------+----------------+---------+----------------------------------+-------+----------------------------------+
|  1 | SIMPLE      | c1    | range | to_route,from_route,from_route_iid | from_route     | 4       | NULL                             | 15464 | Using index condition; Using MRR |
|  1 | SIMPLE      | c2    | ref   | to_route,from_route,from_route_iid | from_route_iid | 4       | bicou_gtfs_paris.c1.to_route_iid |  1746 | Using index condition            |
+----+-------------+-------+-------+------------------------------------+----------------+---------+----------------------------------+-------+----------------------------------+
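The rows column gives a back-of-the-envelope explanation for the 18-20 s: MySQL 5.6 runs this as a nested-loop join, reading c1 once and doing one index lookup into c2 per c1 row, so the work is roughly the product of the two estimates. A minimal sketch of that arithmetic (the figures are optimizer estimates copied from the table above, not measured row counts; `nested_loop_rows` is an illustrative name):

```python
# "rows" estimates copied from the EXPLAIN output above.
EXPLAIN_ROWS = {"c1": 15464, "c2": 1746}

def nested_loop_rows(outer_rows, rows_per_probe):
    """Rough number of (c1, c2) row combinations a nested-loop join reads:
    one index lookup into c2 per c1 row, each returning ~rows_per_probe rows."""
    return outer_rows * rows_per_probe

print(nested_loop_rows(EXPLAIN_ROWS["c1"], EXPLAIN_ROWS["c2"]))  # 27000144
```

About 27 million candidate rows, each then checked against the to_route_iid list, is the scale of work this plan implies, even though both tables are accessed through indexes.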

What is the right way to join a table to itself? Am I missing an index, or anything else?

The hardware is a 2014 MacBook Air with a 1.7 GHz Core i7, 8 GB of RAM and a 256 GB SSD. The software is Mac OS X 10.10 Yosemite with MySQL 5.6.21.

1 Answer:

Answer 0 (score: 1)

OK, I found the solution:

select to_route_iid
from route_connections
where from_route_iid in (864, 865, 495, 494, 459, 54, 458)

=> 15471 rows

select to_route_iid
from route_connections
where from_route_iid in (864, 865, 495, 494, 459, 54, 458)
group by to_route_iid

=> 97 rows!

Same story for the arrival routes: 131 grouped rows instead of 25427.
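Why the GROUP BY helps can be shown outside the database. In a plain nested-loop join the second hop costs one c2 index lookup per driving c1 row, but repeated copies of the same to_route_iid can only find the same c2 rows again, so collapsing 15471 rows into 97 distinct values cuts the lookups roughly 160-fold without changing which (B, C) pairs are found. A minimal Python sketch with hypothetical toy data (`join_second_hop` is an illustrative helper) that counts lookups both ways:

```python
def join_second_hop(c1_to_routes, c2_index, arrivals, dedupe):
    """Probe the c2 index once per driving value and keep (B, C) pairs
    whose C is an allowed arrival route. Returns (pairs, number_of_probes)."""
    # With dedupe=True each intermediate route B is probed once,
    # which is what grouping c1 by to_route_iid achieves.
    driving = set(c1_to_routes) if dedupe else list(c1_to_routes)
    pairs, probes = set(), 0
    for b in driving:
        probes += 1
        for c in c2_index.get(b, ()):
            if c in arrivals:
                pairs.add((b, c))
    return pairs, probes

# Hypothetical toy data: 1000 c1 rows that funnel into only 3 distinct
# intermediate routes, loosely mimicking the 15471 -> 97 collapse.
c1_to_routes = [100] * 500 + [101] * 300 + [102] * 200
c2_index = {100: [745, 5], 101: [6], 102: [744]}  # B -> reachable C routes
arrivals = {745, 744}

full = join_second_hop(c1_to_routes, c2_index, arrivals, dedupe=False)
dedup = join_second_hop(c1_to_routes, c2_index, arrivals, dedupe=True)
print(full[1], dedup[1])    # 1000 3
print(full[0] == dedup[0])  # True
```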

So this query:

select c1.from_route_iid, c2.from_route_iid, c2.to_route_iid
from (
    select from_route_iid, to_route_iid
    from route_connections
    where from_route_iid in (864, 865, 495, 494, 459, 54, 458)
    group by to_route_iid
) c1, route_connections c2
where c2.from_route_iid = c1.to_route_iid
and c2.to_route_iid in (745, 744, 1096, 1093, 743, 317, 742, 13, 316)
group by c2.from_route_iid, c2.to_route_iid

runs in 145 ms. That's great, considering that this morning I started at 2 minutes :).
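One caveat on the query above: it selects from_route_iid while grouping only by to_route_iid, which MySQL 5.6 accepts because ONLY_FULL_GROUP_BY is not enabled by default, but the value returned for each group is arbitrary, so only one departure route survives per intermediate route. If every departure route is needed, the derived table can be de-duplicated on the (from_route_iid, to_route_iid) pair instead. A minimal sketch, assuming SQLite from Python's standard library as a stand-in for MySQL, a hypothetical trimmed-down route_connections table, and toy data; it checks only that the pre-deduplicated join returns the same (A, B, C) triples as the full self-join, not timings:

```python
import sqlite3

# Original shape: join every matching c1 row to c2, then de-duplicate.
FULL_JOIN = """
    SELECT DISTINCT c1.from_route_iid, c2.from_route_iid, c2.to_route_iid
      FROM route_connections c1
      JOIN route_connections c2 ON c2.from_route_iid = c1.to_route_iid
     WHERE c1.from_route_iid IN (864, 865)
       AND c2.to_route_iid IN (745, 744)
"""

# Shrink c1 to distinct (from, to) route pairs first, then join.
DEDUPED = """
    SELECT DISTINCT c1.from_route_iid, c2.from_route_iid, c2.to_route_iid
      FROM (SELECT DISTINCT from_route_iid, to_route_iid
              FROM route_connections
             WHERE from_route_iid IN (864, 865)) c1
      JOIN route_connections c2 ON c2.from_route_iid = c1.to_route_iid
     WHERE c2.to_route_iid IN (745, 744)
"""

def build_demo_db():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE route_connections ("
        " id INTEGER PRIMARY KEY,"
        " from_route_iid INTEGER NOT NULL,"
        " to_route_iid INTEGER NOT NULL)"
    )
    conn.execute(
        "CREATE INDEX from_to_route"
        " ON route_connections (from_route_iid, to_route_iid)"
    )
    # Hypothetical rows: repeated route pairs stand in for the many
    # service-level connections per route pair in the real table.
    rows = ([(864, 100)] * 50 + [(865, 100)] * 30 + [(864, 101)] * 20
            + [(100, 745)] * 40 + [(101, 999)] * 10 + [(102, 744)] * 5)
    conn.executemany(
        "INSERT INTO route_connections (from_route_iid, to_route_iid)"
        " VALUES (?, ?)", rows)
    return conn

conn = build_demo_db()
full = sorted(conn.execute(FULL_JOIN).fetchall())
deduped = sorted(conn.execute(DEDUPED).fetchall())
print(full)             # [(864, 100, 745), (865, 100, 745)]
print(full == deduped)  # True
```

Here 100 matching c1 rows shrink to 3 distinct pairs before the join, the same kind of collapse as 15471 to 97 rows in the answer.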