Question

I have a table that I join to itself, and I want to join it twice. Here is the schema:

CREATE TABLE `route_connections` (
    `id` int(11) NOT NULL AUTO_INCREMENT,
    `from_route_iid` int(11) NOT NULL,
    `from_service_id` varchar(100) NOT NULL,
    `to_route_iid` int(11) NOT NULL,
    `to_service_id` varchar(100) NOT NULL,
    PRIMARY KEY (`id`),
    KEY `to_route` (`to_route_iid`),
    KEY `from_route` (`from_route_iid`),
    KEY `to_service` (`to_service_id`),
    KEY `from_service` (`from_service_id`),
    KEY `from_to_route` (`from_route_iid`,`to_route_iid`)
) ENGINE=InnoDB AUTO_INCREMENT=6798783 DEFAULT CHARSET=utf8

It has about 3.7M rows.

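My main goal is to find a path that uses 3 routes (2 route connections). I know the lists of allowed departure and arrival routes, and the connecting route must be determined by the query.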

Path: route A → route B → route C:

Departure route (known list, A)
route_connection c1 (A → B)
Connecting route (unknown, B)
route_connection c2 (B → C)
Arrival route (known list, C)

So I need to select three route_iid values: c1.from_route_iid, c1.to_route_iid (which is the same as c2.from_route_iid), and c2.to_route_iid.
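Here is a plain Python sketch of the path shape. It assumes the connections fit in memory as (from_route_iid, to_route_iid) pairs, and the route numbers are made up for illustration. The sketch only mirrors the logic the SQL has to express: B is any route reachable from a route in A that also has a connection into a route in C.

```python
# Hypothetical toy data: (from_route_iid, to_route_iid) pairs, not real GTFS rows.
connections = [
    (1, 10), (1, 11), (2, 10),     # departure routes -> candidate connecting routes
    (10, 20), (11, 21), (10, 99),  # connecting routes -> arrival (or other) routes
]
departures = {1, 2}  # known list A
arrivals = {20, 21}  # known list C


def find_paths(connections, departures, arrivals):
    """Return sorted (a, b, c) triples such that a->b and b->c are connections."""
    successors = {}
    for frm, to in connections:
        successors.setdefault(frm, set()).add(to)
    paths = set()
    for a in departures:
        for b in successors.get(a, ()):
            # Keep only the second hops that land on an allowed arrival route.
            for c in successors.get(b, set()) & arrivals:
                paths.add((a, b, c))
    return sorted(paths)


print(find_paths(connections, departures, arrivals))
# [(1, 10, 20), (1, 11, 21), (2, 10, 20)]
```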

In addition, I need to filter every service_id with the following condition:

service_id in (
    select service_id from (
        select service_id from calendar c
            where c.start_date <= 20141109 and end_date >= 20141109 

        union

        select service_id from calendar_dates cd 
            where cd.date = 20141109 and exception_type = 1 
    ) x 
    where x.service_id not in (
        select service_id from calendar_dates cd 
        where cd.date = 20141109 and exception_type = 2
    )
)
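It can help to test the service filter on its own before adding it to the join. The sketch below uses Python's built-in sqlite3 module. It assumes simplified calendar and calendar_dates tables that hold only the columns the filter touches, and the service names and dates are made up. In GTFS, exception_type = 1 adds a service on a given date and exception_type = 2 removes it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical minimal GTFS-like tables with only the columns the filter uses.
conn.executescript("""
CREATE TABLE calendar (service_id TEXT, start_date INTEGER, end_date INTEGER);
CREATE TABLE calendar_dates (service_id TEXT, date INTEGER, exception_type INTEGER);
INSERT INTO calendar VALUES
    ('weekday', 20141101, 20141130),    -- active on 20141109
    ('summer', 20140601, 20140831),     -- not active on 20141109
    ('cancelled', 20141101, 20141130);  -- active, but removed below
INSERT INTO calendar_dates VALUES
    ('special', 20141109, 1),           -- added for that day only
    ('cancelled', 20141109, 2);         -- removed for that day
""")

# Same structure as the filter above, with the date as a named parameter.
ACTIVE_SERVICES = """
SELECT service_id FROM (
    SELECT service_id FROM calendar
        WHERE start_date <= :day AND end_date >= :day
    UNION
    SELECT service_id FROM calendar_dates
        WHERE date = :day AND exception_type = 1
) x
WHERE x.service_id NOT IN (
    SELECT service_id FROM calendar_dates
        WHERE date = :day AND exception_type = 2
)
"""

active = sorted(row[0] for row in conn.execute(ACTIVE_SERVICES, {"day": 20141109}))
print(active)  # ['special', 'weekday']
```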

First, I'm working on the route connections only, without the service_id filtering.

When I search for only one connection, the query takes < 1 ms (zero results):

select c.*
from route_connections c
where c.from_route_iid in (864, 865, 495, 494, 459, 54, 458)
    and c.to_route_iid in (745, 744, 1096, 1093, 743, 317, 742, 13, 316)

But my goal is to find 2 connections, so I use this query, which takes a very long time (zero results):

select c1.*, c2.*
from
route_connections c1
inner join route_connections c2 on c2.from_route_iid = c1.to_route_iid
    and c2.to_route_iid in (745, 744, 1096, 1093, 743, 317, 742, 13, 316)
where c1.from_route_iid in (864, 865, 495, 494, 459, 54, 458)
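The sketch below runs the same two-hop self-join through Python's sqlite3 module so the result shape is visible. The table is reduced to the two route columns, and the route numbers are hypothetical: 1 and 2 stand in for the departure list, and 20 and 21 for the arrival list. The duplicated (1, 10) row stands in for the several rows the real table can hold for one route pair, presumably one per service_id pair. The join repeats every one of those duplicates in its output.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE route_connections (
    id INTEGER PRIMARY KEY,
    from_route_iid INTEGER NOT NULL,
    to_route_iid INTEGER NOT NULL
)""")
conn.execute(
    "CREATE INDEX from_to_route ON route_connections (from_route_iid, to_route_iid)"
)
# Hypothetical rows; (1, 10) appears twice on purpose.
conn.executemany(
    "INSERT INTO route_connections (from_route_iid, to_route_iid) VALUES (?, ?)",
    [(1, 10), (1, 10), (1, 11), (2, 10), (10, 20), (11, 21), (10, 99)],
)

# Same shape as the slow query: c1 = A->B, c2 = B->C.
rows = conn.execute("""
SELECT c1.from_route_iid, c1.to_route_iid, c2.to_route_iid
FROM route_connections c1
INNER JOIN route_connections c2
    ON c2.from_route_iid = c1.to_route_iid
   AND c2.to_route_iid IN (20, 21)
WHERE c1.from_route_iid IN (1, 2)
ORDER BY 1, 2, 3
""").fetchall()
print(rows)  # [(1, 10, 20), (1, 10, 20), (1, 11, 21), (2, 10, 20)]
```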

It used to take 50 s, but adding the from_to_route index brought it down to 18-20 s.

I also tried it without the JOIN keyword:

SELECT ...
FROM route_connections c1, route_connections c2
WHERE ...

But it gives exactly the same performance (I guess it is internally the same as the join).
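That guess is consistent with how inner joins are defined. A condition filters the same rows whether it sits in ON or in WHERE, so the comma form and INNER JOIN return identical results. Identical timings are then expected if the optimizer treats the two forms the same way. The check below runs on made-up data with Python's sqlite3 module and compares results only, not query plans.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE route_connections (from_route_iid INTEGER, to_route_iid INTEGER)")
# Hypothetical rows: 1, 2 = departures; 20, 21 = arrivals.
conn.executemany(
    "INSERT INTO route_connections VALUES (?, ?)",
    [(1, 10), (1, 11), (2, 10), (10, 20), (11, 21), (10, 99)],
)

explicit = """
SELECT c1.from_route_iid, c2.from_route_iid, c2.to_route_iid
FROM route_connections c1
INNER JOIN route_connections c2 ON c2.from_route_iid = c1.to_route_iid
WHERE c1.from_route_iid IN (1, 2) AND c2.to_route_iid IN (20, 21)
ORDER BY 1, 2, 3
"""
# Comma join: the join condition simply moves into WHERE.
implicit = """
SELECT c1.from_route_iid, c2.from_route_iid, c2.to_route_iid
FROM route_connections c1, route_connections c2
WHERE c2.from_route_iid = c1.to_route_iid
  AND c1.from_route_iid IN (1, 2) AND c2.to_route_iid IN (20, 21)
ORDER BY 1, 2, 3
"""
same = conn.execute(explicit).fetchall() == conn.execute(implicit).fetchall()
print(same)  # True
```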

I tried changing the inner join to a left join plus a HAVING clause, but that was much worse (as expected).

I tried removing all indexes except these two:

PRIMARY KEY (`id`),
KEY `from_route_iid` (`from_route_iid`,`to_route_iid`)

The result is the same, about 18-20 s.

Here is the EXPLAIN output:

+----+-------------+-------+-------+------------------------------------+----------------+---------+----------------------------------+-------+----------------------------------+
| id | select_type | table | type  | possible_keys                      | key            | key_len | ref                              | rows  | Extra                            |
+----+-------------+-------+-------+------------------------------------+----------------+---------+----------------------------------+-------+----------------------------------+
|  1 | SIMPLE      | c1    | range | to_route,from_route,from_route_iid | from_route     | 4       | NULL                             | 15464 | Using index condition; Using MRR |
|  1 | SIMPLE      | c2    | ref   | to_route,from_route,from_route_iid | from_route_iid | 4       | bicou_gtfs_paris.c1.to_route_iid |  1746 | Using index condition            |
+----+-------------+-------+-------+------------------------------------+----------------+---------+----------------------------------+-------+----------------------------------+
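The EXPLAIN can be turned into a rough number. For a nested-loop join, the rows column is a per-table estimate, and multiplying the values down the plan approximates how many row combinations the server works through. Here that means c2 index entries checked by the index condition. The figures are copied from the EXPLAIN output above and are optimizer estimates, not measured counts.

```python
# Figures copied from the EXPLAIN output; optimizer estimates, not measured counts.
c1_rows = 15464        # range scan on c1 using the from_route index
c2_rows_per_c1 = 1746  # ref lookup on c2 using from_route_iid, per c1 row

# Nested loop: every c1 row triggers its own c2 lookup.
estimated_combinations = c1_rows * c2_rows_per_c1
print(estimated_combinations)  # 27000144
```

About 27 million index entries per run would be consistent with the 18-20 s timing, even on an SSD.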

What is the right way to join a table to itself? Am I missing an index, or anything else?

The hardware is a 2014 MacBook Air with a 1.7 GHz Core i7, 8 GB of RAM and a 256 GB SSD. The software is Mac OS X 10.10 Yosemite with MySQL 5.6.21.

Answer 1

OK, I found a solution:

select to_route_iid
from route_connections
where from_route_iid in (864, 865, 495, 494, 459, 54, 458)

=> 15471 rows

select to_route_iid
from route_connections
where from_route_iid in (864, 865, 495, 494, 459, 54, 458)
group by to_route_iid

=> 97 rows!

Same thing for the arrival routes: 131 grouped rows versus 25427.
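The drop from 15471 to 97 rows most likely explains why the original self-join was slow. Each c1 row triggers its own ref lookup into c2, so each of the 97 connecting routes is looked up about 160 times on average (15471 / 97 ≈ 159.5). The sketch below reproduces the effect on made-up data with Python's sqlite3 module. Every route pair is repeated 50 times to stand in for service-level rows, which is an assumption about why the real table has so many rows per route pair.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE route_connections (from_route_iid INTEGER, to_route_iid INTEGER)")
# Hypothetical data: 3 departure routes x 2 connecting routes, each pair 50 times.
pairs = [(a, b) for a in (1, 2, 3) for b in (10, 11) for _ in range(50)]
conn.executemany("INSERT INTO route_connections VALUES (?, ?)", pairs)

all_rows = conn.execute(
    "SELECT COUNT(*) FROM route_connections WHERE from_route_iid IN (1, 2, 3)"
).fetchone()[0]
# Same count as the GROUP BY to_route_iid query returns rows.
distinct_to = conn.execute(
    "SELECT COUNT(DISTINCT to_route_iid) FROM route_connections"
    " WHERE from_route_iid IN (1, 2, 3)"
).fetchone()[0]
print(all_rows, distinct_to)  # 300 2
```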

So this query:

select c1.from_route_iid, c2.from_route_iid, c2.to_route_iid
from (
    select from_route_iid, to_route_iid
    from route_connections
    where from_route_iid in (864, 865, 495, 494, 459, 54, 458)
    group by to_route_iid
) c1, route_connections c2
where c2.from_route_iid = c1.to_route_iid
and c2.to_route_iid in (745, 744, 1096, 1093, 743, 317, 742, 13, 316)
group by c2.from_route_iid, c2.to_route_iid

runs in 145 ms. That's great, considering I started this morning at 2 minutes :)
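There is one caveat about the query above. The derived table selects from_route_iid but groups only by to_route_iid, and the outer query selects c1.from_route_iid without grouping by it. MySQL 5.6 accepts this by default and returns an arbitrary from_route_iid per group, so some valid departure routes can silently drop out of the result. With ONLY_FULL_GROUP_BY enabled, which is the default from MySQL 5.7.5 on, the query is rejected. A variant that keeps every (A, B, C) combination uses SELECT DISTINCT in both places. The sketch below runs it with Python's sqlite3 module on made-up data. Whether it stays as fast on the real 3.7M-row MySQL table is untested.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE route_connections (from_route_iid INTEGER, to_route_iid INTEGER)")
conn.execute(
    "CREATE INDEX from_to_route ON route_connections (from_route_iid, to_route_iid)"
)
# Hypothetical rows: departures 1, 2; arrivals 20, 21; duplicates mimic per-service rows.
conn.executemany(
    "INSERT INTO route_connections VALUES (?, ?)",
    [(1, 10), (1, 10), (1, 11), (2, 10), (10, 20), (10, 20), (11, 21), (10, 99)],
)

# Deduplicate the A->B pairs first (small derived table), then do the second hop.
paths = conn.execute("""
SELECT DISTINCT c1.from_route_iid, c2.from_route_iid, c2.to_route_iid
FROM (
    SELECT DISTINCT from_route_iid, to_route_iid
    FROM route_connections
    WHERE from_route_iid IN (1, 2)
) c1
JOIN route_connections c2 ON c2.from_route_iid = c1.to_route_iid
WHERE c2.to_route_iid IN (20, 21)
ORDER BY 1, 2, 3
""").fetchall()
print(paths)  # [(1, 10, 20), (1, 11, 21), (2, 10, 20)]
```

Both departure routes 1 and 2 survive for the connecting route 10. A GROUP BY on c2.from_route_iid, c2.to_route_iid alone would keep only one of them.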
