I have a table that can be joined to itself, and I want to join it twice. Here is the schema:
CREATE TABLE `route_connections` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`from_route_iid` int(11) NOT NULL,
`from_service_id` varchar(100) NOT NULL,
`to_route_iid` int(11) NOT NULL,
`to_service_id` varchar(100) NOT NULL,
PRIMARY KEY (`id`),
KEY `to_route` (`to_route_iid`),
KEY `from_route` (`from_route_iid`),
KEY `to_service` (`to_service_id`),
KEY `from_service` (`from_service_id`),
KEY `from_to_route` (`from_route_iid`,`to_route_iid`)
) ENGINE=InnoDB AUTO_INCREMENT=6798783 DEFAULT CHARSET=utf8
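It has about 3.7M rows.
My main goal is to find a path that uses 3 routes (2 route connections), given the lists of allowed departure and arrival routes (the connecting route has to be determined by the query).
Path: route A → route B → route C:
route_connection c1 (A→B)
route_connection c2 (B→C)
So I need to select three route_iid values: c1.from, c1.to or c2.from (they are the same), and c2.to.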
In addition, I need to filter each service_id with the following filter:
service_id in (
select service_id from (
select service_id from calendar c
where c.start_date <= 20141109 and end_date >= 20141109
union
select service_id from calendar_dates cd
where cd.date = 20141109 and exception_type = 1
) x
where x.service_id not in (
select service_id from calendar_dates cd
where cd.date = 20141109 and exception_type = 2
)
)
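One way this filter could be combined with the connection queries is to materialize the active service_ids once and join against them. This is only a sketch: `active_services` is a hypothetical helper table, its `VARCHAR(100)` type is assumed to match `route_connections`, and the single-connection query is used as the example.

```sql
-- Hypothetical helper table. A regular table is used on purpose:
-- MySQL cannot refer to a TEMPORARY table more than once in one query,
-- and the helper is joined twice below.
CREATE TABLE active_services (
  service_id VARCHAR(100) NOT NULL,  -- assumed to match route_connections
  PRIMARY KEY (service_id)
);

-- Fill it with the filter from above (UNION already removes duplicates).
INSERT INTO active_services (service_id)
SELECT x.service_id
FROM (
  SELECT service_id FROM calendar c
  WHERE c.start_date <= 20141109 AND c.end_date >= 20141109
  UNION
  SELECT service_id FROM calendar_dates cd
  WHERE cd.date = 20141109 AND cd.exception_type = 1
) x
WHERE x.service_id NOT IN (
  SELECT service_id FROM calendar_dates cd
  WHERE cd.date = 20141109 AND cd.exception_type = 2
);

-- Single-connection lookup restricted to services active on both ends.
SELECT c.*
FROM route_connections c
JOIN active_services s_from ON s_from.service_id = c.from_service_id
JOIN active_services s_to   ON s_to.service_id   = c.to_service_id
WHERE c.from_route_iid IN (864, 865, 495, 494, 459, 54, 458)
  AND c.to_route_iid IN (745, 744, 1096, 1093, 743, 317, 742, 13, 316);
```

The helper table would need to be rebuilt for each date queried.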
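For now, I am working on the route connections without the service_id filtering.
When searching for only one connection, the query takes < 1 ms (zero results):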
select c.*
from route_connections c
where c.from_route_iid in (864, 865, 495, 494, 459, 54, 458)
and c.to_route_iid in (745, 744, 1096, 1093, 743, 317, 742, 13, 316)
But my goal is to find 2 connections, so I use this query, which takes a long time (zero results):
select c1.*, c2.*
from
route_connections c1
inner join route_connections c2 on c2.from_route_iid = c1.to_route_iid
and c2.to_route_iid in (745, 744, 1096, 1093, 743, 317, 742, 13, 316)
where c1.from_route_iid in (864, 865, 495, 494, 459, 54, 458)
It used to take 50 seconds, but adding the from_to_route index sped the query up to 18-20 seconds.
I also tried it without an explicit join:
SELECT ...
FROM route_connections c1, route_connections c2
WHERE ...
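But it gives exactly the same performance (I guess that internally it is exactly the same as the join).
I tried changing the inner join to a left join + HAVING clause, but that is much worse (as expected).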
I tried dropping all indexes except these two: the primary key (id) and from_route_iid (from_route_iid, to_route_iid). The result is the same, about 18-20 s.
Here is the EXPLAIN output:
+----+-------------+-------+-------+------------------------------------+----------------+---------+----------------------------------+-------+----------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------+------------------------------------+----------------+---------+----------------------------------+-------+----------------------------------+
| 1 | SIMPLE | c1 | range | to_route,from_route,from_route_iid | from_route | 4 | NULL | 15464 | Using index condition; Using MRR |
| 1 | SIMPLE | c2 | ref | to_route,from_route,from_route_iid | from_route_iid | 4 | bicou_gtfs_paris.c1.to_route_iid | 1746 | Using index condition |
+----+-------------+-------+-------+------------------------------------+----------------+---------+----------------------------------+-------+----------------------------------+
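As a rough reading of this plan, assuming the usual interpretation that the `rows` estimates of joined tables multiply: about 15464 rows are read from c1, and each one triggers an index lookup expected to return about 1746 rows from c2. The arithmetic can be checked directly:

```sql
-- Estimated number of (c1, c2) row combinations the join has to examine,
-- taken from the two "rows" values in the EXPLAIN output above.
SELECT 15464 * 1746 AS estimated_row_combinations;  -- 27000144
```

These are optimizer estimates, not measured counts, but they suggest the join visits tens of millions of row combinations before returning zero rows.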
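What is the correct way to join a table to itself? Am I missing an index or anything else?
The hardware is a 2014 MacBook Air with a 1.7 GHz Core i7, 8 GB of RAM, and a 256 GB SSD. The software is Mac OS X 10.10 Yosemite with MySQL 5.6.21.
Answer 0 (score: 1)
OK, I found the solution: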
select to_route_iid
from route_connections
where from_route_iid in (864, 865, 495, 494, 459, 54, 458)
=> 15471 rows
select to_route_iid
from route_connections
where from_route_iid in (864, 865, 495, 494, 459, 54, 458)
group by to_route_iid
=> 97 rows!
The same goes for the arrival routes: 131 grouped rows versus 25427.
So this query:
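The two departure-side counts can also be obtained in a single statement. This is a small sketch using only the table and route list from the question; going by the figures above, it should report 15471 and 97.

```sql
-- Compare the raw number of connection rows with the number of distinct
-- connecting routes reachable from the allowed departure routes.
SELECT COUNT(*)                     AS connection_rows,
       COUNT(DISTINCT to_route_iid) AS distinct_to_routes
FROM route_connections
WHERE from_route_iid IN (864, 865, 495, 494, 459, 54, 458);
```

A large gap between the two numbers means the self-join repeats the same lookup many times for the same connecting route.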
select c1.from_route_iid, c2.from_route_iid, c2.to_route_iid
from (
select from_route_iid, to_route_iid
from route_connections
where from_route_iid in (864, 865, 495, 494, 459, 54, 458)
group by to_route_iid
) c1, route_connections c2
where c2.from_route_iid = c1.to_route_iid
and c2.to_route_iid in (745, 744, 1096, 1093, 743, 317, 742, 13, 316)
group by c2.from_route_iid, c2.to_route_iid
It runs in 145 ms. That's great, considering I started at 2 minutes this morning :).
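One caveat with the query above: the derived table selects from_route_iid but groups only by to_route_iid, so MySQL returns one arbitrary from_route_iid per group, and the statement is rejected when the ONLY_FULL_GROUP_BY SQL mode is enabled. If every distinct (A, B, C) combination is wanted, a possible variant is to deduplicate both sides on the route pair before joining. This is a sketch whose timing I have not measured; `via_route_iid` is just an alias introduced here.

```sql
-- Deduplicate each side on (from_route_iid, to_route_iid) first, then join
-- the two small derived tables on the connecting route B.
SELECT c1.from_route_iid,
       c1.to_route_iid AS via_route_iid,
       c2.to_route_iid
FROM (
  SELECT DISTINCT from_route_iid, to_route_iid
  FROM route_connections
  WHERE from_route_iid IN (864, 865, 495, 494, 459, 54, 458)
) c1
JOIN (
  SELECT DISTINCT from_route_iid, to_route_iid
  FROM route_connections
  WHERE to_route_iid IN (745, 744, 1096, 1093, 743, 317, 742, 13, 316)
) c2 ON c2.from_route_iid = c1.to_route_iid;
```

Because each derived table is already distinct on its route pair, the joined (A, B, C) rows are distinct as well and no outer GROUP BY is needed.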