Question

I have three database tables, routes, trips and stoptimes, which contain bus information. They are related by foreign keys as follows:

         routes -> ROUTE_ID -> trips -> TRIP_ID -> stoptimes

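That is, there are a number of routes, each route has many trips, and each trip has many stop times.

For each route in the table, I would like to select the trip with the largest number of stop times.

In addition, each route has an enumerated (INT) direction_id (a column on trips), and for each route I would like to select the trip with the most stop times in each direction.

This is all part of some data pre-processing. The idea is that these selected trips will have a flag set so that they can easily be recalled in future.

Is it possible to achieve this in SQL?

EDIT:

More information, as requested. Here is an example SELECT query and its result table: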

select t.route_id, t.direction_id, t.trip_id, NumStops, t.isPrototypical
from trips t join
     (select st.trip_id, count(*) as NumStops
      from stoptimes st
      group by st.trip_id
     ) st
     on st.trip_id = t.trip_id;

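Result:

[image: sample SQL results table]

In the above example, I would like a SQL statement that selects trips 2 and 10, as they have the (equal) largest NumStops in each direction. Even better if the selecting SQL statement could UPDATE the isPrototypical column to TRUE for those specific rows.

Keep in mind: in the production database there will be multiple route_ids, each with any number of direction_ids. The statement needs to work its magic for every direction of every route.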

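Final answer

Gordon Linoff below provided a correct, well-performing solution, so I thought I would also post the modified version of his code that I used to solve the problem.

This is the SQL for selecting and updating the trip with the most stops for each direction of each route, while picking only one trip when there is a tie: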

update trips t join
     (select substring_index(group_concat(t.trip_id order by NumStops desc), ',', 1) as prototripid
      from trips t join
           (select st.trip_id, count(*) as NumStops
            from stoptimes st
            group by st.trip_id
           ) st
           on st.trip_id = t.trip_id
      group by t.route_id, t.direction_id
     ) t2
     on t2.prototripid = t.trip_id
set isPrototypical = 1;

I think this may be MySQL-specific.
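Since this is a pre-processing step, it may need to be re-run when the data changes. The following is a minimal sketch of how a re-run could be wrapped, assuming isPrototypical is a 0/1 column that is never NULL after the reset; neither statement comes from the original answer.

```sql
-- Clear flags left over from an earlier run
-- (assumption: isPrototypical is a 0/1 column).
update trips set isPrototypical = 0;

-- ... run the update statement above ...

-- Sanity check: list every (route_id, direction_id) group that does not
-- have exactly one flagged trip. An empty result means each group got one.
-- Groups whose trips have no rows in stoptimes show up here with 0.
select route_id, direction_id, sum(isPrototypical) as flagged
from trips
group by route_id, direction_id
having flagged <> 1;
```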

Answer 1

You can do this in MySQL using a trick that involves group_concat().

Here is the query:

select t.route_id,
       substring_index(group_concat(t.trip_id order by NumStops desc), ',', 1),
       max(NumStops) as Length
from trips t join
     (select st.trip_id, count(*) as NumStops
      from stoptimes st
      group by st.trip_id
     ) st
     on st.trip_id = t.trip_id
group by t.route_id;

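(The routes table is not needed unless you want the route name.)

The subquery counts the number of stops on each trip. The outer query then aggregates by route_id.

Normally, group_concat() would be used to put all the trips into a comma-delimited string. It does that here as well, with the caveat that the trips are ordered by number of stops, longest first. The function substring_index() then takes the first value.

This converts trip_id to a string. You may want to convert it back to whatever data type it started as.

To handle directions as well, the following should work: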
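The extraction step and the conversion back to a number can be tried in isolation. This is a small sketch: the literal list '10,2,7' is made-up illustration data, and the cast assumes trip_id is an unsigned integer column, which the question does not confirm.

```sql
-- substring_index(str, delim, 1) keeps everything before the first delimiter.
select substring_index('10,2,7', ',', 1) as first_id;                     -- '10' (a string)

-- Casting turns the extracted id back into a number
-- (assumption: trip_id is an unsigned integer).
select cast(substring_index('10,2,7', ',', 1) as unsigned) as first_id;   -- 10
```

Note that group_concat() truncates its result at group_concat_max_len (1024 by default). Truncation cuts the tail of the list, and only the first element is used here.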

select t.route_id, t.direction_id,
       substring_index(group_concat(t.trip_id order by NumStops desc), ',', 1),
       max(NumStops) as Length
from trips t join
     (select st.trip_id, count(*) as NumStops
      from stoptimes st
      group by st.trip_id
     ) st
     on st.trip_id = t.trip_id
group by t.route_id, t.direction_id;

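Because the direction is stored at the trip level, it does not interfere with counting the stops on a trip (that is, the direction does not seem to be needed in the subquery).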

Answer 2

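If you join all the tables together properly, you get one row per stop time, so COUNT(*) gives you the total number of stop times.

For the counts by direction, I assume the direction values are 1, 2, 3, .... I can't tell which table direction_id is in, so I have not qualified it with a table name in the query: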

SELECT routes.Route_ID,
   COUNT(*) AS TotalStops,
   COUNT(CASE WHEN direction_id = 1 THEN 1 END) AS Direction1Stops,
   COUNT(CASE WHEN direction_id = 2 THEN 1 END) AS Direction2Stops,
   COUNT(CASE WHEN direction_id = 3 THEN 1 END) AS Direction3Stops
   -- ... add one such column for each remaining direction_id value
FROM routes
INNER JOIN trips ON routes.Route_ID = trips.Route_ID
INNER JOIN stoptimes ON trips.Trip_ID = stoptimes.Trip_ID
GROUP BY routes.Route_ID;

Answer 3

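While I'm sure there are more elegant ways to do this, the idea is to use MAX and GROUP BY and join the results back to themselves. If MySQL supported common table expressions, this wouldn't look so bad: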

update trips t
  join (
    select t.route_id, t.direction_id, t.trip_id, NumStops, t.isPrototypical
    from trips t join
         (select st.trip_id, count(*) as NumStops
          from stoptimes st
          group by st.trip_id
         ) st
         on st.trip_id = t.trip_id
    ) t2 on t.trip_id = t2.trip_id
  join (
    select max(numstops) maxnumstops, route_id, direction_id
    from (
      select t.route_id, t.direction_id, t.trip_id, NumStops, t.isPrototypical
      from trips t join
         (select st.trip_id, count(*) as NumStops
          from stoptimes st
          group by st.trip_id
         ) st
         on st.trip_id = t.trip_id
      ) t
    group by route_id, direction_id
    ) t3 on t2.numstops = t3.maxnumstops and t2.route_id = t3.route_id and t2.direction_id = t3.direction_id
set t.isPrototypical = 1;

SQL Fiddle Demo
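MySQL 8.0 and later do support common table expressions and window functions, so the same idea can be written more compactly there. The following is a sketch under that version assumption, not part of the original answer; the CTE names stop_counts and ranked are made up for illustration. Like the statement above, it flags every trip that ties for the maximum.

```sql
-- Assumes MySQL 8.0+ (CTEs and window functions).
with stop_counts as (
    -- number of stop times per trip
    select st.trip_id, count(*) as NumStops
    from stoptimes st
    group by st.trip_id
),
ranked as (
    -- rank trips within each (route, direction), most stops first;
    -- rank() gives tied trips the same rank
    select t.trip_id,
           rank() over (partition by t.route_id, t.direction_id
                        order by sc.NumStops desc) as rnk
    from trips t
    join stop_counts sc on sc.trip_id = t.trip_id
)
update trips t
join ranked r on r.trip_id = t.trip_id
set t.isPrototypical = 1
where r.rnk = 1;
```

To flag only one trip per group when there is a tie, row_number() with a deterministic tie-breaker in the order by (for example `sc.NumStops desc, t.trip_id`) could replace rank().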

SQL help - select table with the maximum number of related rows

3 answers: