Question

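I have GTFS data stored as collections in MongoDB. The collections look exactly as described in the documentation: https://developers.google.com/transit/gtfs/reference

I am trying to return the mode of transport for every stop (in my case tram, bus, or multiple). To do that, I have to get all routes serving the stop, because the routes collection has a route_type field (0 = tram, 3 = bus).

(pseudocode)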

Get all stops
Join stoptimes on stops.stop_id = stoptimes.stop_id
Join trips on stoptimes.trip_id = trips.trip_id
Join routes on trips.route_id = routes.route_id
Compare routes.route_type for each stop according to documentation and return type
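
The final comparison step could look roughly like the sketch below in plain JavaScript. It assumes the route_type values of all routes serving one stop have already been collected into an array. The helper name `classifyByRouteType` is hypothetical and is not part of the query in this question.

```javascript
// Hypothetical helper (not from the original query): derive a stop's
// transport mode from the route_type values of the routes serving it.
// GTFS route_type: 0 = tram, 3 = bus.
function classifyByRouteType(routeTypes) {
  const types = new Set(routeTypes);
  const hasTram = types.has(0);
  const hasBus = types.has(3);
  if (hasTram && hasBus) return "multiple";
  if (hasTram) return "tram";
  if (hasBus) return "bus";
  return ""; // no tram or bus route found for this stop
}
```

For example, `classifyByRouteType([0, 3, 3])` yields `"multiple"`.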

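In my GTFS data there is a rule that every tram line has a one- or two-character ID, while buses have three-character IDs. These IDs are the route_id field in my GTFS. This lets me skip one join:

(pseudocode):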

Get all stops
Join stoptimes on stops.stop_id = stoptimes.stop_id
Join trips on stoptimes.trip_id = trips.trip_id
Compare trips.route_id for each stop by length and return type
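
The length-based rule can be illustrated with a small plain-JavaScript sketch. It assumes the route_id values for one stop are available as an array of strings. The name `classifyByRouteIdLength` is hypothetical, and the rule itself (1 or 2 characters = tram, 3 characters = bus) holds only for this particular GTFS feed.

```javascript
// Hypothetical helper: classify a stop from the lengths of its route_id
// values. Feed-specific rule: 1-2 characters = tram, 3 characters = bus.
function classifyByRouteIdLength(routeIds) {
  let hasTram = false;
  let hasBus = false;
  for (const id of new Set(routeIds)) {
    // Count code points, which is what $strLenCP does in the pipeline.
    const len = [...String(id)].length;
    if (len === 1 || len === 2) hasTram = true;
    else if (len === 3) hasBus = true;
  }
  if (hasTram && hasBus) return "multiple";
  if (hasTram) return "tram";
  if (hasBus) return "bus";
  return ""; // no route_id matched the rule
}
```

This mirrors the intent of the `$switch` stage in the query: a stop with only short IDs is a tram stop, a stop with only three-character IDs is a bus stop, and a mix is "multiple".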

Below is my query. I added $limit: 10 for debugging. The query takes 2028 ms for 10 stops.

Collection sizes:

stops: 6724 documents,
stoptimes: 4823250 documents,
trips: 174698 documents,
routes: 350 documents

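Questions:

How can I make it faster? For 1000 stops, execution takes about 75 seconds...
When doing $group, I add $first expressions so that I do not lose the fields I need in the final result. Is there a better way to include them in the final result?

I do $unwind: '$lines' twice because the query creates a nested array. How can I avoid that?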

Stop.aggregate([{
  $match: {
    'stop_id': {
      $regex: /^\d{6}$/
    }
  }
},
{
  $lookup: {
    from: "stoptimes",
    localField: "stop_id",
    foreignField: "stop_id",
    as: "stoptimes"
  }
},
{
  $lookup: {
    from: "trips",
    localField: "stoptimes.trip_id",
    foreignField: "trip_id",
    as: "trips"
  }
},
{
  $limit: 10
},
{
  $group: {
    _id: '$stop_id',
    stop_name: {
      $first: '$stop_name'
    },
    stop_lat: {
      $first: '$stop_lat'
    },
    stop_lon: {
      $first: '$stop_lon'
    },
    lines: {
      $addToSet: "$trips.route_id"
    }
  }
},
{
  $unwind: '$lines'
},
{
  $unwind: '$lines'
},
{
  $project: {
    _id: 1,
    stop_name: 1,
    stop_lat: 1,
    stop_lon: 1,
    lines: {
      $strLenCP: "$lines"
    }
  }
},
{
  $group: {
    _id: {
      stop_id: '$_id',
      stop_name: '$stop_name',
      stop_lat: '$stop_lat',
      stop_lon: '$stop_lon',
      line: '$lines'
    },
    count: {
      $sum: 1
    }
  }
},
{
  $project: {
    _id: 0,
    stop_id: '$_id.stop_id',
    stop_name: '$_id.stop_name',
    stop_lat: '$_id.stop_lat',
    stop_lon: '$_id.stop_lon',
    line: '$_id.line'
  }
},
{
  $group: {
    _id: '$stop_id',
    stop_name: {
      $first: '$stop_name'
    },
    stop_lat: {
      $first: '$stop_lat'
    },
    stop_lon: {
      $first: '$stop_lon'
    },
    lineNameLengths: {
      $addToSet: '$line'
    }
  }
},
{
  $project: {
    _id: 0,
    stop_id: '$_id',
    stop_name: '$stop_name',
    stop_lat: '$stop_lat',
    stop_lon: '$stop_lon',
    type: {
      $switch: {
        branches: [{
            case: {
              $gte: [{
                $size: "$lineNameLengths"
              }, 3]
            },
            then: "multiple"
          },
          {
            case: {
              $and: [{
                  $eq: [{
                    $size: "$lineNameLengths"
                  }, 2]
                },
                {
                  $not: [{
                    $in: [3, '$lineNameLengths']
                  }]
                }
              ]
            },
            then: "tram"
          },
          {
            case: {
              $and: [{
                  $eq: [{
                    $size: "$lineNameLengths"
                  }, 2]
                },
                {
                  $in: [3, '$lineNameLengths']
                }
              ]
            },
            then: "multiple"
          },
          {
            case: {
              $and: [{
                  $eq: [{
                    $size: "$lineNameLengths"
                  }, 1]
                },
                {
                  $in: [1, '$lineNameLengths']
                }
              ]
            },
            then: "tram"
          },
          {
            case: {
              $and: [{
                  $eq: [{
                    $size: "$lineNameLengths"
                  }, 1]
                },
                {
                  $in: [2, '$lineNameLengths']
                }
              ]
            },
            then: "tram"
          },
          {
            case: {
              $and: [{
                  $eq: [{
                    $size: "$lineNameLengths"
                  }, 1]
                },
                {
                  $in: [3, '$lineNameLengths']
                }
              ]
            },
            then: "bus"
          }
        ],
        default: ""
      }
    }
  }
}]);
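
One possible direction for the nested array and the `$first` expressions, given only as an untested sketch: `"$trips.route_id"` is already an array, so passing it to `$addToSet` inside `$group` wraps it in a second array. If stop_id is unique within the stops collection (an assumption), the first `$group` may not be needed at all, and `$setUnion` in a `$project` can deduplicate the route IDs while the other fields pass through unchanged. The function name `buildStopLinesPipeline` is hypothetical. The pipeline is built here as a plain array and is not executed against a database.

```javascript
// Sketch only: returns the aggregation stages as a plain array.
// Assumes stop_id is unique per document in the stops collection.
function buildStopLinesPipeline(limit) {
  return [
    { $match: { stop_id: { $regex: /^\d{6}$/ } } },
    // Debugging limit, applied to stops right after the match.
    { $limit: limit },
    { $lookup: { from: "stoptimes", localField: "stop_id",
                 foreignField: "stop_id", as: "stoptimes" } },
    { $lookup: { from: "trips", localField: "stoptimes.trip_id",
                 foreignField: "trip_id", as: "trips" } },
    { $project: {
        _id: 0,
        stop_id: 1,
        stop_name: 1,
        stop_lat: 1,
        stop_lon: 1,
        // "$trips.route_id" is a flat array of route IDs; $setUnion with
        // an empty array removes duplicates without nesting the array.
        lines: { $setUnion: ["$trips.route_id", []] }
    } }
  ];
}
```

With the Mongoose model from the query above, this would be called as `Stop.aggregate(buildStopLinesPipeline(10))`. Separately from the pipeline shape, each `$lookup` matches on its foreignField, so it is worth checking whether stoptimes.stop_id and trips.trip_id are indexed.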

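Is it possible to make this query run faster?

This is my first MongoDB project, so I hope I have simply made some silly mistakes that more experienced colleagues can easily correct.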

Poor performance of multiple aggregations

0 answers: