Question

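I want to retrieve the distinct values of a field from a collection in my database. The distinct command is the obvious solution. The problem is that some fields have a large number of possible values and are not simple primitive values (i.e., they are complex sub-documents rather than just strings). This means the results are large, which overloads the client I am delivering the results to.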

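The obvious solution is to paginate the resulting distinct values, but I cannot find a good way to do this. Since distinct has no pagination options (limit, skip, etc.), I turned to the aggregation framework. My basic pipeline is: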

[
  {$match: {... the documents I am interested in ...}},
  {$group: {_id: '$myfield'}},
  {$sort: {_id: 1}},
  {$limit: 10},
]

This gives me the first 10 unique values for myfield. To get the next page, a $skip operator is added to the pipeline. So:

[
  {$match: {... the documents I am interested in ...}},
  {$group: {_id: '$myfield'}},
  {$sort: {_id: 1}},
  {$skip: 10},
  {$limit: 10},
]
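
The two pipelines above differ only in the $skip stage, so a page can be built from a page number. The following is a minimal sketch, not part of the application: the helper name distinctPagePipeline, its parameters, and the collection name in the usage comment are all assumptions made for illustration.

```javascript
// Hypothetical helper: builds the pipeline for one page of distinct values.
// `match` is the filter document, `field` is the field name without the
// leading '$', and `page` is zero-based.
function distinctPagePipeline(match, field, page, pageSize) {
  const pipeline = [
    { $match: match },
    { $group: { _id: '$' + field } },
    { $sort: { _id: 1 } },
  ];
  // The first page needs no $skip, mirroring the basic pipeline above.
  if (page > 0) {
    pipeline.push({ $skip: page * pageSize });
  }
  pipeline.push({ $limit: pageSize });
  return pipeline;
}

// Usage in the mongo shell (collection name is an assumption):
// db.mycollection.aggregate(distinctPagePipeline({}, 'myfield', 1, 10));
```

Keeping $sort before $skip and $limit matters here: without a stable order, consecutive pages could overlap or miss values.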

But sometimes the field I am collecting unique values from is an array. This means I have to unwind it before grouping. So:

[
  {$match: {... the documents I am interested in ...}},
  {$unwind: '$myfield'},
  {$group: {_id: '$myfield'}},
  {$sort: {_id: 1}},
  {$skip: 10},
  {$limit: 10},
]
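
To make the effect of unwinding before grouping concrete, here is a plain JavaScript illustration that runs in memory, without MongoDB. It is only a sketch under stated assumptions: it handles string values only (not sub-documents), the function name distinctUnwound is made up, and it assumes the $unwind behavior of MongoDB 3.2 and later, where a non-array value is treated as a single-element array.

```javascript
// Illustration only: mimics $unwind + $group + $sort for string values.
function distinctUnwound(docs, field) {
  const seen = new Set();
  for (const doc of docs) {
    const value = doc[field];
    // $unwind drops documents where the field is missing or null.
    if (value === undefined || value === null) continue;
    // A non-array value behaves like a one-element array; an empty
    // array contributes nothing.
    const items = Array.isArray(value) ? value : [value];
    for (const item of items) seen.add(item);
  }
  // Default sort compares strings, which is enough for this sketch.
  return [...seen].sort();
}

const sample = [
  { myfield: ['b', 'a'] },
  { myfield: ['b', 'c'] },
  { myfield: 'd' },
  { myfield: [] },
  {},
];
console.log(distinctUnwound(sample, 'myfield')); // [ 'a', 'b', 'c', 'd' ]
```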

Other times the field I am getting unique values from may not be an array itself, but its parent node might be. So:

[
  {$match: {... the documents I am interested in ...}},
  {$unwind: '$list'},
  {$group: {_id: '$list.myfield'}},
  {$sort: {_id: 1}},
  {$skip: 10},
  {$limit: 10},
]

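Finally, sometimes I also need to filter on data inside the field I am getting distinct values from. This means I sometimes need another $match operator after the unwind: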

[
  {$match: {... the documents I am interested in ...}},
  {$unwind: '$list'},
  {$match: {... filter within list.myfield ...}},
  {$group: {_id: '$list.myfield'}},
  {$sort: {_id: 1}},
  {$skip: 10},
  {$limit: 10},
]

The above are all simplifications of my actual data. Here is a real pipeline from the application:

[
  {"$match": {
    "source_name":"fmls",
    "data.surroundings.school.nearby.district.id": 1300120,
    "compiled_at":{"$exists":true},
    "data.status.current":{"$ne":"Sold"}
  }},
  {"$unwind":"$data.surroundings.school.nearby"},
  {"$match": { 
    "source_name":"fmls",
    "data.surroundings.school.nearby.district.id":1300120,
    "compiled_at":{"$exists":true},
    "data.status.current":{"$ne":"Sold"}
  }},
  {"$group":{"_id":"$data.surroundings.school.nearby"}},
  {"$sort":{"_id":1}},
  {"$skip":10},
  {"$limit":10}
]

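I send the same $match document to both the initial filter and the filter after the unwind because the $match document is somewhat opaque, coming from a third party, so I don't really know which parts of the query filter outside my unwind and which filter inside the data I am getting distinct values from.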
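
The shape of that real pipeline, with one opaque filter applied on both sides of the unwind, could be assembled roughly as follows. This is a sketch rather than the application's actual code: the function name unwoundDistinctPipeline and its parameter names are assumptions.

```javascript
// Hypothetical helper: the same opaque `match` document is used twice.
// `unwindPath` and `groupPath` are field paths without the leading '$'.
function unwoundDistinctPipeline(match, unwindPath, groupPath, skip, limit) {
  return [
    { $match: match },                  // narrows whole documents first
    { $unwind: '$' + unwindPath },
    { $match: match },                  // same filter, now applied per unwound element
    { $group: { _id: '$' + groupPath } },
    { $sort: { _id: 1 } },
    { $skip: skip },
    { $limit: limit },
  ];
}

// Rebuilds the pipeline shown above for its second page of 10 values.
const thirdPartyMatch = {
  'source_name': 'fmls',
  'data.surroundings.school.nearby.district.id': 1300120,
  'compiled_at': { $exists: true },
  'data.status.current': { $ne: 'Sold' },
};
const pipeline = unwoundDistinctPipeline(
  thirdPartyMatch,
  'data.surroundings.school.nearby',
  'data.surroundings.school.nearby',
  10,
  10
);
```

Applying the filter twice is redundant for conditions on top-level fields such as source_name, but it avoids having to parse the third-party document to decide which conditions target the unwound array.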

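Is there a notably different way to go about this? In general my strategy works, but some queries take 10-15 seconds to return results. There are about 200,000 documents in the collection, but only about 60,000 remain after the first $match in the pipeline (which can use indexes).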

Paginating distinct values in Mongo
