Question

I have a huge collection that contains only documents like these.

  {
        "_id" : "https://example.com/test.html",
        "Count" : 1503.0000000000000000
    }, 
    {
        "_id" : "http://example.org/gr/",
        "Count" : 715.0000000000000000
    }, 
    {
        "_id" : "https://example.com/document/d//edit",
        "Count" : 710.0000000000000000
    }, 
    {
        "_id" : "http://example.org/gr/test.htm",
        "Count" : 429.0000000000000000
    }
}

How can I use the MongoDB aggregation framework to get this result?

 {
        "_id" : "https://example.com/",
        "Count" : 2213.0000000000000000
    }, 
    {
        "_id" : "http://example.org/",
        "Count" : 1144.0000000000000000
    }
}

Specifically, how can I use text search after splitting in the $project pipeline stage?

Thanks in advance!

Answer 1

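First, you have to use $substr to retrieve the beginning of each URI.

Then you should be able to $group and $sum.

The first part may get tricky and/or impossible, since I am not aware of any operator that returns the position of the third slash in a string.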
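
On MongoDB 3.4 or later, the $split operator sidesteps the need to find the third slash. The following is only a sketch under some assumptions: the collection name `urls` is hypothetical, and every `_id` is assumed to have the form scheme://host/... as in the question. The helper `sumByRoot` is not part of MongoDB; it just mirrors the pipeline in plain JavaScript so the logic can be checked without a server.

```javascript
// Sketch only: needs MongoDB 3.4+ for $split; "urls" is a hypothetical collection name.
const pipeline = [
  // "https://example.com/test.html" -> ["https:", "", "example.com", "test.html"]
  { $project: { Count: 1, parts: { $split: ["$_id", "/"] } } },
  {
    $group: {
      // Rebuild scheme + "//" + host + "/" from array elements 0 and 2.
      _id: {
        $concat: [
          { $arrayElemAt: ["$parts", 0] },
          "//",
          { $arrayElemAt: ["$parts", 2] },
          "/",
        ],
      },
      Count: { $sum: "$Count" },
    },
  },
  // Largest totals first.
  { $sort: { Count: -1 } },
];

// In the mongo shell: db.urls.aggregate(pipeline)

// Plain-JavaScript mirror of the same logic (hypothetical helper, no server needed).
function sumByRoot(docs) {
  const totals = new Map();
  for (const doc of docs) {
    const parts = doc._id.split("/");
    const root = parts[0] + "//" + parts[2] + "/";
    totals.set(root, (totals.get(root) || 0) + doc.Count);
  }
  return [...totals]
    .map(([_id, Count]) => ({ _id, Count }))
    .sort((a, b) => b.Count - a.Count);
}
```

The grouping key is rebuilt from the scheme and host only, so everything after the host is ignored when summing.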

My advice, though, is to rewrite the code so that the string is split before insertion, i.e.:

{
  id: ObjectId("..."),
  domain: "http://example.com",
  path: "test.html",
  count: 1503
}

If the subdomain should be accessible as well, I would go with something like:

{
  id: ObjectId("..."),
  uri: "http://sub.example.org/foo.html",
  protocol: "http",
  subdomain: "sub",
  domain: "example.org",
  path: "foo.html",
  count: 1503
}

This may of course be slower at insert time, but it lets you query on many more things.
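
A minimal sketch of splitting before insertion, using the standard `URL` class available in Node.js and browsers. The function name `splitUri`, the pipeline name `byDomain`, and the collection name `urls` are hypothetical. It also assumes the domain is simply the last two labels of the host name, which does not hold for suffixes such as "co.uk".

```javascript
// Sketch: split a URI into the fields suggested above before inserting it.
// Assumption: the domain is the last two host labels; this is wrong for
// suffixes like "co.uk", where a public-suffix list would be needed.
function splitUri(uri, count) {
  const url = new URL(uri); // WHATWG URL, a global in Node.js and browsers
  const labels = url.hostname.split(".");
  return {
    uri,
    protocol: url.protocol.replace(/:$/, ""), // "http:" -> "http"
    subdomain: labels.slice(0, -2).join("."), // "" when there is none
    domain: labels.slice(-2).join("."),
    path: url.pathname.replace(/^\//, ""), // drop the leading slash
    count,
  };
}

// With documents stored this way, the totals need only a $group stage.
const byDomain = [{ $group: { _id: "$domain", count: { $sum: "$count" } } }];

// In the mongo shell (hypothetical collection name):
//   db.urls.insertOne(splitUri("http://sub.example.org/foo.html", 1503))
//   db.urls.aggregate(byDomain)
```

Grouping on `domain` merges all subdomains into one total; grouping on both `subdomain` and `domain` would keep them apart.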
