Mongo: aggregate search results by a unique sub-collection property, with a count?

Asked: 2019-04-11 15:34:06

Tags: node.js mongodb mongoose aggregation-framework

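I've been struggling for many hours to work out how to do this. I have a collection called "Jobs", and inside each document there is a sub-collection "site", i.e. Jobs.site. That site sub-collection has a property "UNID".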

I'm retrieving documents from the database based on a text search, and that part works fine.

But I'm trying to retrieve only documents that are UNIQUE by Job.Site.UNID, and possibly add a count as an extra property. The result should look like this:

Job: { site: { field1: 'EXAMPLE', UNID: 'SITEID', count: 5 } }

This would mean there are 5 jobs in the Jobs collection with that site.UNID.

Here is what I have so far:

[
// GETTING DOCS BASED ON TEXT SEARCH RESULTS
    {
      $match: {
        // clientId: req.user.client_id,
        $text: { $search: body.searchTerms }
      }
    },
// SORTING THEM BASED ON TEXTSCORE
    { $sort: { score: { $meta: 'textScore' } } },
// THE PROBLEMATIC GROUPING PART
    { $group: { site: { UPRN: '$UPRN', myCount: { $sum: 1 } } } },
// I ONLY WANT TO GET 20 DOCS AT A TIME
    { $limit: 20 },
// THE DATA THAT I WANT IN MY DOCUMENTS, MAYBE COUNT WOULD COME HERE?
    {
      $project: {
        site: true,
        score: { $meta: 'textScore' }
      }
    },
// GETTING RID OF POOR MATCHES BASED ON A SCORE CALCULATED IN ANOTHER 
// FUNCTION BASED ON THE NUMBER OF WORDS IN THE TEXT SEARCH
    {
      $match: {
        score: { $gt: matchScore }
      }
    }
  ]

This fails with: The field 'site' must be an accumulator object

So I can't figure out the correct syntax for using that sub-collection property.

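Edit: V2, thanks to @Anthony's excellent answer. It works well and I've tested it thoroughly, except that it doesn't seem to count the total number of jobs: the value is always 1, or whatever I set in $sum, even though there are 200+ results. Still working on it.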

 { $match: { $text: { $search: body.searchTerms } } },
    { $sort: { $score: { $meta: 'textScore' } } },
    // { $match: { score: { $gt: 0.1 } } },
    {
      $group: {
        _id: '$UNID',
        counter: { $sum: 1 },
        score: { $first: { $meta: 'textScore' } },
        title: { $first: '$title' },
        postcode: { $first: '$postcode' },
        addressLine1: { $first: '$addressLine1' },
        city: { $first: '$city' },
        projectName: { $first: '$projectName' },
        jobsCount: { $sum: '$counter' }
      }
    },
    { $limit: 20 },
    {
      $project: {
        UNID: '$_id',
        title: '$title',
        postcode: '$postcode',
        addressLine1: '$addressLine1',
        projectName: '$projectName',
        city: '$city',
        score: 1,
        jobsCount: true
      }
    }

Sample data:


{
  "_id": "randomString0",
  "title": "Quality",
  "site": {
    "_id": "rKFRbvH8CEbJYdzDs",
    "title": "Title 1",
    "addressLine1": "address1",
    "UNID": "001",
    "city": "cityName",
    "createdAt": null
  }
},
{
  "_id": "randomString1",
  "title": "Some2123",
  "site": {
    "_id": "rKFRbvH8CEbJYdzDs",
    "title": "Title 1",
    "addressLine1": "address1",
    "UNID": "001",
    "city": "cityName",
    "createdAt": null
  }
},
{
  "_id": "randomString2",
  "title": "Random title",
  "site": {
    "_id": "rKFRbvH8CEbJYdzDs",
    "title": "Title 1",
    "addressLine1": "address1",
    "UNID": "001",
    "city": "cityName",
    "createdAt": null
  }
},
{
  "_id": "randomString3",
  "title": "Another unique job",
  "site": {
    "_id": "rKFRbvH8CEbJYdzDs",
    "title": "Title 1",
    "addressLine1": "address1",
    "UNID": "001",
    "city": "cityName",
    "createdAt": null
  }
},
{
  "_id": "randomString4",
  "title": "Other thing",
  "site": {
    "_id": "rKFRbvH8CEbJYdzDs",
    "title": "Title 1",
    "addressLine1": "address1",
    "UNID": "001",
    "city": "cityName",
    "createdAt": null
  }
},
{
  "_id": "randomString5",
  "title": "Something else",
  "site": {
    "_id": "rKFRbvH8CEbJYdzDs",
    "title": "Title 1",
    "addressLine1": "address1",
    "UNID": "001",
    "city": "cityName",
    "createdAt": null
  }
}

As you can see, the site data is identical across all of these documents, but the counter should report how many documents share the same UNID.

1 Answer:

Answer 0 (score: 1)

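In the $group stage, the _id expression (what you want to group by) is required. Every other field in a $group stage must be computed with one of a small set of accumulators.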
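This is also why `{ $group: { site: { ... } } }` is rejected: `site` is read as an output field, and its value is not an accumulator. To make the roles of `_id` and the accumulators concrete, here is a rough plain-JavaScript emulation of what a `$group` stage with `$sum: 1` and `$first` computes. It does not call MongoDB; the function name `groupBySiteUnid` and the two-site input are made up for illustration, and grouping on `site.UNID` assumes documents shaped like the sample data in the question.

```javascript
// Illustration only: emulates
//   { $group: { _id: '$site.UNID', site: { $first: '$site' }, myCount: { $sum: 1 } } }
// in plain JavaScript. No MongoDB calls are made.
function groupBySiteUnid(docs) {
  const groups = new Map();
  for (const doc of docs) {
    const key = doc.site.UNID; // the _id expression: the value to group by
    if (!groups.has(key)) {
      // $first keeps the value from the first document seen in each group
      groups.set(key, { _id: key, site: doc.site, myCount: 0 });
    }
    groups.get(key).myCount += 1; // { $sum: 1 } adds 1 per document in the group
  }
  return [...groups.values()];
}

// Hypothetical documents, loosely based on the sample data
const jobs = [
  { _id: 'randomString0', title: 'Quality', site: { UNID: '001', title: 'Title 1' } },
  { _id: 'randomString1', title: 'Some2123', site: { UNID: '001', title: 'Title 1' } },
  { _id: 'randomString2', title: 'Random title', site: { UNID: '002', title: 'Title 2' } },
];

const grouped = groupBySiteUnid(jobs);
console.log(grouped); // one entry per UNID: myCount is 2 for '001' and 1 for '002'
```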

So your aggregation should look something like this:

[
  { "$match": { "$text": { "$search": body.searchTerms }}},
  { "$sort": { "score": { "$meta": "textScore" } } },
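  // Expose the text score as a real field so the stages below can read "score"
  { "$addFields": { "score": { "$meta": "textScore" } } },
  { "$match": { "score": { "$gt": matchScore }}},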
  { "$group": {
    "_id": "$UPRN",
    "myCount": { "$sum": 1 },
    "score": { "$first": "$score" }
  }},
  { "$limit": 20 },
  { "$project": {
    "site": "$_id",
    "score": 1,
    "myCount": 1
  }}
]
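Applied to the sample data in the question, where UNID is nested under site, the same idea might look like the sketch below. It is not a tested solution: the function name `buildSiteSearchPipeline` is hypothetical, and it assumes the grouping key is `site.UNID` and that a text index already exists on the collection. Two details relate to the V2 attempt. First, `jobsCount: { $sum: '$counter' }` reads a `counter` field from the incoming documents rather than from the sibling accumulator, so it cannot pick up the running count; `{ $sum: 1 }` is the counter. Second, the count only covers documents that survive the earlier `$match` stages, so a site with many jobs still reports 1 if only one of its jobs matches the search.

```javascript
// Sketch: builds the pipeline as a plain array. The grouping key '$site.UNID'
// is an assumption based on the sample data in the question.
function buildSiteSearchPipeline(searchTerms, matchScore) {
  return [
    { $match: { $text: { $search: searchTerms } } },
    // Materialise the text score as a field so later stages can refer to it
    { $addFields: { score: { $meta: 'textScore' } } },
    { $match: { score: { $gt: matchScore } } },
    // Best matches first, so $first below picks the top-scoring job per site
    { $sort: { score: -1 } },
    {
      $group: {
        _id: '$site.UNID',          // one output document per site
        site: { $first: '$site' },  // keep the site sub-document
        score: { $first: '$score' },
        jobsCount: { $sum: 1 }      // matching jobs that share this UNID
      }
    },
    // $group does not guarantee output order, so sort again before limiting
    { $sort: { score: -1 } },
    { $limit: 20 }
  ];
}
```

With Mongoose this could be passed along as `Job.aggregate(buildSiteSearchPipeline(body.searchTerms, matchScore))`, where `Job` stands in for whatever model wraps the Jobs collection.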