Elasticsearch: How do I get the count of child nested objects along with the parent fields?

Asked: 2019-01-21 09:52:17

Tags: elasticsearch querydsl elasticsearch-query

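I have a scenario where I need to retrieve millions of records from Elasticsearch.

I am a beginner with Elasticsearch and cannot yet use it very effectively.

I have indexed an Author model in Elasticsearch as shown below, and I am using the NEST client to work with Elasticsearch from a .NET application.

My model is described below.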

Author
--------------------------------
AuthorKey           string
List<Study>         Nested


Study
---------------------------------
PMID              int
PublicationDate   date
PublicationType   string
MeshTerms         string
Content           string

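We have nearly 10 million authors, and each author has completed at least 3 studies.

So roughly 30 million records are available in the Elasticsearch index.

Now I want to get each author's data together with their total number of studies.

Below is sample JSON data: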

{
  "Authors": [
    {
      "AuthorKey": "Author1",
      "AuthorName": "karan",
      "AuthorLastName": "shah",
      "Study": [
        {
          "PMId": 1000,
          "PublicationDate": "2019-01-17T06:35:52.178Z",
          "content": "this is dummy content.how can i solve this",
          "MeshTerms": "karan,dharan,nilesh,manan,mehul sir,manoj",
          "PublicationType": [
            "ClinicalTrial",
            "Medical"
          ]
        },
        {
          "PMId": 1001,
          "PublicationDate": "2019-01-16T05:55:14.947Z",
          "content": "this is dummy content.how can i solve this",
          "MeshTerms": "karan1,dharan1,nilesh1,manan1,mehul1 sir,manoj1",
          "PublicationType": [
            "ClinicalTrial",
            "Medical"
          ]
        },
        {
          "PMId": 1002,
          "PublicationDate": "2019-01-15T05:55:14.947Z",
          "content": "this is dummy content for record2.how can i solve this",
          "MeshTerms": "karan2,dharan2,nilesh2,manan2,mehul2 sir,manoj2",
          "PublicationType": [
            "ClinicalTrial1",
            "Medical2"
          ]
        },
        {
          "PMId": 1003,
          "PublicationDate": "2011-01-15T05:55:14.947Z",
          "content": "this is dummy content for record3.how can i solve this",
          "MeshTerms": "karan3,dharan3,nilesh3,manan3,mehul3 sir,manoj3",
          "PublicationType": [
            "ClinicalTrial1",
            "Medical3"
          ]
        }
      ]
    },
    {
      "AuthorKey": "Author2",
      "AuthorName": "dharan",
      "AuthorLastName": "shah",
      "Study": [

        {
          "PMId": 2001,
          "PublicationDate": "2011-01-16T05:55:14.947Z",
          "content": "this is dummy content for author 2.how can i solve this",
          "MeshTerms": "karan1,dharan1,nilesh1,manan1,mehul1 sir,manoj1",
          "PublicationType": [
            "ClinicalTrial",
            "Medical"
          ]
        },
        {
          "PMId": 2002,
          "PublicationDate": "2019-01-15T05:55:14.947Z",
          "content": "this is dummy content for author 2.how can i solve this",
          "MeshTerms": "karan2,dharan2,nilesh2,manan2,mehul2 sir,manoj2",
          "PublicationType": [
            "ClinicalTrial1",
            "Medical2"
          ]
        },
        {
          "PMId": 2003,
          "PublicationDate": "2015-01-15T05:55:14.947Z",
          "content": "this is dummy content for record2.how can i solve this",
          "MeshTerms": "karan3,dharan3,nilesh3,manan3,mehul3 sir,manoj3",
          "PublicationType": [
            "ClinicalTrial1",
            "Medical3"
          ]
        }
      ]
    },
    {
      "AuthorKey": "Author3",
      "AuthorName": "Nilesh",
      "AuthorLastName": "Mistrey",
      "Study": [
        {
          "PMId": 3000,
          "PublicationDate": "2012-01-16T05:55:14.947Z",
          "content": "this is dummy content for author 2 .how can i solve this",
          "MeshTerms": "karan2,dharan2,nilesh2,manan2,mehul sir2,manoj2",
          "PublicationType": [
            "ClinicalTrial",
            "Medical"
          ]
        }
      ]
    }
  ]
}

How can I retrieve all authors together with their total study count, sorted from highest to lowest?

Expected output:

{
  "Authors": [
    {
      "AuthorKey": "Author1",
      "AuthorName": "karan",
      "AuthorLastName": "shah",
      "StudyCount": 4
    },
    {
      "AuthorKey": "Author2",
      "AuthorName": "dharan",
      "AuthorLastName": "shah",
      "StudyCount": 3
    },

    {
      "AuthorKey": "Author3",
      "AuthorName": "Nilesh",
      "AuthorLastName": "Mistrey",
      "StudyCount": 1
    }
  ]
}

Below is the mapping of the index:

{
  "authorindex": {
    "mappings": {
      "_doc": {
        "properties": {
          "AuthorKey": {
            "type": "keyword"
          },
          "AuthorLastName": {
            "type": "keyword"
          },
          "AuthorName": {
            "type": "keyword"
          },
          "Study": {
            "type": "nested",
            "properties": {
              "MeshTerms": {
                "type": "text",
                "fields": {
                  "keyword": {
                    "type": "keyword",
                    "ignore_above": 256
                  }
                }
              },
              "PMId": {
                "type": "long"
              },
              "PublicationDate": {
                "type": "date"
              },
              "PublicationType": {
                "type": "text",
                "fields": {
                  "keyword": {
                    "type": "keyword",
                    "ignore_above": 256
                  }
                }
              },
              "content": {
                "type": "text",
                "fields": {
                  "keyword": {
                    "type": "keyword",
                    "ignore_above": 256
                  }
                }
              }
            }
          }
        }
      }
    }
  }
}

1 Answer:

Answer 0: (score: 0)

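There are two ways to solve this.

In this answer to a similar question, it is suggested to:

  1. use a script, like the one shown in that answer;

  2. precompute the number of studies, store it in the index as a simple integer, and sort on that value.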

Depending on your situation, either of these two options can work for you.

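If you need to experiment with the data and run ad hoc queries, option 1 will work. It does not perform well, but it should work with your existing data and mapping.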
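As a rough illustration of option 1, the request body below sorts authors by the number of entries in their Study array and returns that number as a script field. This is only a sketch under stated assumptions: the field names come from the mapping in the question, the helper name build_study_count_query is made up for this example, and it assumes that params._source is readable from Painless in both script_fields and script-based sorting on your Elasticsearch version. Reading _source for every hit is what makes this approach slow on 10 million authors.

```python
import json


def build_study_count_query(size=10):
    """Build a search body that orders authors by how many Study entries they have.

    Hypothetical helper for illustration; it only builds the JSON body and does
    not talk to a cluster.
    """
    # Painless expression that counts the entries of the Study array in _source.
    # Assumption: every document has a Study array (a missing field would fail).
    count_script = {"lang": "painless", "source": "params._source.Study.size()"}
    return {
        "size": size,
        "query": {"match_all": {}},
        # Return only the parent fields, not the (large) nested studies.
        "_source": ["AuthorKey", "AuthorName", "AuthorLastName"],
        # Expose the count in each hit under the name StudyCount.
        "script_fields": {"StudyCount": {"script": count_script}},
        # Sort by the same count, highest first.
        "sort": [
            {"_script": {"type": "number", "script": count_script, "order": "desc"}}
        ],
    }


if __name__ == "__main__":
    # Print the body so it can be pasted into a _search request on authorindex.
    print(json.dumps(build_study_count_query(), indent=2))
```

The same body can be sent from NEST or any other client, since it is plain query DSL.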

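Option 2, by contrast, requires a full reindex and an extra step before the data is sent to Elasticsearch (which is still fairly easy). On the positive side, it guarantees the best performance.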
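The extra step for option 2 could look roughly like the following. It is a minimal sketch, not the only way to do it: the field name StudyCount is borrowed from the expected output in the question, and the helper with_study_count and the constant SORT_BY_STUDY_COUNT are hypothetical names introduced for this example. The count is computed on the client before indexing, so the search itself only needs an ordinary field sort.

```python
def with_study_count(author):
    """Return a copy of an author document with a precomputed StudyCount field."""
    enriched = dict(author)  # shallow copy so the caller's dict is left unchanged
    enriched["StudyCount"] = len(author.get("Study", []))
    return enriched


# Search body for the reindexed data: a plain sort on the stored integer.
SORT_BY_STUDY_COUNT = {
    "query": {"match_all": {}},
    "_source": ["AuthorKey", "AuthorName", "AuthorLastName", "StudyCount"],
    "sort": [{"StudyCount": {"order": "desc"}}],
}


if __name__ == "__main__":
    sample = {
        "AuthorKey": "Author2",
        "AuthorName": "dharan",
        "AuthorLastName": "shah",
        "Study": [{"PMId": 2001}, {"PMId": 2002}, {"PMId": 2003}],
    }
    print(with_study_count(sample)["StudyCount"])
```

The count has to be kept in sync whenever studies are added to or removed from an author, which is the price paid for the faster sort.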

You can read about other ways of handling relationships in Elasticsearch in the Handling Relationships chapter of The Definitive Guide.

Hope that helps!