Counting child pages in Elasticsearch

Time: 2016-04-16 16:59:06

Tags: elasticsearch

My index test contains the following documents:

POST /test/page/a
{
  "Id": "a",
  "Parent": "0"
}

POST /test/page/b
{
  "Id": "b",
  "Parent": "a"
}

POST /test/page/c
{
  "Id": "c",
  "Parent": "a"
}

POST /test/page/d
{
  "Id": "d",
  "Parent": "c"
}

That is, the logical page hierarchy looks like this:

0 (nonexistent)
|
`- a
   |
   +- b
   |
   `- c
      |
      `- d

I can find all pages whose Parent equals a. I simply run:

POST /test/page/_search
{
  "query": {
    "term": {
      "Parent": "a"
    }
  }
}

Response (abbreviated):

{
  "hits": {
    "total": 2,
    "hits": [
      {
        "_index": "test",
        "_type": "page",
        "_id": "b",
        "_source": {
          "Id": "b",
          "Parent": "a"
        }
      },
      {
        "_index": "test",
        "_type": "page",
        "_id": "c",
        "_source": {
          "Id": "c",
          "Parent": "a"
        }
      }
    ]
  }
}

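Now, on the client side, I can build a tree view of the root element and its direct children.

However, I would also like to know how many direct children each of the (just listed) children has.

I would like a response along these lines: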

{
  "hits": {
    "total": 2,
    "hits": [
      {
        "_index": "test",
        "_type": "page",
        "_id": "b",
        "_source": {
          "Id": "b",
          "Parent": "a"
        },
        "_numberOfChildren": 0
      },
      {
        "_index": "test",
        "_type": "page",
        "_id": "c",
        "_source": {
          "Id": "c",
          "Parent": "a"
        },
        "_numberOfChildren": 1
      }
    ]
  }
}

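I would like ES to compute _numberOfChildren dynamically in some kind of "subquery".

Could aggregations be the answer?

Perhaps https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket-children-aggregation.html

3 answers:

Answer 0: (score: 1)

I don't know if this is what you are looking for:

I tried inserting the same elements: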

PUT /tmp_index/doc/1
{
    "id": "a",
    "parent": "0"
}

PUT /tmp_index/doc/2
{
    "id": "b",
    "parent": "a"
}

PUT /tmp_index/doc/3
{
    "id": "c",
    "parent": "a"
}

PUT /tmp_index/doc/4
{
    "id": "d",
    "parent": "c"
}

Using a nested aggregation like this:

POST /tmp_index/_search?pretty
{
   "size": 0,
   "query": {
      "match_all": {}
   },
   "aggs": {
      "group_by_first": {
         "terms": {
            "field": "parent",
            "size" : 0
         },
         "aggs": {
            "group_by_second": {
               "terms": {
                  "field": "id",
                  "size" : 0
               }
            }
         }
      }
   }
}

You get this result:

{
           "key": "a",
           "doc_count": 2,
           "group_by_second": {
              "doc_count_error_upper_bound": 0,
              "sum_other_doc_count": 0,
              "buckets": [
                 {
                    "key": "b",
                    "doc_count": 1
                 },
                 {
                    "key": "c",
                    "doc_count": 1
                 }
              ]
           }
        },
        {
           "key": "0",
           "doc_count": 1,
           "group_by_second": {
              "doc_count_error_upper_bound": 0,
              "sum_other_doc_count": 0,
              "buckets": [
                 {
                    "key": "a",
                    "doc_count": 1
                 }
              ]
           }
        },
        {
           "key": "c",
           "doc_count": 1,
           "group_by_second": {
              "doc_count_error_upper_bound": 0,
              "sum_other_doc_count": 0,
              "buckets": [
                 {
                    "key": "d",
                    "doc_count": 1
                 }
              ]
           }
        }
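
To relate this result back to the question: the doc_count of each outer bucket is the number of direct children of the page named by its key, and a page with no children has no bucket at all. A minimal Python sketch of reading the counts on the client side follows; the helper name child_counts is made up for illustration, and the bucket list is copied by hand from the response above.

```python
def child_counts(parent_buckets):
    """Map each parent id to its number of direct children."""
    return {bucket["key"]: bucket["doc_count"] for bucket in parent_buckets}


# Outer buckets from the response above (inner "group_by_second" omitted).
buckets = [
    {"key": "a", "doc_count": 2},
    {"key": "0", "doc_count": 1},
    {"key": "c", "doc_count": 1},
]

counts = child_counts(buckets)

# Pages without children have no bucket, so default to 0.
for page_id in ["b", "c"]:  # the direct children of "a"
    print(page_id, counts.get(page_id, 0))
# prints:
# b 0
# c 1
```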

Answer 1: (score: 0)

[Update]

I'm not sure you can do it all in a single query.

I needed something similar and ended up using msearch for the follow-up queries.

POST /test/_msearch
{}
{"query" : {"term" : {"Parent": "c"}}, "size" : 0}
{}
{"query" : {"term" : {"Parent": "d"}}, "size" : 0}
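
If the list of child ids comes from a previous search, the msearch body can be generated rather than written by hand. The following is only a sketch in Python: the function names and the parent_field parameter are illustrative, not part of any client library, and sending the body to /test/_msearch is left to whatever HTTP client you use. Each count is then read from hits.total of the matching entry in the responses array.

```python
import json


def build_msearch_body(child_ids, parent_field="Parent"):
    """Build a newline-delimited msearch body: one header line and one
    count-only query line per child id."""
    lines = []
    for child_id in child_ids:
        lines.append(json.dumps({}))  # empty header: use the index from the URL
        lines.append(json.dumps({
            "query": {"term": {parent_field: child_id}},
            "size": 0,  # only the total is needed, not the hits
        }))
    # Every line, including the last one, must end with a newline.
    return "".join(line + "\n" for line in lines)


def counts_from_msearch(child_ids, msearch_response):
    """Pair each child id with the total of its query, in request order."""
    counts = {}
    for child_id, resp in zip(child_ids, msearch_response["responses"]):
        total = resp["hits"]["total"]
        if isinstance(total, dict):  # newer ES versions wrap the number in an object
            total = total["value"]
        counts[child_id] = total
    return counts


body = build_msearch_body(["b", "c"])
print(body, end="")

# Hand-written response in the shape msearch returns, for illustration only.
fake_response = {"responses": [{"hits": {"total": 0}}, {"hits": {"total": 1}}]}
print(counts_from_msearch(["b", "c"], fake_response))
```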

Answer 2: (score: 0)

If you don't have many items:

You can retrieve the information with a single query:

GET /test/page/_search
{
  "filter": {
    "term": {
      "Parent": "0"
    }
  },
  "aggs": {
    "numberOfChildren": {
      "terms": {
        "field": "Parent",
        "size": 0
      }
    }
  }
}

In the response, hits.hits will contain the children of 0.

For each node, aggregations.numberOfChildren.buckets will hold its number of children, in this structure:

{
    "key": [page id],
    "doc_count": [number of children for this page]
}

Example response:

{
  ...
  "hits": {
    "total": 1,
    "max_score": 1,
    "hits": [
      {
        "_index": "test",
        "_type": "page",
        "_id": "a",
        "_score": 1,
        "_source": {
          "Id": "a",
          "Parent": "0"
        }
      }
    ]
  },
  "aggregations": {
    "numberOfChildren": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": "0",
          "doc_count": 1
        },
        {
          "key": "a",
          "doc_count": 2
        },
        {
          "key": "c",
          "doc_count": 1
        }
      ]
    }
  }
}

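Please note:

  • If a page has no children, it is not in the list.
  • You get the child counts for all parents, not only for 0's direct children (the top-level filter is applied after the aggregations are computed), so this breaks down if you have many items (too many buckets).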
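
Both caveats can be handled on the client when merging the two parts of the response. Here is a rough Python sketch, assuming the response has the shape shown above; annotate_hits is a made-up helper name, and _numberOfChildren is simply the field name used in the question, not something ES returns.

```python
def annotate_hits(response, agg_name="numberOfChildren"):
    """Return copies of the hits, each with a _numberOfChildren field."""
    buckets = response["aggregations"][agg_name]["buckets"]
    counts = {bucket["key"]: bucket["doc_count"] for bucket in buckets}

    annotated = []
    for hit in response["hits"]["hits"]:
        item = dict(hit)  # shallow copy, the original hit is left untouched
        # Pages without children have no bucket, so default to 0.
        item["_numberOfChildren"] = counts.get(hit["_source"]["Id"], 0)
        annotated.append(item)
    return annotated


# Trimmed copy of the example response above.
response = {
    "hits": {
        "total": 1,
        "hits": [
            {"_index": "test", "_type": "page", "_id": "a",
             "_source": {"Id": "a", "Parent": "0"}},
        ],
    },
    "aggregations": {
        "numberOfChildren": {
            "buckets": [
                {"key": "0", "doc_count": 1},
                {"key": "a", "doc_count": 2},
                {"key": "c", "doc_count": 1},
            ]
        }
    },
}

result = annotate_hits(response)
print(result[0]["_id"], result[0]["_numberOfChildren"])  # prints: a 2
```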

If you have many items:

The simplest approach is to use two queries:

GET /test/page/_search
{
  "query": {
    "filtered": {
      "filter": {
        "term": {
          "Parent": "0"
        }
      }
    }
  }
}

In hits.hits you will have 0's direct children.

Second query:

GET /test/page/_search
{
  "size": 0, 
  "query": {
    "filtered": {
      "filter": {
        "terms": {
          "Parent": [
            "a" // list 0's direct children ids
          ]
        }
      }
    }
  },
  "aggs": {
    "numberOfChildren": {
      "terms": {
        "field": "Parent",
        "size": 0,
        "order": {
          "_term": "asc"
        }
      }
    }
  }
}

In aggregations.numberOfChildren.buckets you will have the number of direct children of each of 0's children.
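
The second request body can be derived from the first response instead of listing the ids by hand. A minimal Python sketch follows, assuming the same filtered/terms syntax as the queries above (which targets the ES versions current at the time of this answer); build_count_query and parent_field are illustrative names, not a library API.

```python
import json


def build_count_query(first_response, parent_field="Parent"):
    """Build the second query body from the hits of the first query."""
    ids = [hit["_source"]["Id"] for hit in first_response["hits"]["hits"]]
    return {
        "size": 0,  # only the aggregation is needed
        "query": {
            "filtered": {
                "filter": {"terms": {parent_field: ids}}
            }
        },
        "aggs": {
            "numberOfChildren": {
                "terms": {"field": parent_field, "size": 0}
            }
        },
    }


# Trimmed, hand-written first response: 0's only direct child is "a".
first_response = {
    "hits": {"hits": [{"_id": "a", "_source": {"Id": "a", "Parent": "0"}}]}
}

query = build_count_query(first_response)
print(json.dumps(query, indent=2))
```

Because the aggregation only sees documents whose Parent is one of the listed ids, the number of buckets is bounded by the number of hits in the first query.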

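You could also use scripts, but I'm not sure they would work in this case.

Parent-child relationships will not help you, because parents and children cannot be of the same type.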