Hierarchical faceting with Elasticsearch

Asked: 2013-12-16 18:31:45

Tags: lucene full-text-search elasticsearch pivot-table faceted-search

I'm using elasticsearch and need to implement faceted search for hierarchical objects, as follows:

  • category 1 (10)
    • subcategory 1 (4)
    • subcategory 2 (6)
  • category 2 (X)
    • ...

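So I need to get facets for two related objects. The documentation says it is possible to get such facets for numeric values, but I need them for strings: http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-facets-terms-stats-facet.html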

Here is another interesting thread on the topic; unfortunately it is outdated: http://elasticsearch-users.115913.n3.nabble.com/Pivot-facets-td2981519.html

Is this possible with elasticsearch? If so, how can I do it?

2 answers:

Answer 0 (score: 5)

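The solution in the other answer works well as long as each document carries no more than one multi-level tag. With two or more tags on the same document, a simple aggregation no longer works: Lucene stores the fields of an object array in a flat structure, so the inner aggregation mixes values that belong to different tags. See the following example: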

DELETE /test_category
POST /test_category

# Insert a doc with 2 hierarchical tags 
POST /test_category/test/1 
{
  "categories": [
    {
      "cat_1": "1",
      "cat_2": "1.1"
    },
    {
      "cat_1":  "2",
      "cat_2": "2.2"
    }
  ]
}

# Simple two-levels aggregations query
GET /test_category/test/_search?search_type=count
{
  "aggs": {
    "main_category": {
      "terms": {
        "field": "categories.cat_1"
      },
      "aggs": {
        "sub_category": {
          "terms": {
            "field": "categories.cat_2"
          }
        }
      }
    }
  }
}

This is the wrong response I got on ES 1.4, where the fields of the inner aggregation are mixed at the document level:

{
   ...
   "aggregations": {
      "main_category": {
         "buckets": [
            {
               "key": "1",
               "doc_count": 1,
               "sub_category": {
                  "buckets": [
                     {
                        "key": "1.1",
                        "doc_count": 1
                     },
                     {
                        "key": "2.2",  <= WRONG
                        "doc_count": 1
                     }
                  ]
               }
            },
            {
               "key": "2",
               "doc_count": 1,
               "sub_category": {
                  "buckets": [
                     {
                        "key": "1.1", <= WRONG
                        "doc_count": 1
                     },
                     {
                        "key": "2.2",
                        "doc_count": 1
                     }
                  ]
               }
            }
         ]
      }
   }
}
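To make the cause of the mix-up concrete, here is a small Python sketch that imitates the behaviour outside of elasticsearch. It is only a toy model, not how elasticsearch is implemented: the helper names `flatten` and `two_level_terms` are made up for this illustration, and the only assumption is that a plain object array ends up as one multi-valued field per leaf name.

```python
from collections import defaultdict

# The document from the example above, with its two hierarchical tags.
doc = {
    "categories": [
        {"cat_1": "1", "cat_2": "1.1"},
        {"cat_1": "2", "cat_2": "2.2"},
    ]
}

def flatten(document):
    """Toy model of a plain object array after indexing: one multi-valued
    field per leaf name, with no record of which values shared an object."""
    fields = defaultdict(list)
    for entry in document["categories"]:
        for name, value in entry.items():
            fields["categories." + name].append(value)
    return dict(fields)

def two_level_terms(flat_docs, outer, inner):
    """Toy terms aggregation with a terms sub-aggregation: every outer value
    of a document is paired with every inner value of that same document."""
    result = defaultdict(lambda: defaultdict(int))
    for flat in flat_docs:
        for outer_value in set(flat.get(outer, [])):
            for inner_value in set(flat.get(inner, [])):
                result[outer_value][inner_value] += 1
    return {key: dict(sub) for key, sub in result.items()}

flat = flatten(doc)
buckets = two_level_terms([flat], "categories.cat_1", "categories.cat_2")
print(flat)
print(buckets)
```

Once the two tags are flattened, nothing ties "1" to "1.1" any more, so both sub-categories show up under both main categories, just as in the response above.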

One solution is to use nested objects. These are the steps:

1) Define a new type in the mapping with nested objects

POST /test_category/test2/_mapping
{
  "test2": {
    "properties": {
      "categories": {
        "type": "nested",
        "properties": {
          "cat_1": {
            "type": "string"
          },
          "cat_2": {
            "type": "string"
          }
        }
      }
    }
  }
}

# Insert a single document 
POST /test_category/test2/1 
{"categories":[{"cat_1":"1","cat_2":"1.1"},{"cat_1":"2","cat_2":"2.2"}]}

2) Run a nested aggregation query:

GET /test_category/test2/_search?search_type=count
{
  "aggs": {
    "categories": {
      "nested": {
        "path": "categories"
      },
      "aggs": {
        "main_category": {
          "terms": {
            "field": "categories.cat_1"
          },
          "aggs": {
            "sub_category": {
              "terms": {
                "field": "categories.cat_2"
              }
            }
          }
        }
      }
    }
  }
}

This is the response I got, which is now correct:

{
   ...
   "aggregations": {
      "categories": {
         "doc_count": 2,
         "main_category": {
            "buckets": [
               {
                  "key": "1",
                  "doc_count": 1,
                  "sub_category": {
                     "buckets": [
                        {
                           "key": "1.1",
                           "doc_count": 1
                        }
                     ]
                  }
               },
               {
                  "key": "2",
                  "doc_count": 1,
                  "sub_category": {
                     "buckets": [
                        {
                           "key": "2.2",
                           "doc_count": 1
                        }
                     ]
                  }
               }
            ]
         }
      }
   }
}

The same solution can be extended to hierarchical facets with more than two levels.
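As a rough sketch of that extension, the request body can be generated for any number of levels by chaining one terms aggregation per level inside the nested aggregation. The Python helper below is hypothetical (`nested_terms_aggs` is not part of any client library), and the third level, `cat_3` / `sub_sub_category`, is an assumed field that does not exist in the mapping above.

```python
import json

def nested_terms_aggs(path, fields, names):
    """Build a nested aggregation whose terms aggregations are chained,
    one level per field, outermost level first."""
    if len(fields) != len(names) or not fields:
        raise ValueError("need one aggregation name per field")
    inner = None
    # Build from the deepest level outwards so each level wraps the next one.
    for field, name in zip(reversed(fields), reversed(names)):
        level = {"terms": {"field": path + "." + field}}
        if inner is not None:
            level["aggs"] = inner
        inner = {name: level}
    return {"aggs": {path: {"nested": {"path": path}, "aggs": inner}}}

body = nested_terms_aggs(
    "categories",
    ["cat_1", "cat_2", "cat_3"],  # cat_3 is a hypothetical third level
    ["main_category", "sub_category", "sub_sub_category"],
)
print(json.dumps(body, indent=2))
```

Called with only `cat_1` and `cat_2`, the helper produces the same body as the two-level query in step 2; the nested fields of any extra level would also have to be added to the mapping.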

Answer 1 (score: 3)

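Currently, elasticsearch doesn't support hierarchical faceting out of the box. But the upcoming 1.0 release comes with a new aggregations module that can be used to get this kind of facet (it is more like a pivot facet than a hierarchical facet). Version 1.0 is currently in beta; you can download the second beta and try out aggregations yourself. Your example might look like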

curl -XPOST 'localhost:9200/_search?pretty' -d '
{
   "aggregations": {
      "main category": {
         "terms": {
            "field": "cat_1",
            "order": {"_term": "asc"}
         },
         "aggregations": {
            "sub category": {
               "terms": {
                  "field": "cat_2",
                  "order": {"_term": "asc"}
               }
            }
         }
      }
   }
}'

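The idea is to have a separate field for each level of the facet and to bucket your documents by the terms of the first level (cat_1). Each of these buckets then gets sub-buckets based on the terms of the second level (cat_2). The result might look like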

{
  "aggregations" : {
    "main category" : {
      "buckets" : [ {
        "key" : "category 1",
        "doc_count" : 10,
        "sub category" : {
          "buckets" : [ {
            "key" : "subcategory 1",
            "doc_count" : 4
          }, {
            "key" : "subcategory 2",
            "doc_count" : 6
          } ]
        }
      }, {
        "key" : "category 2",
        "doc_count" : 7,
        "sub category" : {
          "buckets" : [ {
            "key" : "subcategory 1",
            "doc_count" : 3
          }, {
            "key" : "subcategory 2",
            "doc_count" : 4
          } ]
        }
      } ]
    }
  }
}
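To get from a response of this shape to the indented list in the question, the buckets only need to be walked level by level. Below is a minimal Python sketch; `render_facets` is a made-up helper, and the `response` dictionary simply repeats the illustrative numbers from above rather than real query output.

```python
def render_facets(response, outer="main category", inner="sub category"):
    """Turn a two-level terms aggregation response into indented
    'name (count)' lines, one per bucket."""
    lines = []
    for bucket in response["aggregations"][outer]["buckets"]:
        lines.append("{} ({})".format(bucket["key"], bucket["doc_count"]))
        for sub in bucket[inner]["buckets"]:
            lines.append("    {} ({})".format(sub["key"], sub["doc_count"]))
    return lines

# Same illustrative values as in the sample response above.
response = {
    "aggregations": {
        "main category": {
            "buckets": [
                {
                    "key": "category 1",
                    "doc_count": 10,
                    "sub category": {"buckets": [
                        {"key": "subcategory 1", "doc_count": 4},
                        {"key": "subcategory 2", "doc_count": 6},
                    ]},
                },
                {
                    "key": "category 2",
                    "doc_count": 7,
                    "sub category": {"buckets": [
                        {"key": "subcategory 1", "doc_count": 3},
                        {"key": "subcategory 2", "doc_count": 4},
                    ]},
                },
            ]
        }
    }
}

lines = render_facets(response)
print("\n".join(lines))
```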