Elasticsearch数据聚合与阵列上的分面

时间:2013-10-01 10:44:56

标签: elasticsearch faceted-search

我在索引中插入了一些公司,其中countries属性是一组国家/地区代码:

curl -XPUT 'http://localhost:9200/test/company/10' -d '{"countries" : ["CH", "CN"], "name" : "company10"}'
curl -XPUT 'http://localhost:9200/test/company/11' -d '{"countries" : ["AT", "CH", "CN", "DE", "EN", "FR"], "name" : "company11"}'
curl -XPUT 'http://localhost:9200/test/company/12' -d '{"countries" : ["AT", "CN", "EN", "FR"], "name" : "company12"}'
curl -XPUT 'http://localhost:9200/test/company/13' -d '{"countries" : ["CH", "CN", "HU"], "name" : "company13"}'
curl -XPUT 'http://localhost:9200/test/company/14' -d '{"countries" : ["CH", "CN", "EN", "FR"], "name" : "company14"}'
curl -XPUT 'http://localhost:9200/test/company/15' -d '{"countries" : ["AT", "CN", "DE", "EN", "FR", "HU"], "name" : "company15"}'
curl -XPUT 'http://localhost:9200/test/company/16' -d '{"countries" : ["AT", "BE", "CH", "DE", "EN", "FR", "HU"], "name" : "company16"}'
curl -XPUT 'http://localhost:9200/test/company/17' -d '{"countries" : ["BE", "CN", "EN"], "name" : "company17"}'
curl -XPUT 'http://localhost:9200/test/company/18' -d '{"countries" : ["AT", "CH", "CN", "DE"], "name" : "company18"}'
curl -XPUT 'http://localhost:9200/test/company/19' -d '{"countries" : ["AT", "CH", "CN", "DE", "EN", "FR", "HU"], "name" : "company19"}'
curl -XPUT 'http://localhost:9200/test/company/20' -d '{"countries" : ["EN", "FR"], "name" : "company20"}'
curl -XPUT 'http://localhost:9200/test/company/21' -d '{"countries" : ["AT", "BE", "DE", "FR", "HU"], "name" : "company21"}'
curl -XPUT 'http://localhost:9200/test/company/22' -d '{"countries" : ["AT", "BE", "CH", "DE", "EN", "FR", "HU"], "name" : "company22"}'
curl -XPUT 'http://localhost:9200/test/company/23' -d '{"countries" : ["AT", "BE", "CH", "CN", "DE", "EN", "HU"], "name" : "company23"}'
curl -XPUT 'http://localhost:9200/test/company/24' -d '{"countries" : ["AT", "BE", "CH", "CN", "DE", "EN", "FR"], "name" : "company24"}'
curl -XPUT 'http://localhost:9200/test/company/25' -d '{"countries" : ["AT", "BE", "CH", "DE", "EN", "FR"], "name" : "company25"}'
curl -XPUT 'http://localhost:9200/test/company/26' -d '{"countries" : ["AT", "BE", "CH", "CN", "DE", "EN", "FR", "HU"], "name" : "company26"}'
curl -XPUT 'http://localhost:9200/test/company/27' -d '{"countries" : ["AT", "EN", "FR"], "name" : "company27"}'
curl -XPUT 'http://localhost:9200/test/company/28' -d '{"countries" : ["CN"], "name" : "company28"}'
curl -XPUT 'http://localhost:9200/test/company/29' -d '{"countries" : ["BE", "CH", "CN", "EN", "FR"], "name" : "company29"}'
curl -XPUT 'http://localhost:9200/test/company/30' -d '{"countries" : ["CN"], "name" : "company30"}'

我想按country_code(国家/地区属性)汇总公司,计算每个国家/地区的公司数量。

可悲的是,即使这样(AT代码的计数)也不起作用:

curl -XGET 'http://localhost:9200/test/company/_search?pretty=true' -d '
{"query"  : { "match_all" : {} },
 "facets" : {
    "foo" : {
      "filter" : {
        "term" : { "countries" : "AT" }
      }
    }
  }
}
'

我得到了:

...

"facets" : {
  "foo" : {
    "_type" : "filter",
    "count" : 0
  }
}

我做错了什么?

1 个答案:

答案 0 :(得分:5)

我认为这是因为没有分析过滤器。 AT是停用词,因此未编入索引。您可以使用_analyze API检查它:http://localhost:9200/test/_analyze?text=AT&field=countries

您可以检查非停用词,例如CN,但这是小写的http://localhost:9200/test/_analyze?text=CN&field=countries。因此,cn(实际上存储在索引中)与您在构面过滤器中的CN不匹配。

您可以尝试将搜索修改为小写国家/地区缩写:

curl -XGET 'http://localhost:9200/test/company/_search?pretty=true' -d '
{"query"  : { "match_all" : {} },
 "facets" : {
    "foo" : {
      "filter" : {
        "term" : { "countries" : "cn" }
      }
    }
  }
}'

获取

"facets" : {
    "foo" : {
      "_type" : "filter",
      "count" : 15
    }
  }

但我认为您应该将国家/地区的映射定义为"index":"not_analyzed"以避免这种情况(包括停用词和小写)

# Delete index
#
curl -XDELETE 'http://localhost:9200/test'

# Create with mapping
#
curl -XPUT 'http://localhost:9200/test/' -d '{
  "mappings": {
    "company": {
      "properties": {
        "countries": { "type": "string", "index" : "not_analyzed"  }
      }
    }
  }
}'


# Index documents
#
curl -XPUT 'http://localhost:9200/test/company/10' -d '{"countries" : ["CH", "CN"], "name" : "company10"}'
curl -XPUT 'http://localhost:9200/test/company/11' -d '{"countries" : ["AT", "CH", "CN", "DE", "EN", "FR"], "name" : "company11"}'
curl -XPUT 'http://localhost:9200/test/company/12' -d '{"countries" : ["AT", "CN", "EN", "FR"], "name" : "company12"}'
curl -XPUT 'http://localhost:9200/test/company/13' -d '{"countries" : ["CH", "CN", "HU"], "name" : "company13"}'
curl -XPUT 'http://localhost:9200/test/company/14' -d '{"countries" : ["CH", "CN", "EN", "FR"], "name" : "company14"}'
curl -XPUT 'http://localhost:9200/test/company/15' -d '{"countries" : ["AT", "CN", "DE", "EN", "FR", "HU"], "name" : "company15"}'
curl -XPUT 'http://localhost:9200/test/company/16' -d '{"countries" : ["AT", "BE", "CH", "DE", "EN", "FR", "HU"], "name" : "company16"}'
curl -XPUT 'http://localhost:9200/test/company/17' -d '{"countries" : ["BE", "CN", "EN"], "name" : "company17"}'
curl -XPUT 'http://localhost:9200/test/company/18' -d '{"countries" : ["AT", "CH", "CN", "DE"], "name" : "company18"}'
curl -XPUT 'http://localhost:9200/test/company/19' -d '{"countries" : ["AT", "CH", "CN", "DE", "EN", "FR", "HU"], "name" : "company19"}'
curl -XPUT 'http://localhost:9200/test/company/20' -d '{"countries" : ["EN", "FR"], "name" : "company20"}'
curl -XPUT 'http://localhost:9200/test/company/21' -d '{"countries" : ["AT", "BE", "DE", "FR", "HU"], "name" : "company21"}'
curl -XPUT 'http://localhost:9200/test/company/22' -d '{"countries" : ["AT", "BE", "CH", "DE", "EN", "FR", "HU"], "name" : "company22"}'
curl -XPUT 'http://localhost:9200/test/company/23' -d '{"countries" : ["AT", "BE", "CH", "CN", "DE", "EN", "HU"], "name" : "company23"}'
curl -XPUT 'http://localhost:9200/test/company/24' -d '{"countries" : ["AT", "BE", "CH", "CN", "DE", "EN", "FR"], "name" : "company24"}'
curl -XPUT 'http://localhost:9200/test/company/25' -d '{"countries" : ["AT", "BE", "CH", "DE", "EN", "FR"], "name" : "company25"}'
curl -XPUT 'http://localhost:9200/test/company/26' -d '{"countries" : ["AT", "BE", "CH", "CN", "DE", "EN", "FR", "HU"], "name" : "company26"}'
curl -XPUT 'http://localhost:9200/test/company/27' -d '{"countries" : ["AT", "EN", "FR"], "name" : "company27"}'
curl -XPUT 'http://localhost:9200/test/company/28' -d '{"countries" : ["CN"], "name" : "company28"}'
curl -XPUT 'http://localhost:9200/test/company/29' -d '{"countries" : ["BE", "CH", "CN", "EN", "FR"], "name" : "company29"}'
curl -XPUT 'http://localhost:9200/test/company/30' -d '{"countries" : ["CN"], "name" : "company30"}'

# Refresh index
#
curl -XPOST 'http://localhost:9200/test/_refresh'

# Search
#
curl -XGET 'http://localhost:9200/test/company/_search?pretty=true' -d '
{"query"  : { "match_all" : {} },
 "facets" : {
    "foo" : {
      "filter" : {
        "term" : { "countries" : "AT" }
      }
    }
  }
}
'