ElasticSearch - statistical facet on the length of a list

Time: 2013-01-30 18:09:56

Tags: elasticsearch

I have the following sample mapping:

{
    "book" : {
        "properties" : {
                        "author" : { "type" : "string" },
                        "title" : { "type" : "string" },
                        "reviews" : {
                                "properties" : {
                                        "url" : { "type" : "string" },
                                        "score" : { "type" : "integer" }
                                }
                        },
                        "chapters" : {
                                "include_in_root" : 1,
                                "type" : "nested",
                                "properties" : {
                                        "name" : { "type" : "string" }
                                }
                        }
                }
        }
}

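I'd like to facet on the number of reviews, that is, the length of the "reviews" array. For example, the result I need, put in words, is: "100 documents with 10 reviews, 20 documents with 5 reviews, ..."

I'm trying the following statistical facet: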

{
    "query" : {
        "match_all" : {}
    },
    "facets" : {
        "stat1" : {
            "statistical" : {"script" : "doc['reviews.score'].values.size()"}
        }
    }
}

But it keeps failing with:

{
  "error" : "SearchPhaseExecutionException[Failed to execute phase [query_fetch], total failure; shardFailures {[mDsNfjLhRIyPObaOcxQo2w][facettest][0]: QueryPhaseExecutionException[[facettest][0]: query[ConstantScore(NotDeleted(cache(org.elasticsearch.index.search.nested.NonNestedDocsFilter@a2a5984b)))],from[0],size[10]: Query Failed [Failed to execute main query]]; nested: PropertyAccessException[[Error: could not access: reviews; in class: org.elasticsearch.search.lookup.DocLookup]
[Near : {... doc[reviews.score].values.size() ....}]
                 ^
[Line: 1, Column: 5]]; }]",
  "status" : 500
}

How can I achieve my goal?

The ElasticSearch version is 0.19.9.

Here is my sample data:

{
        "author" : "Mark Twain",
        "title" : "The Adventures of Tom Sawyer",
        "reviews" : [
                {
                        "url" : "amazon.com",
                        "score" : 10
                },
                {
                        "url" : "www.barnesandnoble.com",
                        "score" : 9
                }
        ],
        "chapters" : [
                { "name" : "Chapter 1" }, { "name" : "Chapter 2" }
        ]
}

{
        "author" : "Jack London",
        "title" : "The Call of the Wild",
        "reviews" : [
                {
                        "url" : "amazon.com",
                        "score" : 8
                },
                {
                        "url" : "www.barnesandnoble.com",
                        "score" : 9
                },
                {
                        "url" : "www.books.com",
                        "score" : 5
                }
        ],
        "chapters" : [
                { "name" : "Chapter 1" }, { "name" : "Chapter 2" }
        ]
}

1 answer:

Answer 0: (score: 6)

It looks like you are using curl to execute your query, and the curl statement looks like this:     curl localhost:9200/my-index/book -d '{....}'

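The problem here is that, because you are using apostrophes to wrap the body of the request, you need to escape every apostrophe it contains. So your script should become either:

{"script" : "doc['\''reviews.score'\''].values.size()"}

or

{"script" : "doc[\"reviews.score\"].values.size()"}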
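To see why the unescaped version fails, it can help to look at what the shell actually hands to curl. The sketch below uses Python's shlex module as a stand-in for POSIX shell quoting (an approximation, not the shell itself); the helper name shell_body is made up for illustration.

```python
import json
import shlex


def shell_body(quoted: str) -> str:
    """Return the single argument a POSIX-like shell would build from `quoted`."""
    tokens = shlex.split(quoted)  # approximates POSIX shell quoting rules
    if len(tokens) != 1:
        raise ValueError("expected the body to parse as exactly one argument")
    return tokens[0]


# The request body exactly as it would be typed after `-d` on the command line.
unescaped = r"""'{"script" : "doc['reviews.score'].values.size()"}'"""
escaped = r"""'{"script" : "doc['\''reviews.score'\''].values.size()"}'"""
json_quoted = r"""'{"script" : "doc[\"reviews.score\"].values.size()"}'"""

for label, body in [
    ("unescaped", unescaped),
    ("escaped", escaped),
    ("json_quoted", json_quoted),
]:
    # Parse the JSON that would reach the server and show the script it carries.
    script = json.loads(shell_body(body))["script"]
    print(f"{label}: {script}")
```

In the unescaped case the apostrophes are consumed by the shell, so the script arrives as doc[reviews.score].values.size(), which matches the text quoted in the error message.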

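The second problem is that, based on your description, you seem to be looking for a histogram facet or a range facet rather than a statistical facet. So I would suggest trying something like this: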

curl "localhost:9200/test-idx/book/_search?search_type=count&pretty" -d '{
    "query" : {
        "match_all" : {}
    },
    "facets" : {
        "histo1" : {
            "histogram" : {
                "key_script" : "doc[\"reviews.score\"].values.size()",
                "value_script" : "doc[\"reviews.score\"].values.size()",
                "interval" : 1
            }
        }        
    }
}'
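As a rough illustration of the counts this kind of histogram is meant to produce, here is a local Python simulation over the two sample documents from the question. It does not call ElasticSearch or reproduce the facet response format; the function name review_count_histogram is hypothetical.

```python
from collections import Counter

# Sample documents from the question, reduced to the fields that matter here.
books = [
    {
        "title": "The Adventures of Tom Sawyer",
        "reviews": [
            {"url": "amazon.com", "score": 10},
            {"url": "www.barnesandnoble.com", "score": 9},
        ],
    },
    {
        "title": "The Call of the Wild",
        "reviews": [
            {"url": "amazon.com", "score": 8},
            {"url": "www.barnesandnoble.com", "score": 9},
            {"url": "www.books.com", "score": 5},
        ],
    },
]


def review_count_histogram(docs):
    """Count documents per number of reviews (bucket width 1)."""
    return Counter(len(doc.get("reviews", [])) for doc in docs)


histogram = review_count_histogram(books)
for n_reviews, n_docs in sorted(histogram.items()):
    print(f"{n_docs} document(s) with {n_reviews} review(s)")
```

With only two sample books, each bucket holds a single document: one with 2 reviews and one with 3.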

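The third problem is that the script in the facet is called for every record in the result set, which may take a long time if you have many results. So I would suggest indexing an additional field called number_of_reviews, populated by your client with the number of reviews. Your query then becomes: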

curl "localhost:9200/test-idx/book/_search?search_type=count&pretty" -d '{
    "query" : {
        "match_all" : {}
    },
    "facets" : {
        "histo1" : {
            "histogram" : {
                "field" : "number_of_reviews",
                "interval" : 1
            }
        }
    }
}'
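Populating number_of_reviews on the client could look roughly like the Python sketch below, run on each document before it is sent for indexing. The helper name with_review_count is an assumption for illustration, and the indexing call itself is left out.

```python
import json


def with_review_count(book: dict) -> dict:
    """Return a copy of `book` with a precomputed number_of_reviews field."""
    enriched = dict(book)  # shallow copy so the caller's document is untouched
    enriched["number_of_reviews"] = len(book.get("reviews", []))
    return enriched


book = {
    "author": "Mark Twain",
    "title": "The Adventures of Tom Sawyer",
    "reviews": [
        {"url": "amazon.com", "score": 10},
        {"url": "www.barnesandnoble.com", "score": 9},
    ],
}

# This JSON is what the client would send as the document body when indexing.
print(json.dumps(with_review_count(book), indent=2))
```

The count has to be recomputed whenever a review is added or removed, which is the trade-off for not running a script per document at query time.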