Question

Given a database containing a list of people, where they live, and their wealth/income/tax levels, I have given my Elasticsearch 5.6.2 this mapping:

mappings => {
    person => {
        properties => {
            name => {
                type   => 'text',
                fields => {
                    raw => {
                        type => 'keyword',
                    },
                },
            },

            county => {
                type   => 'text',
                fields => {
                    raw => {
                        type => 'keyword',
                    },
                },
            },

            community_name => {
                type   => 'text',
                fields => {
                    raw => {
                        type => 'keyword',
                    },
                },
            },

            wealth => {
                type => 'long',
            },

            income => {
                type => 'long',
            },

            tax => {
                type => 'long',
            },
        },
    },
},

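A county can have several communities, and I want to run an aggregation that produces an overview of average wealth/income/tax for each county and for each community within each county.

This seems to work: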

aggs => {
    counties => {
        terms => {
            field => 'county.raw',
            size  => 100,
            order => { _term => 'asc' },
        },

        aggs => {
            communities => {
                terms => {
                    field => 'community_name.raw',
                    size  => 1_000,
                    order => { _term => 'asc' },
                },

                aggs => {
                    avg_wealth => {
                        avg => {
                            field => 'wealth',
                        },
                    },

                    avg_income => {
                        avg => {
                            field => 'income',
                        },
                    },

                    avg_tax => {
                        avg => {
                            field => 'tax',
                        },
                    },
                },

            },

            avg_wealth => {
                avg => {
                    field => 'wealth',
                },
            },

            avg_income => {
                avg => {
                    field => 'income',
                },
            },

            avg_tax => {
                avg => {
                    field => 'tax',
                },
            },

        },

    },
},

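However, "county" and "community_name" are not sorted correctly, because some of them contain Norwegian characters. ES sorts "Ål" before "Øvre Eiker", which is wrong: in the Norwegian alphabet, Ø comes before Å.

How can I get correct Norwegian sorting?

Edit: I tried changing the "community_name" field to use "icu_collation_keyword" instead of "keyword":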

community_name => {
    type   => 'text',
    fields => {
        raw => {
            type     => 'icu_collation_keyword',
            index    => 'false',
            language => 'nb',
        },
    },
},

But this results in garbled output:

Akershus - 276855 - 229202 - 80131
    ᦥ免⡠႐໠  - 314430 - 243684 - 87105
    ↘卑◥猔᠈〇㠖 - 202339 - 225665 - 78186
    ⚞乀⃠᷀　 - 306985 - 237405 - 83186
    ⦘卓敫တ倎瀤 - 218060 - 218407 - 75602
    ⸳䄓†怜〨 - 271174 - 216843 - 75257

Answer 1

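If the field you are aggregating on (community_name in your example) always holds exactly one value, then I think you can try the following approach, which extends what you have so far.

Basically, you add another sub-aggregation on the original, non-garbled value and read it on the client side for display.

I'll show it on a simplified mapping: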

PUT /icu_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "community": {
          "type": "text",
          "fields": {
            "raw": {
              "type": "keyword"
            },
            "norwegian": {
              "type": "icu_collation_keyword",
              "index": false,
              "language": "nb"
            }
          }
        },
        "wealth": {
          "type": "long"
        }
      }
    }
  }
}

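We store the community name three ways:

unchanged, as community;

as a keyword, in community.raw;

as an icu_collation_keyword, in community.norwegian.

Then we index a couple of documents (note: community holds a single string, not a list of strings):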

PUT /icu_index/my_type/2
{
  "community": "Ål",
  "wealth": 10000
}

PUT /icu_index/my_type/3
{
  "community": "Øvre Eiker",
  "wealth": 5000
}

Now we can run the aggregation:

POST /icu_index/my_type/_search
{
  "size": 0,
  "aggs": {
    "communities": {
      "terms": {
        "field": "community.norwegian",
        "order": { "_term": "asc" }
      },
      "aggs": {
        "avg_wealth": {
          "avg": { "field": "wealth" }
        },
        "community_original": {
          "terms": { "field": "community.raw" }
        }
      }
    }
  }
}

We still sort by community.norwegian, but we also add a sub-aggregation on community.raw. Let's look at the result:

"aggregations": {
  "communities": {
    "doc_count_error_upper_bound": 0,
    "sum_other_doc_count": 0,
    "buckets": [
      {
        "key": "⸳䃔楦၃৉瓅ᘂก捡㜂\u0000\u0001",
        "doc_count": 1,
        "community_original": {
          "doc_count_error_upper_bound": 0,
          "sum_other_doc_count": 0,
          "buckets": [
            { "key": "Øvre Eiker", "doc_count": 1 }
          ]
        },
        "avg_wealth": { "value": 5000 }
      },
      {
        "key": "⸳䄏怠怜〨\u0000\u0000",
        "doc_count": 1,
        "community_original": {
          "doc_count_error_upper_bound": 0,
          "sum_other_doc_count": 0,
          "buckets": [
            { "key": "Ål", "doc_count": 1 }
          ]
        },
        "avg_wealth": { "value": 10000 }
      }
    ]
  }
}

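Now the buckets are sorted by the ICU collation of the community name. The first bucket, whose key is "⸳䃔楦၃৉瓅ᘂก捡㜂\u0000\u0001", carries its original value in community_original.buckets[0].key, which is "Øvre Eiker".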
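On the client side, reading the display name back out could look roughly like the Python sketch below. It assumes the response has already been parsed into a dict and that the aggregation names match the ones used above (communities, community_original, avg_wealth). The function name rows_in_collation_order and the shortened placeholder sort keys in the sample are made up for illustration. As far as I understand, the garbled outer key is just the ICU sort key, so it is ignored here and only the bucket order is kept.

```python
def rows_in_collation_order(response):
    """Return (display_name, avg_wealth) pairs, keeping the bucket order from Elasticsearch."""
    rows = []
    for bucket in response["aggregations"]["communities"]["buckets"]:
        originals = bucket["community_original"]["buckets"]
        # bucket["key"] is the opaque collation key; the readable name
        # comes from the first bucket of the community.raw sub-aggregation.
        name = originals[0]["key"] if originals else None
        rows.append((name, bucket["avg_wealth"]["value"]))
    return rows


# Trimmed copy of the response above; the outer keys are placeholders,
# not real ICU sort keys.
sample_response = {
    "aggregations": {
        "communities": {
            "buckets": [
                {
                    "key": "placeholder-sort-key-1",
                    "doc_count": 1,
                    "community_original": {
                        "buckets": [{"key": "Øvre Eiker", "doc_count": 1}]
                    },
                    "avg_wealth": {"value": 5000},
                },
                {
                    "key": "placeholder-sort-key-2",
                    "doc_count": 1,
                    "community_original": {
                        "buckets": [{"key": "Ål", "doc_count": 1}]
                    },
                    "avg_wealth": {"value": 10000},
                },
            ]
        }
    }
}

for name, avg_wealth in rows_in_collation_order(sample_response):
    print(name, avg_wealth)
```

The function never sorts anything itself. It relies entirely on the order Elasticsearch already produced from community.norwegian.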

Note: this hack will of course not work if community_name can be a list of values.
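If you are not sure whether every document really has a single value, a cheap client-side check is to look for buckets whose community_original sub-aggregation returns anything other than exactly one entry. A document with several community values lands in the bucket of each value and brings all of its raw values along, so such buckets show more than one original name. This is only a sketch: ambiguous_buckets is a hypothetical helper, and the sample buckets below are invented to illustrate the multi-valued case, not taken from real output.

```python
def ambiguous_buckets(buckets):
    """Return the outer keys of buckets that do not map to exactly one original value."""
    return [
        bucket["key"]
        for bucket in buckets
        if len(bucket["community_original"]["buckets"]) != 1
    ]


# Invented example: one document indexed with community = ["Ål", "Øvre Eiker"].
# Both collation buckets then contain both raw values.
multi_valued_buckets = [
    {
        "key": "placeholder-sort-key-1",
        "community_original": {
            "buckets": [
                {"key": "Øvre Eiker", "doc_count": 1},
                {"key": "Ål", "doc_count": 1},
            ]
        },
    },
    {
        "key": "placeholder-sort-key-2",
        "community_original": {
            "buckets": [
                {"key": "Øvre Eiker", "doc_count": 1},
                {"key": "Ål", "doc_count": 1},
            ]
        },
    },
]

problems = ambiguous_buckets(multi_valued_buckets)
if problems:
    print("cannot pick a single display name for", len(problems), "bucket(s)")
```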

Hope this hack helps!
