Sorting on the full string with Elasticsearch

Time: 2018-10-16 04:13:20

Tags: elasticsearch

I am trying to sort the following items with Elasticsearch:

[
    {name: 'Company 1'},
    {name: 'Company 2'},
    {name: 'aa 01'},
    {name: 'aabb'}
]

If I sort by name, I get the following (the part after -> is the value ES sorts on):

aa 01 -> 01
Company 1 -> 1
Company 2 -> 2
aabb -> aabb

What I want is:

aa 01
aabb
Company 1
Company 2

I tried changing the mapping to type: 'keyword' and got this (the part after -> is the value ES sorts on):

Company 1 -> Company 1
Company 2 -> Company 2
aa 01 -> aa 01
aabb -> aabb

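I tried to find other answers, but they seem to target older ES versions, for example "Elastic search alphabetical sorting based on first character", which relies on settings such as index_analyzer and index.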

1 Answer:

Answer 0 (score: 1)

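You are getting the results in lexicographical order, which is perfectly fine for a computer but makes little sense to humans, who expect the results to be sorted alphabetically.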

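The bytes that represent uppercase letters have lower values than the bytes that represent lowercase letters, so the names are sorted lowest byte first (see the ASCII table).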
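The effect is easy to reproduce outside Elasticsearch. The short Python sketch below is only an illustration, using the four names from the question: Python compares strings by code point, which for ASCII text is the same as byte order.

```python
# The four names from the question.
names = ["Company 1", "Company 2", "aa 01", "aabb"]

# Uppercase "C" has a lower code point than lowercase "a".
print(ord("C"), ord("a"))  # 67 97

# A plain sort compares code points, so every name starting with an
# uppercase letter comes before every name starting with a lowercase one.
byte_order = sorted(names)
print(byte_order)  # ['Company 1', 'Company 2', 'aa 01', 'aabb']
```

This is the same order the keyword mapping produced in the question.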

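To get the order you want, you need to index each name in such a way that its byte order matches the desired sort order. In other words, you need an analyzer that emits a single lowercase token.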
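As a rough sketch of why a single lowercase token helps, the Python below imitates the two behaviours. Both helper functions are hypothetical stand-ins, not Elasticsearch APIs. The second one assumes that an analyzed text field is split into lowercase words and that an ascending sort uses the smallest of them, which is consistent with the sort values shown in the question.

```python
names = ["Company 1", "Company 2", "aa 01", "aabb"]

def single_lowercase_token(value):
    """Stand-in for a keyword tokenizer plus lowercase filter:
    the whole string is kept as one token, lowercased."""
    return value.lower()

def min_word_token(value):
    """Rough stand-in for an analyzed text field: split into
    lowercase words and sort on the smallest one."""
    return min(value.lower().split())

# Whole lowercased string as the sort key: the order asked for.
by_single_token = sorted(names, key=single_lowercase_token)
print(by_single_token)  # ['aa 01', 'aabb', 'Company 1', 'Company 2']

# Smallest word as the sort key: the order from the first attempt.
by_min_word = sorted(names, key=min_word_token)
print(by_min_word)  # ['aa 01', 'Company 1', 'Company 2', 'aabb']
```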

Create a custom keyword analyzer for the field you want to sort on:

PUT /my_index
{
  "settings" : {
    "analysis" : {
      "analyzer" : {
        "custom_keyword_analyzer" : {
          "tokenizer" : "keyword",
          "filter" : ["lowercase"]
        }
      }
    }
  },
  "mappings" : {
    "_doc" : {
      "properties" : {
        "name" : {
          "type" : "text",
          "fields" : {
            "raw" : {
              "type" : "text",
              "analyzer" : "custom_keyword_analyzer",
              "fielddata": true
            }
          }
        }
      }
    }
  }
}
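A variation worth considering, assuming your version supports normalizers on keyword fields (5.2 or later, as far as I know): map name.raw as a keyword with a lowercase normalizer instead of a text field with fielddata enabled. The sketch below only builds and prints the request body as a Python dict, so it can be pasted into a PUT /my_index request. The normalizer name lowercase_normalizer is made up for this example, and the single "_doc" type mirrors the mapping above.

```python
import json

# Assumption: same index layout as the mapping above, but "name.raw"
# is a keyword field normalized to lowercase at index time.
index_body = {
    "settings": {
        "analysis": {
            "normalizer": {
                # Hypothetical name chosen for this sketch.
                "lowercase_normalizer": {
                    "type": "custom",
                    "filter": ["lowercase"],
                }
            }
        }
    },
    "mappings": {
        "_doc": {
            "properties": {
                "name": {
                    "type": "text",
                    "fields": {
                        "raw": {
                            "type": "keyword",
                            "normalizer": "lowercase_normalizer",
                        }
                    },
                }
            }
        }
    },
}

# Print the JSON body for the index-creation request.
print(json.dumps(index_body, indent=2))
```

With this variant the sort request stays the same ("sort": "name.raw"), and the keyword field does not need fielddata loaded into memory.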

Index your data:

POST my_index/_doc/1
{
  "name" : "Company 01"
}

POST my_index/_doc/2
{
  "name" : "Company 02"
}

POST my_index/_doc/3
{
  "name" : "aa 01"
}

POST my_index/_doc/4
{
  "name" : "aabb"
}

Run the sort:

POST /my_index/_doc/_search
{
  "sort": "name.raw"
}

Response:

[
    {
        "_index": "my_index",
        "_type": "_doc",
        "_id": "3",
        "_score": null,
        "_source": {
            "name": "aa 01"
        },
        "sort": [
            "aa 01"
        ]
    },
    {
        "_index": "my_index",
        "_type": "_doc",
        "_id": "4",
        "_score": null,
        "_source": {
            "name": "aabb"
        },
        "sort": [
            "aabb"
        ]
    },
    {
        "_index": "my_index",
        "_type": "_doc",
        "_id": "1",
        "_score": null,
        "_source": {
            "name": "Company 01"
        },
        "sort": [
            "company 01"
        ]
    },
    {
        "_index": "my_index",
        "_type": "_doc",
        "_id": "2",
        "_score": null,
        "_source": {
            "name": "Company 02"
        },
        "sort": [
            "company 02"
        ]
    }
] 

Reference: Sorting and Collations