Range query for Turkish words

Asked: 2018-03-22 15:26:19

Tags: elasticsearch collation range-query

I have records whose customer_address property holds Turkish words such as "şa, za, sb, şc, sd, şe".

I indexed my documents as shown below, following "Sorting and Collations", because I want to sort documents by the customer_address field. Sorting works well.
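For readers wondering why a collation-aware field is needed at all, here is a minimal Python sketch (not part of the setup described here). It compares plain code-point ordering with ordering by position in the Turkish alphabet. The names `TURKISH_ALPHABET` and `turkish_rank` are hypothetical, and the helper assumes lowercase input made up only of Turkish letters.

```python
# Lowercase Turkish alphabet in dictionary order (29 letters).
TURKISH_ALPHABET = "abcçdefgğhıijklmnoöprsştuüvyz"
RANK = {ch: i for i, ch in enumerate(TURKISH_ALPHABET)}

def turkish_rank(word):
    """Hypothetical helper: sort key built from alphabet positions.

    Assumes lowercase input that contains only Turkish letters.
    """
    return [RANK[ch] for ch in word]

words = ["şa", "za", "sb", "şc", "sd", "şe"]

# Code-point order: "ş" (U+015F) sorts after "z" (U+007A).
by_code_point = sorted(words)
# Alphabet order: "ş" comes right after "s", well before "z".
by_alphabet = sorted(words, key=turkish_rank)

print(by_code_point)  # ['sb', 'sd', 'za', 'şa', 'şc', 'şe']
print(by_alphabet)    # ['sb', 'sd', 'şa', 'şc', 'şe', 'za']
```

The same mismatch affects range comparisons: a bound such as "şcam" only behaves as expected when both the stored terms and the bounds are compared under the same ordering.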

Now I am trying to apply a range query on the "customer_address" field. When I send the query below, the result is empty. (Expected result: sb, sd, şa, şd)

curl -XGET http://localhost:9200/sampleindex/_search?pretty -d '{"query":{"bool":{"filter":[{"range":{"customer_address.sort":{"from":"plaj","to":"şcam","include_lower":true,"include_upper":true,"boost":1.0}}}],"disable_coord":false,"adjust_pure_negative":true,"boost":1.0}}}'

When I run the aggregation below, I see that the field values are stored in encoded form, as the documentation describes.

curl -XGET http://localhost:9200/sampleindex/_search?pretty -d '{"aggs":{"myaggregation":{"terms":{"field":"customer_address.sort","size":10000}}},"size":0}'

{
  "took" : 5,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 6,
    "max_score" : 0.0,
    "hits" : [ ]
  },
  "aggregations" : {
    "a" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : "⚕䁁䀠怀\u0001",
          "doc_count" : 1
        },
        {
          "key" : "⚗䁁䀠怀\u0001",
          "doc_count" : 1
        },
        {
          "key" : "✁ੀ⃀ၠ\u0000\u0000",
          "doc_count" : 1
        },
        {
          "key" : "✁ୀ⃀ၠ\u0000\u0000",
          "doc_count" : 1
        },
        {
          "key" : "✁ీ⃀ၠ\u0000\u0000",
          "doc_count" : 1
        },
        {
          "key" : "ⶔ䁁䀠怀\u0001",
          "doc_count" : 1
        }
      ]
    }
  }
}

So, how should I send my parameters in the range query to get the expected results?

Thanks in advance.

My mapping:

curl -XGET http://localhost:9200/sampleindex?pretty
{
  "sampleindex" : {
    "aliases" : { },
    "mappings" : {
      "invoice" : {
        "properties" : {
          "customer_address" : {
            "type" : "text",
            "fields" : {
              "keyword" : {
                "type" : "keyword"
              },
              "sort" : {
                "type" : "text",
                "analyzer" : "turkish",
                "fielddata" : true
              }
            }
          }
        }
      }
    },
    "settings" : {
      "index" : {
        "number_of_shards" : "5",
        "provided_name" : "sampleindex",
        "max_result_window" : "2147483647",
        "creation_date" : "1521732167023",
        "analysis" : {
          "filter" : {
            "turkish_phonebook" : {
              "variant" : "@collation=phonebook",
              "country" : "TR",
              "language" : "tr",
              "type" : "icu_collation"
            },
            "turkish_lowercase" : {
              "type" : "lowercase",
              "language" : "turkish"
            }
          },
          "analyzer" : {
            "turkish" : {
              "filter" : [
                "turkish_lowercase",
                "turkish_phonebook"
              ],
              "tokenizer" : "keyword"
            }
          }
        },
        "number_of_replicas" : "1",
        "uuid" : "ChNGX459TUi8VnBLTMn-Ng",
        "version" : {
          "created" : "5020099"
        }
      }
    }
  }
}

1 Answer:

Answer 0 (score: 0)

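I solved my problem by defining an analyzer with a char filter when creating the index. I don't know whether this is a good solution, but I could not get it to work with the "turkish_phonebook" ICU filter, and this approach seems to work for now.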

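First, I created an index with "turkish_collation_analyzer". Then, for each property that needs it, I created a field "property.tr" that uses this analyzer. Finally, when running the range query, I convert my values into the form this field expects.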
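The query-time conversion could look roughly like the Python sketch below. It rests on several assumptions: it reads "x01" in the mapping as a literal three-character replacement, it numbers the letters of the Turkish alphabet in decimal (the full mapping list is not given here), it uses "customer_address.tr" as a hypothetical field name, and it expects lowercase input. `to_sort_key` and `range_query` are illustrative names only.

```python
import json

# Lowercase Turkish alphabet in dictionary order (29 letters).
TURKISH_ALPHABET = "abcçdefgğhıijklmnoöprsştuüvyz"
# Assumption: letter number n becomes the literal text "xNN" (decimal),
# so "a" -> "x01", "b" -> "x02", and so on.
TO_KEY = {ch: f"x{i:02d}" for i, ch in enumerate(TURKISH_ALPHABET, start=1)}

def to_sort_key(text):
    """Translate a lowercase value the way the assumed char filter would.

    Characters outside the alphabet pass through unchanged.
    """
    return "".join(TO_KEY.get(ch, ch) for ch in text)

def range_query(field, lower, upper):
    """Build an inclusive range query body with converted bounds."""
    bounds = {"gte": to_sort_key(lower), "lte": to_sort_key(upper)}
    return {"query": {"bool": {"filter": [{"range": {field: bounds}}]}}}

# "customer_address.tr" is a hypothetical field name for this sketch.
body = range_query("customer_address.tr", "plaj", "şcam")
print(json.dumps(body))

# Local check of which sample values fall between the converted bounds.
words = ["şa", "za", "sb", "şc", "sd", "şe"]
low, high = to_sort_key("plaj"), to_sort_key("şcam")
matches = [w for w in sorted(words, key=to_sort_key)
           if low <= to_sort_key(w) <= high]
print(matches)  # ['sb', 'sd', 'şa', 'şc']
```

Because every letter maps to a fixed-width code, ordinary string comparison of the converted values follows Turkish alphabet order, which is what the range query relies on.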

"settings": {
  "index": {
    "number_of_shards": "5",
    "provided_name": "sampleindex",
    "max_result_window": "2147483647",
    "creation_date": "1522050241730",
    "analysis": {
      "analyzer": {
        "turkish_collation_analyzer": {
          "char_filter": [
            "turkish_char_filter"
          ],
          "tokenizer": "keyword"
        }
      },
      "char_filter": {
        "turkish_char_filter": {
          "type": "mapping",
          "mappings": [
            "a => x01",
            "b => x02",
            .,
            .,
            .,

          ]
        }
      }
    },
    "number_of_replicas": "1",
    "uuid": "hiEqIpjYTLePjF142B8WWQ",
    "version": {
      "created": "5020099"
    }
  }
}
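The mapping list above is abbreviated. One way to generate a complete list is sketched below in Python, assuming the rules simply continue through the Turkish alphabet with two-digit decimal numbers. That numbering scheme and the `build_mappings` helper are assumptions for illustration, not the rules actually used in this index.

```python
import json

# Lowercase Turkish alphabet in dictionary order (29 letters).
TURKISH_ALPHABET = "abcçdefgğhıijklmnoöprsştuüvyz"

def build_mappings(alphabet):
    """Return one 'letter => xNN' rule per letter, numbered from 01."""
    return [f"{ch} => x{i:02d}" for i, ch in enumerate(alphabet, start=1)]

rules = build_mappings(TURKISH_ALPHABET)

# Same shape as the "char_filter" section of the index settings.
char_filter = {
    "turkish_char_filter": {
        "type": "mapping",
        "mappings": rules,
    }
}
print(json.dumps(char_filter, ensure_ascii=False, indent=2))
```

Uppercase letters are not covered by this sketch, so values would need to be lowercased before indexing and before querying, or extra rules added for them.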