Replacing specific characters with words in Elasticsearch

Time: 2016-02-01 07:15:58

Tags: elasticsearch

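One of my document fields contains a lot of plain text, including some currency symbols. How can I change these symbols to their corresponding names, such as $ to dollar?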

1 answer:

Answer 0: (score: 1)

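You can do this by creating a custom analyzer that uses a mapping char filter, in which you specify which characters to replace and what to replace them with. Each entry in the mappings array defines one replacement: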

curl -XPUT localhost:9200/my_index -d '{
  "settings": {
    "analysis": {
      "char_filter": {
        "currencies": {
          "type": "mapping",
          "mappings": [
            "$=>USD"
          ]
        }
      },
      "analyzer": {
        "my_analyzer": {
          "tokenizer": "standard",
          "char_filter": [
            "currencies"
          ]
        }
      }
    }
  },
  "mappings": {
    "my_type": {
      "properties": {
        "text": {
          "type": "string",
          "analyzer": "my_analyzer"
        }
      }
    }
  }
}'
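
The char filter above maps only one symbol. As a rough sketch of how it could be extended to several currencies, the fragment below adds two more entries. The extra symbols and the codes EUR and GBP are illustrative assumptions and not part of the original setup. The right-hand side can be any string, for example dollar instead of USD.

```json
{
  "char_filter": {
    "currencies": {
      "type": "mapping",
      "mappings": [
        "$=>USD",
        "€=>EUR",
        "£=>GBP"
      ]
    }
  }
}
```

This fragment would replace the char_filter section inside settings.analysis in the index creation request. One caveat, as far as I know: because the replacement happens before tokenization, input written without a space, such as $100, would likely come out as the single token USD100, since the standard tokenizer does not split between letters and digits.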

Then, if you index a sentence such as "You owe me $ 100", the following tokens are generated. You can check this with the _analyze API:

curl -XGET 'localhost:9200/my_index/_analyze?analyzer=my_analyzer&pretty' -d 'You owe me $ 100'

{
  "tokens" : [ {
    "token" : "You",
    "start_offset" : 0,
    "end_offset" : 3,
    "type" : "<ALPHANUM>",
    "position" : 1
  }, {
    "token" : "owe",
    "start_offset" : 4,
    "end_offset" : 7,
    "type" : "<ALPHANUM>",
    "position" : 2
  }, {
    "token" : "me",
    "start_offset" : 8,
    "end_offset" : 10,
    "type" : "<ALPHANUM>",
    "position" : 3
  }, {
    "token" : "USD",
    "start_offset" : 11,
    "end_offset" : 12,
    "type" : "<ALPHANUM>",
    "position" : 4
  }, {
    "token" : "100",
    "start_offset" : 13,
    "end_offset" : 16,
    "type" : "<NUM>",
    "position" : 5
  } ]
}

As you can see, the $ symbol has been replaced by the string USD.