How do I map one word to another in Elasticsearch?

Time: 2016-02-01 06:36:10

Tags: elasticsearch

How can I map one word to another in Elasticsearch? Suppose I have the following document:

{
  "carName": "Porche",
  "review": " this car is so awesome"
}

Now, when I search for "good", "fantastic", etc., the search should map to "awesome". Is there any way to do this in Elasticsearch?

1 answer:

Answer 0: (score: 1)

Yes, you can use the synonym token filter to achieve this.

First, you need to define a new custom analyzer in your index settings and use that analyzer in your mapping.

curl -XPUT localhost:9200/cars -d '{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [
            "synonyms"
          ]
        }
      },
      "filter": {
        "synonyms": {
          "type": "synonym",
          "synonyms": [
            "good, awesome, fantastic"
          ]
        }
      }
    }
  },
  "mappings": {
    "car": {
      "properties": {
        "carName": {
          "type": "string"
        },
        "review": {
          "type": "string",
          "analyzer": "my_analyzer"
        }
      }
    }
  }
}'

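You can add as many synonyms as you like, either directly in the settings or in a separate file that the settings reference through the synonyms_path property.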
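Since the question asks for "good" and "fantastic" to map specifically to "awesome", it may help to know that the Solr-style rule format used by the synonym filter also accepts explicit mappings written with `=>`. The following is a minimal Python sketch, not part of the original answer, that only builds the settings body for both variants (inline rules and a synonyms_path file). The helper names `synonym_filter` and `index_settings` and the file name `analysis/synonyms.txt` are hypothetical; the path is assumed to be relative to the Elasticsearch config directory.

```python
import json


def synonym_filter(rules=None, path=None):
    """Build a synonym token filter definition from inline rules or a file path."""
    if (rules is None) == (path is None):
        raise ValueError("pass exactly one of rules or path")
    body = {"type": "synonym"}
    if rules is not None:
        body["synonyms"] = list(rules)
    else:
        body["synonyms_path"] = path
    return body


def index_settings(filter_body):
    """Wrap a filter definition in the analysis settings used by my_analyzer."""
    return {
        "settings": {
            "analysis": {
                "analyzer": {
                    "my_analyzer": {
                        "type": "custom",
                        "tokenizer": "standard",
                        "filter": ["synonyms"],
                    }
                },
                "filter": {"synonyms": filter_body},
            }
        }
    }


# Explicit mapping: "good" and "fantastic" are rewritten to "awesome" only.
inline = index_settings(synonym_filter(rules=["good, fantastic => awesome"]))

# Same kind of rules kept in a file (hypothetical name), one rule per line.
from_file = index_settings(synonym_filter(path="analysis/synonyms.txt"))

print(json.dumps(inline, indent=2))
```

The printed JSON can be sent as the "settings" part of the index creation request shown above. With the explicit mapping, the index holds only "awesome", and a query for "good" still matches as long as the query is analyzed with the same analyzer.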

Then we can index your sample document:

curl -XPUT localhost:9200/cars/car/1 -d '{
  "carName": "Porche",
  "review": " this car is so awesome"
}'

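When the synonyms token filter kicks in, it indexes the tokens "good" and "fantastic" alongside "awesome", so you can search for any of those tokens and find the document. Specifically, analyzing the sentence "this car is so awesome" ...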

curl -XGET 'localhost:9200/cars/_analyze?analyzer=my_analyzer&pretty' -d 'this car is so awesome'

... will produce the following tokens (see the last three tokens, which all share position 5):

{
  "tokens" : [ {
    "token" : "this",
    "start_offset" : 0,
    "end_offset" : 4,
    "type" : "<ALPHANUM>",
    "position" : 1
  }, {
    "token" : "car",
    "start_offset" : 5,
    "end_offset" : 8,
    "type" : "<ALPHANUM>",
    "position" : 2
  }, {
    "token" : "is",
    "start_offset" : 9,
    "end_offset" : 11,
    "type" : "<ALPHANUM>",
    "position" : 3
  }, {
    "token" : "so",
    "start_offset" : 12,
    "end_offset" : 14,
    "type" : "<ALPHANUM>",
    "position" : 4
  }, {
    "token" : "good",
    "start_offset" : 15,
    "end_offset" : 22,
    "type" : "SYNONYM",
    "position" : 5
  }, {
    "token" : "awesome",
    "start_offset" : 15,
    "end_offset" : 22,
    "type" : "SYNONYM",
    "position" : 5
  }, {
    "token" : "fantastic",
    "start_offset" : 15,
    "end_offset" : 22,
    "type" : "SYNONYM",
    "position" : 5
  } ]
}
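To make the output above easier to reason about, here is a toy Python model of what an equivalent-synonym rule does during analysis. It is only a sketch of the idea, not Elasticsearch's implementation: the function `expand_synonyms` is hypothetical, tokenization is approximated with a simple regular expression, and offsets are left out.

```python
import re


def expand_synonyms(text, groups):
    """Toy model of an equivalent-synonym filter: returns (token, position) pairs."""
    lookup = {}
    for group in groups:
        words = [w.strip() for w in group.split(",")]
        for w in words:
            # Every word in a group expands to the whole group.
            lookup[w] = words
    tokens = []
    for position, word in enumerate(re.findall(r"\w+", text), start=1):
        # Synonyms are emitted at the same position as the original word.
        for variant in lookup.get(word, [word]):
            tokens.append((variant, position))
    return tokens


tokens = expand_synonyms("this car is so awesome", ["good, awesome, fantastic"])
for token, position in tokens:
    print(position, token)
```

The last three pairs are "good", "awesome" and "fantastic", all at position 5, which mirrors the three SYNONYM tokens in the _analyze response.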

Finally, you can search for and retrieve the document like this:

curl -XGET 'localhost:9200/cars/car/_search?q=review:good'