Elasticsearch: using the shingle filter with synonyms

Time: 2016-11-18 15:50:56

Tags: elasticsearch

I have the following documents:

  • south africa
  • north africa

I want to retrieve my "south africa" document with any of the following queries:

  • s africa (a)
  • southafrica (b)
  • safrica (c)

I have defined the following filters and analyzers:

POST test_index
{
  "settings": {
    "analysis": {
      "filter": {
        "synonym_filter": {
          "type": "synonym",
          "synonyms": [
            "south,s",
            "north,n"
          ]
        },
        "shingle_filter": {
          "type": "shingle",
          "min_shingle_size": 2,
          "max_shingle_size": 3,
          "token_separator": ""
        }
      },
      "analyzer": {
        "my_shingle": {
          "type":      "custom",
          "tokenizer": "standard",
          "filter":    ["shingle_filter"]
        },
        "my_shingle_synonym": {
          "type":      "custom",
          "tokenizer": "standard",
          "filter":    ["shingle_filter", "synonym_filter"]
        },
        "my_synonym_shingle": {
          "type":      "custom",
          "tokenizer": "standard",
          "filter":    ["synonym_filter", "shingle_filter"]
        }
      }
    }
  },
  "mappings": {}
}

1) With my_shingle, south africa is indexed as: south, southafrica, africa

2) With my_shingle_synonym, south africa is indexed as: south, s, southafrica, africa

3) With my_synonym_shingle, south africa is indexed as: south, souths, southsafrica, s, ..., safrica
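
The token streams listed above can be checked directly with the _analyze API. This is a minimal sketch, assuming test_index has already been created with the settings shown earlier; swap the analyzer name for my_shingle_synonym or my_synonym_shingle to compare the three filter chains.

```json
GET test_index/_analyze
{
  "analyzer": "my_shingle",
  "text": "south africa"
}
```

The response lists every emitted token together with its position, which makes it easier to see how the order of the filters changes the output.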

So with

  • (1) I will find b

  • (2) I will find a, b

  • (3) I will find a, c

I would like south africa to be indexed as: south, s, southafrica, safrica, africa

1 answer:

Answer 0 (score: 1)

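To meet your requirement, every one of those tokens has to be produced somewhere in the index. You can achieve this by using different analyzers on multi fields, so that the same value is indexed once per analyzer.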

You can define the mapping for the field like this:

"mappings": {
    "your_mapping": {
      "properties": {
        "name": {
          "type": "string",
          "analyzer": "my_shingle",
          "fields": {
            "synonym": {
              "type": "string",
              "analyzer": "my_synonym_shingle"
            }
          }
        }
      }
    }
  }

Index a sample document:

PUT test_index/your_mapping/1
{
  "name" : "south africa"
}
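
To see which tokens each sub-field would store for this document, the _analyze API can also resolve the analyzer from a field name. This is a sketch, assuming the mapping above is in place; name.synonym is the sub-field that mapping defines.

```json
GET test_index/_analyze
{
  "field": "name.synonym",
  "text": "south africa"
}
```

Running the same request with "field": "name" shows the my_shingle output for comparison.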

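You can then query all variants of the name field by using a wildcard expression in the field list. Here name* matches both name and name.synonym: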

GET test_index/your_mapping/_search
{
  "query": {
    "query_string": {
      "fields": [
        "name*"
      ],
      "query": "safrica"
    }
  }
}
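
If you would rather not rely on the wildcard, the two fields can be listed explicitly. This is a sketch using multi_match, assuming the field names from the mapping above; the query text is one of the variants from the question.

```json
GET test_index/your_mapping/_search
{
  "query": {
    "multi_match": {
      "query": "s africa",
      "fields": ["name", "name.synonym"]
    }
  }
}
```

Each field analyzes the query text with its own analyzer, so a variant only has to match in one of the two fields for the document to be returned.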