Elasticsearch - Indexing words, bigrams and trigrams

Date: 2014-04-28 13:00:51

Tags: elasticsearch

I'm trying to index some phrases, for example:

"Elasticsearch is a great search engine"

so that it is indexed as:

Elasticsearch       # word
is                  # word
a                   # word
great               # word
search              # word
engine              # word
Elasticsearch is    # bi-gram
is a                # bi-gram
a great             # bi-gram
great search        # bi-gram
search engine       # bi-gram
Elasticsearch is a  # tri-gram
is a great          # tri-gram
a great search      # tri-gram
great search engine # tri-gram

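I know how to index single words (with the default indexer) and how to index bigrams and trigrams (with the n-gram indexer), but not how to do both at the same time.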

How can I do this?

Regards

1 Answer

Answer (score: 3)

You can use the multi_field type, which indexes the same value several times with different analyzers. Here is an example I created:

{
  "settings": {
    "number_of_shards": 5,
    "number_of_replicas": 0,
    "analysis": {
      "filter": {
        "synonym": {
          "type": "synonym",
          "synonyms_path": "synonyms.txt"
        },
        "my_metaphone": {
          "type": "phonetic",
          "encoder": "metaphone",
          "replace": false
        }
      },
      "analyzer": {
        "synonym": {
          "tokenizer": "whitespace",
          "filter": [
            "lowercase",
            "synonym"
          ]
        },
        "metaphone": {
          "tokenizer": "standard",
          "filter": [
            "my_metaphone"
          ]
        },
        "porter": {
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "porter_stem"
          ]
        }
      }
    }
  },
  "mappings": {
    "type": {
      "_all": {
        "enabled": false
      },
      "properties": {
        "datafield": {
          "type": "multi_field",
          "store": "yes",
          "fields": {
            "datafield": {
              "type": "string",
              "analyzer": "simple"
            },
            "metaphone": {
              "type": "string",
              "analyzer": "metaphone"
            },
            "porter": {
              "type": "string",
              "analyzer": "porter"
            },
            "synonym": {
              "type": "string",
              "analyzer": "synonym"
            }
          }
        }
      }
    }
  }
}
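The example above does not actually define the bigram or trigram sub-fields the question asks for. A minimal sketch of one way to add them, assuming the same 1.x-era mapping syntax as above (multi_field, type string), is to use Elasticsearch's shingle token filter, which emits word n-grams rather than character n-grams. The filter names bigram_shingle and trigram_shingle, and the analyzer names bigram and trigram, are hypothetical; the intent is to merge these entries into the settings and mapping above rather than create a separate index.

```json
{
  "settings": {
    "analysis": {
      "filter": {
        "bigram_shingle": {
          "type": "shingle",
          "min_shingle_size": 2,
          "max_shingle_size": 2,
          "output_unigrams": false
        },
        "trigram_shingle": {
          "type": "shingle",
          "min_shingle_size": 3,
          "max_shingle_size": 3,
          "output_unigrams": false
        }
      },
      "analyzer": {
        "bigram": {
          "tokenizer": "standard",
          "filter": ["lowercase", "bigram_shingle"]
        },
        "trigram": {
          "tokenizer": "standard",
          "filter": ["lowercase", "trigram_shingle"]
        }
      }
    }
  },
  "mappings": {
    "type": {
      "properties": {
        "datafield": {
          "type": "multi_field",
          "fields": {
            "datafield": { "type": "string", "analyzer": "simple" },
            "bigram": { "type": "string", "analyzer": "bigram" },
            "trigram": { "type": "string", "analyzer": "trigram" }
          }
        }
      }
    }
  }
}
```

With this setup, "Elasticsearch is a great search engine" should be indexed as lowercased single words in datafield, as "elasticsearch is", "is a", ..., "search engine" in datafield.bigram, and as "elasticsearch is a", ..., "great search engine" in datafield.trigram. Keeping each n-gram size in its own sub-field lets you boost them independently; a single shingle filter with min_shingle_size 2 and max_shingle_size 3 would instead put bigrams and trigrams into one field. In newer Elasticsearch versions, multi_field has been replaced by the fields parameter on a text field.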

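You can then choose which sub-field to search, for example datafield.synonym or a bigram sub-field such as datafield.bigram, and build a query that boosts the fields that matter most for your results.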
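A hedged sketch of such a query, assuming the hypothetical datafield.bigram and datafield.trigram sub-fields exist: a multi_match query searches all three fields at once, and the ^ suffix boosts a field. The boost values 2 and 3 are arbitrary illustrations, chosen so that documents matching longer word sequences rank higher than documents that merely contain the same words.

```json
{
  "query": {
    "multi_match": {
      "query": "great search engine",
      "fields": ["datafield", "datafield.bigram^2", "datafield.trigram^3"]
    }
  }
}
```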