Question

Say I have indexed this data:

song:{
  title:"laser game"
}

but the user searches for

lasergame

How would you set up the mapping / indexing / querying for this?

Answer 1

This is a tricky one.

1) I guess the most effective way is to use a compound word token filter, with a word list made up of the words you think users might concatenate.

"settings": {
    "analysis": {
      "analyzer": {
        "concatenate_split": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "myFilter"
          ]
        }
      },
      "filter": {
        "myFilter": {
          "type": "dictionary_decompounder",
          "word_list": [
            "laser",
            "game",
            "lean",
            "on",
            "die",
            "hard"
          ]
        }
      }
    }
  }

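After this analyzer is applied, lasergame is split into laser and game, and the original lasergame token is kept as well, so the search returns documents that contain any of these words.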
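
To make the effect of the decompounder easier to see, here is a minimal Python sketch of the idea. It is only a simplified stand-in, not Elasticsearch's implementation: the helper names decompound and analyze and the min_subword parameter are made up for illustration, and whitespace splitting stands in for the standard tokenizer.

```python
# Illustrative sketch only: a simplified stand-in for a dictionary-based
# decompounder, not Elasticsearch's actual implementation.

WORD_LIST = ["laser", "game", "lean", "on", "die", "hard"]  # same list as above


def decompound(token, word_list=WORD_LIST, min_subword=2):
    """Return the token itself plus every dictionary word found inside it."""
    words = set(word_list)
    tokens = [token]  # the original token is always kept
    for start in range(len(token)):
        for end in range(start + min_subword, len(token) + 1):
            part = token[start:end]
            if part != token and part in words:
                tokens.append(part)
    return tokens


def analyze(text):
    """Lowercase, split on whitespace, then decompound each token."""
    result = []
    for token in text.lower().split():
        result.extend(decompound(token))
    return result


indexed = analyze("laser game")   # tokens stored for the document
searched = analyze("lasergame")   # tokens produced for the query

print(indexed)
print(searched)
print(set(indexed) & set(searched))  # shared tokens are what make the match
```

The document yields laser and game, the query yields lasergame, laser and game, and the two shared tokens are what let the query find the document. A concatenation whose parts are not in the word list would stay a single token and would not match.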

2) Another approach is to concatenate the whole title, using a pattern replace char filter that removes all whitespace.

{
    "index" : {
        "analysis" : {
            "char_filter" : {
                "my_pattern":{
                    "type":"pattern_replace",
                    "pattern":"\\s+",
                    "replacement":""
                }
            },
            "analyzer" : {
                "custom_with_char_filter" : {
                    "tokenizer" : "standard",
                    "char_filter" : ["my_pattern"]
                }
            }
        }
    }
}

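You need to use multi fields with this approach. With the pattern above, laser game is indexed as lasergame and your query will work. The problem is that laser game play is indexed as lasergameplay, so a search for lasergame returns nothing; you may therefore want to consider something like a prefix query for this case.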
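
The difference between an exact match and a prefix match on the concatenated title can be sketched in a few lines of Python. This only imitates the char filter with a regular expression; the function names are hypothetical and nothing here calls Elasticsearch.

```python
import re


def concatenate(title):
    """Mimic the char filter: remove every run of whitespace."""
    return re.sub(r"\s+", "", title)


def exact_match(query, title):
    """True when the query equals the whole concatenated title."""
    return concatenate(query) == concatenate(title)


def prefix_match(query, title):
    """True when the concatenated title starts with the query."""
    return concatenate(title).startswith(concatenate(query))


print(concatenate("laser game"))                      # lasergame
print(exact_match("lasergame", "laser game"))         # True
print(exact_match("lasergame", "laser game play"))    # False
print(prefix_match("lasergame", "laser game play"))   # True
```

Note that a prefix match still would not help when the searched words sit in the middle of a longer title, for example a title that starts with another word.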

3) This might not make sense in every case, but if you think users often concatenate certain words, you could also use a synonym filter.
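
As a rough Python sketch of the synonym idea: each concatenated form you expect from users is mapped by hand to the words it stands for. The rules and the expand helper below are hypothetical and only illustrate the principle; they are not the synonym filter's own rule syntax.

```python
# Hypothetical synonym rules: concatenated form -> the words it stands for.
SYNONYMS = {
    "lasergame": ["laser", "game"],
    "diehard": ["die", "hard"],
}


def expand(tokens, synonyms=SYNONYMS):
    """Replace each token that has a synonym rule with its expansion."""
    expanded = []
    for token in tokens:
        expanded.extend(synonyms.get(token, [token]))
    return expanded


print(expand("lasergame".split()))       # ['laser', 'game']
print(expand("play lasergame".split()))  # ['play', 'laser', 'game']
```

The obvious cost is that every concatenation has to be listed explicitly, so this only pays off for a small, known set of words.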

Hope this helps!

Answer 2

The simplest solution is to use nGrams. This is only a starting point and can be tweaked to fit your needs, but here you go:

Mapping

PUT /test
{
  "settings": {
    "analysis": {
      "analyzer": {
        "myAnalyzer": {
          "type": "custom",
          "tokenizer": "nGram",
          "filter": [
            "asciifolding",
            "lowercase"
          ]
        }
      }
    }
  },
  "mappings": {
    "sample": {
      "properties": {
        "myField": {
          "type": "string",
          "analyzer": "myAnalyzer"
        }
      }
    }
  }
}

Test document

PUT /test/sample/1
{
  "myField": "laser game"
}

Query

GET /test/_search
{
  "query": {
    "match": {
      "myField": "lasergame"
    }
  }
}

Result

{
  "took": 47,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 0.2161999,
    "hits": [
      {
        "_index": "test",
        "_type": "sample",
        "_id": "1",
        "_score": 0.2161999,
        "_source": {
          "myField": "laser game"
        }
      }
    ]
  }
}

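This analyzer creates a lot of ngrams in your index, such as la, las, ... gam, game and so on, depending on the configured gram lengths. laser game and lasergame produce many of the same tokens, so the query finds your document as you would expect.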
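
To see why the two strings overlap so much, here is a small Python sketch that builds character n-grams by hand. It is an approximation, not the tokenizer itself: the ngrams helper is made up for illustration, and the gram lengths of 1 and 2 are an assumption you would adjust to match your own settings.

```python
def ngrams(text, min_gram=1, max_gram=2):
    """Return the set of character n-grams of text, lowercased."""
    text = text.lower()
    grams = set()
    for size in range(min_gram, max_gram + 1):
        for start in range(len(text) - size + 1):
            grams.add(text[start:start + size])
    return grams


doc_grams = ngrams("laser game")    # grams for the indexed title
query_grams = ngrams("lasergame")   # grams for the user's query

shared = doc_grams & query_grams
print(len(shared), "of", len(query_grams), "query grams are shared")
print(sorted(query_grams - doc_grams))  # grams that span the missing space
```

With these lengths, the only query gram missing from the document is rg, the one that spans the place where the space used to be. The flip side is that short grams also match many unrelated titles, so the gram lengths are the main thing to tune.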

Find concatenated words in Elasticsearch
