Elasticsearch possessive_english stemmer query returns no matches

Time: 2019-10-06 01:54:41

Tags: ruby-on-rails elasticsearch

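I was able to find another question, "Using of possessive_english stemmer in Elasticsearch", but it is already 3 years old.

I am trying to get Elasticsearch to ignore the apostrophe (') when indexing and searching. For example: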

POST my_index/_doc/
{
  "message" : "Mike's bike"
}

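I want to be able to find this document by searching for "mikes", "mike's", or "mike". I looked around and thought possessive_english should do the job, but I have not been able to get the expected results.

The index I created is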

PUT /my_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "rebuilt_standard": {
          "tokenizer": "standard",
          "filter": [
            "lowercase", "my_stemmer"      
          ]
        }
      },
      "filter": {
        "my_stemmer":{
          "type": "stemmer",
          "language": "possessive_english"
        }
      }
    }
  }
}

I tested the analyzer with

POST /my_index/_analyze
{
  "analyzer": "rebuilt_standard",
  "text": "Mike's bike"
}

This is the result

{
  "tokens" : [
    {
      "token" : "mike",
      "start_offset" : 0,
      "end_offset" : 6,
      "type" : "<ALPHANUM>",
      "position" : 0
    },
    {
      "token" : "bike",
      "start_offset" : 7,
      "end_offset" : 11,
      "type" : "<ALPHANUM>",
      "position" : 1
    }
  ]
}

It looks like the analyzer is working. Then I inserted the document:

POST my_index/_doc/
{
  "message" : "Mike's bike"
}

Searching returned 0 results for both of these queries:

GET /my_index/_search
{
    "query": {
        "match": {"message": "mike"}
    }
}
GET /my_index/_search
{
    "query": {
        "match": {"message": "mikes"}
    }
}

However,

GET /my_index/_search
{
    "query": {
        "match": {"message": "mike's"}
    }
}

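returns a result.

It seems I am missing some mapping configuration mentioned in the linked question, but I am not sure how to set it up.

I tested the above in Kibana, but in practice I am using Rails with the repository pattern and the gems "elasticsearch-model", "elasticsearch-rails", and "elasticsearch-persistence". I am also new to Rails, so I do not know whether the fix belongs in the Rails configuration, the Elasticsearch configuration, or both.

Just in case, here they are: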

  include Elasticsearch::Persistence::Repository
  include Elasticsearch::Persistence::Repository::DSL

  client = Elasticsearch::Client.new(url: 'http://localhost:9200', log: true)

  settings index: {
      number_of_shards: 1,
      analysis: {
          analyzer: {
              custom: {
                  type: "custom",
                  tokenizer: "standard",
                  filter: [
                      "lowercase",
                      "english_possessive_stemmer",
                  ]
              }
          },
          filter: {
              english_possessive_stemmer: {
                  type: "stemmer",
                  language: "possessive_english",
              }
          }
      }
  }
  mappings {
    indexes :icon, index: false
    indexes :properties, type: 'nested' do
      indexes :values
    end
    indexes :name
  }

In the controller:

repository = Repository.new
repository.create_index!(force: true)
repository.save(json)
results = repository.search(query: { match: { name: 'Mikes' } })

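1 answer:

Answer 0: (score: 0)

Your analyzer works fine. I think you have not applied it in the mapping. Set it as the analyzer of the message field: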

PUT /my_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "rebuilt_standard": {
          "tokenizer": "standard",
          "filter": [
            "lowercase", "my_stemmer", "english_stemmer"
          ]
        }
      },
      "filter": {
        "my_stemmer":{
          "type": "stemmer",
          "language": "possessive_english"
        },
        "english_stemmer": {
          "type": "stemmer",
          "language": "english"
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "message":{
        "type": "text",
        "analyzer": "rebuilt_standard"
      }
    }
  }
}
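
Since the question mentions Rails, here is the same index definition written as a plain Ruby hash. This is only a sketch: `INDEX_BODY` is a hypothetical constant name, and how the hash is handed to your Elasticsearch client depends on the gem versions you use, so that part is left out.

```ruby
require 'json'

# Hypothetical constant: the index definition from this answer as a Ruby hash.
INDEX_BODY = {
  settings: {
    analysis: {
      analyzer: {
        rebuilt_standard: {
          tokenizer: 'standard',
          filter: %w[lowercase my_stemmer english_stemmer]
        }
      },
      filter: {
        my_stemmer: { type: 'stemmer', language: 'possessive_english' },
        english_stemmer: { type: 'stemmer', language: 'english' }
      }
    }
  },
  mappings: {
    properties: {
      # Defining an analyzer in settings is not enough; a field has to reference it.
      message: { type: 'text', analyzer: 'rebuilt_standard' }
    }
  }
}.freeze

# Print the JSON that would be sent as the body of the index-creation request.
puts JSON.pretty_generate(INDEX_BODY)
```

In the repository DSL from the question, the equivalent change would presumably be an `analyzer:` option on `indexes :name`. I have not tested that against your gem versions.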

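The possessive_english filter only removes a trailing 's, so on its own it will not let you search for mikes (although it works for mike). For that you also need a stemmer, which reduces words to their base form.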
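
To make the difference concrete, here is a toy simulation of the three situations: no custom analyzer on the field, the possessive filter only, and the possessive filter plus a stemmer. None of this is Lucene's real code. All method names are hypothetical, the tokenizer is a whitespace split, and `toy_stem` is a crude plural stripper standing in for the real english stemmer.

```ruby
# Toy stand-ins for the analysis chain (not the real Elasticsearch filters).

# Whitespace split; like the standard tokenizer on this input, it keeps
# an apostrophe between letters inside the token.
def tokenize(text)
  text.split(/\s+/)
end

def lowercase(tokens)
  tokens.map(&:downcase)
end

# Imitates possessive_english: strip a trailing 's and nothing else.
def strip_possessive(tokens)
  tokens.map { |t| t.sub(/'s\z/i, '') }
end

# Crude plural stripper used here in place of the "english" stemmer.
def toy_stem(tokens)
  tokens.map { |t| t.length > 3 && !t.end_with?('ss') ? t.sub(/s\z/, '') : t }
end

# Field without a custom analyzer: tokenize and lowercase only.
def default_analyze(text)
  lowercase(tokenize(text))
end

# Field using only the possessive filter.
def possessive_analyze(text)
  strip_possessive(default_analyze(text))
end

# Field using the possessive filter followed by the stemmer stand-in.
def stemmed_analyze(text)
  toy_stem(possessive_analyze(text))
end

# A match query hits when the analyzed query and document share a token.
def match?(analyzer, doc, query)
  !(send(analyzer, doc) & send(analyzer, query)).empty?
end

doc = "Mike's bike"
%i[default_analyze possessive_analyze stemmed_analyze].each do |analyzer|
  hits = %w[mike mikes mike's].select { |q| match?(analyzer, doc, q) }
  puts "#{analyzer}: #{hits.inspect}"
end
```

In this simulation the default chain matches only mike's, which is the behaviour described in the question. The possessive-only chain matches mike and mike's, and the chain with the stemmer stand-in matches all three queries.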

I have a great article here for further reference.