Question

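I'm trying to use Elasticsearch and Tire to index some data. I want to be able to search it on partial matches, not just whole words. When I run a query against the example model below, it only matches whole words in the "notes" field. I can't figure out why.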

class Thingy
  include Tire::Model::Search
  include Tire::Model::Callbacks

  # has some attributes

  tire do
    settings analysis: {
      filter: {
        ngram_filter: {
          type: 'nGram',
          min_gram: 2,
          max_gram: 12
        }
      },
      analyzer: {
        index_ngram_analyzer: {
          type: 'custom',
          tokenizer: 'standard',
          filter: ['lowercase']
        },
        search_ngram_analyzer: {
          type: 'custom',
          tokenizer: 'standard',
          filter: ['lowercase', 'ngram_filter']
        }
      }
    } do
      mapping do
        indexes :notes, :type => "string", boost: 10, index_analyzer: "index_ngram_analyzer", search_analyzer: "search_ngram_analyzer"
      end
    end
  end

  def to_indexed_json
    {
      id:          self.id,
      account_id:  self.account_id,
      created_at:  self.created_at,
      test:        self.test,
      notes:       some_method_that_returns_string
    }.to_json
  end
end

The query looks like this:

@things = Thingy.search page: params[:page], per_page: 50 do
  query {
    boolean {
      must     { string "account_id:#{account_id}" }
      must_not { string "test:true"                }
      must     { string "#{query}"                 }
    }
  }
  sort {
    by :id, 'desc'
  }
  size 50
  highlight notes: {number_of_fragments: 0}, options: {tag: '<span class="match">'}
end

I also tried this, but it never returns any results (ideally I'd like the search to apply to all fields, not just notes):

must { match :notes, "#{query}" } # tried with `type: :phrase` as well

What am I doing wrong?

Answer 1

You're almost there! :) The problem is that you've actually swapped the roles of index_analyzer and search_analyzer.

Let me briefly explain how it works:

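You want to break document words into these ngram "chunks" during indexing, so when you index a word like Martian, it gets broken into: ['ma', 'mar', 'mart', ..., 'ar', 'art', 'arti', ...]. You can try it with the Analyze API: http://localhost:9200/thingies/_analyze?text=Martian&analyzer=index_ngram_analyzer.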
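The chunking described above can be approximated in plain Ruby without a running cluster. This is only a sketch: `ngrams` is a hypothetical helper, not part of Tire or Elasticsearch, and it assumes the token has already been lowercased and that `min_gram` and `max_gram` mirror the values in the question's `ngram_filter`.

```ruby
# Hypothetical helper: emulates what an nGram token filter does to one token.
# Not part of Tire or Elasticsearch; the defaults mirror the question's settings.
def ngrams(token, min_gram: 2, max_gram: 12)
  grams = []
  (0...token.length).each do |start|
    (min_gram..max_gram).each do |size|
      # Stop growing the chunk once it would run past the end of the token.
      break if start + size > token.length
      grams << token[start, size]
    end
  end
  grams
end

grams = ngrams("Martian".downcase)
puts grams.first(5).inspect   # the chunks that start at the first letter
puts grams.length             # total number of chunks for this one word
```

Every chunk becomes its own term in the index, which is why a short search term such as "mart" can later find the document.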

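When people search, they are already using these partial ngrams, so to speak, since they search for "mar" or "mart" and so on. So you don't want to break their phrases up any further with the ngram filter.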

That's why you (correctly) separate index_analyzer and search_analyzer in your mapping, so Elasticsearch knows how to analyze the notes attribute during indexing, and how to analyze any search phrase run against this attribute.
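To see why the direction matters, here is a rough simulation of the two configurations. It is a sketch under simplifying assumptions: `plain_tokens`, `ngram_tokens` and `match?` are hypothetical stand-ins, a regex split substitutes for the standard tokenizer, and a "match" simply means that at least one query term equals one indexed term.

```ruby
# Hypothetical stand-ins for the two custom analyzers (not Elasticsearch code).

# Lowercase and split into words: roughly tokenizer 'standard' + filter 'lowercase'.
def plain_tokens(text)
  text.downcase.scan(/\w+/)
end

# Same as above, then expand every word into its ngrams (the 'ngram_filter' step).
def ngram_tokens(text, min_gram = 2, max_gram = 12)
  plain_tokens(text).flat_map do |word|
    (min_gram..[max_gram, word.length].min).flat_map do |size|
      word.chars.each_cons(size).map(&:join)
    end
  end
end

# A document matches when any query term equals any indexed term.
def match?(indexed_terms, query_terms)
  (indexed_terms & query_terms).any?
end

doc   = "Martian"
query = "mart"

correct = match?(ngram_tokens(doc), plain_tokens(query)) # ngrams at index time
swapped = match?(plain_tokens(doc), ngram_tokens(query)) # ngrams at search time

puts "ngrams at index time:  #{correct}"
puts "ngrams at search time: #{swapped}"
```

With ngrams applied at index time the term "mart" is in the index, so the search hits. With the roles swapped, the index only contains "martian", and none of the chunks of "mart" equals it.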

In other words, do this:

analyzer: {
  index_ngram_analyzer: {
    type: 'custom',
    tokenizer: 'standard',
    filter: ['lowercase', 'ngram_filter']
  },
  search_ngram_analyzer: {
    type: 'custom',
    tokenizer: 'standard',
    filter: ['lowercase']
  }
}

Also, I highly recommend that you migrate to the new elasticsearch-model Rubygem, which contains all the important features of Tire and is actively developed.

Answer 2

The problem I had was that I was using the string query instead of the match query. The search should have been written like this:

@things = Thingy.search page: params[:page], per_page: 50 do
  query {
    match [:prop_1, :prop_2, :notes], query
  }
  }
  sort {
    by :id, 'desc'
  }
  filter :term, account_id: account_id
  filter :term, test: false
  size 50
  highlight notes: {number_of_fragments: 0}, options: {tag: '<span class="match">'}
end
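A possible explanation for the difference, assuming an Elasticsearch 1.x-era setup: a query_string query that names no field searches the default field (`_all` at the time), which is not analyzed with the analyzers configured on notes, whereas a match query is analyzed with the analyzer of each field it targets. The sketch below only shows the shape of the two Elasticsearch query bodies as plain hashes; `query_string_body` and `multi_match_body` are hypothetical helpers, and this is not necessarily byte-for-byte what Tire sends.

```ruby
require "json"

# Hypothetical helpers that build Elasticsearch query bodies as plain hashes.

# No field is named, so Elasticsearch falls back to its default field.
def query_string_body(text)
  { query: { query_string: { query: text } } }
end

# The fields are named explicitly, so each field's own analyzer applies.
def multi_match_body(text, fields)
  { query: { multi_match: { query: text, fields: fields } } }
end

puts JSON.pretty_generate(query_string_body("mart"))
puts JSON.pretty_generate(multi_match_body("mart", %w[prop_1 prop_2 notes]))
```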

Why does this Elasticsearch/Tire code not match partial words?
