Question

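I index my field named hash with the whitespace analyzer, so the field text '1 2 3 4 5' is indexed as five terms [1, 2, 3, 4, 5].

My question is how to match on exact term position. For example, with a required precision of at least 4/5, "2 1 3 4 5" should not match and "8 2 3 4 5" should match. How can I do that?

I could split the value into five fields, but I want to keep a single field.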

Answer 1

You can combine the shingle token filter with minimum_should_match at query time:

Explanation:

With the shingle token filter, "1 2 3 4 5" can be turned into the following token stream:

{
  "tokens": [
    {
      "token": "1 2",
      "start_offset": 0,
      "end_offset": 3,
      "type": "shingle",
      "position": 0
    },
    {
      "token": "2 3",
      "start_offset": 2,
      "end_offset": 5,
      "type": "shingle",
      "position": 1
    },
    {
      "token": "3 4",
      "start_offset": 4,
      "end_offset": 7,
      "type": "shingle",
      "position": 2
    },
    {
      "token": "4 5",
      "start_offset": 6,
      "end_offset": 9,
      "type": "shingle",
      "position": 3
    }
  ]
}

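The same analysis is applied to your query, so a shingle only matches when adjacent numbers appear in the same order. minimum_should_match then controls what percentage of the query's tokens must match in the document.

So here is the example:

In the mapping, we configure the shingle filter and an analyzer that uses it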

PUT so_54684997
{
  "mappings": {
    "_doc": {
      "properties": {
        "content": {
          "type": "text",
          "analyzer": "myShingledAnalyzer"
        }
      }
    }
  },
  "settings": {
    "analysis": {
      "filter": {
        "myShingle": {
          "type": "shingle",
          "output_unigrams": false
        }
      },
      "analyzer": {
        "myShingledAnalyzer": {
          "tokenizer": "whitespace",
          "filter": ["myShingle"]
        }
      }
    }
  }
}

We add the document

PUT so_54684997/_doc/1
{
  "content": "1 2 3 4 5"
}

Query 1 => no match (all five numbers are present, but not in the same order, so the 80% threshold is not met)

POST so_54684997/_search
{
  "query": {
    "match": {
      "content": {
        "query": "2 1 3 4 5",
        "minimum_should_match": "80%"
      }
    }
  }
}

Query 2 => match (4 of the 5 numbers, in the correct order)

POST so_54684997/_search
{
  "query": {
    "match": {
      "content": {
        "query": "1 2 3 4",
        "minimum_should_match": "80%"
      }
    }
  }
}

Query 3 => match (4 of the 5 numbers, in the same order)

POST so_54684997/_search
{
  "query": {
    "match": {
      "content": {
        "query": "8 2 3 4 5",
        "minimum_should_match": "80%"
      }
    }
  }
}
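
As a rough check of the three results above, the following Python sketch imitates what the analyzer and the match query do. It is a simplified model, not Elasticsearch itself: the helper names shingles, required_clauses and would_match are made up for illustration, it assumes two-word shingles (the shingle filter's default size), and it assumes that a percentage in minimum_should_match is rounded down to a whole number of clauses.

```python
def shingles(text, size=2):
    """Split on whitespace and join each run of `size` adjacent tokens."""
    tokens = text.split()
    return [" ".join(tokens[i:i + size]) for i in range(len(tokens) - size + 1)]


def required_clauses(clause_count, percent):
    """Number of query shingles that must match; the percentage is rounded down."""
    # Assumption: at least one clause has to match for a document to be a hit.
    return max(1, (clause_count * percent) // 100)


def would_match(doc_text, query_text, percent=80):
    """Simplified model: count the query shingles that also occur in the document."""
    doc_shingles = set(shingles(doc_text))
    query_shingles = shingles(query_text)
    hits = sum(1 for s in query_shingles if s in doc_shingles)
    return hits >= required_clauses(len(query_shingles), percent)


if __name__ == "__main__":
    doc = "1 2 3 4 5"
    for query in ("2 1 3 4 5", "1 2 3 4", "8 2 3 4 5"):
        print(query, "->", would_match(doc, query))
```

Under this model, query 1 shares only "3 4" and "4 5" with the document (2 of 4 shingles, 3 required), query 2 shares all 3 of its shingles (2 required), and query 3 shares 3 of 4 (3 required). Note that shingles capture the relative order of neighbours, not absolute positions, so a shifted query such as "2 3 4 5 9" would also pass this check.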

I don't know whether this handles all of your cases, but I think it is a good starting point!

Answer 2

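Keep the whitespace analyzer and make the position part of the text value: before indexing, change "1 2 3 4 5" to "0_1 1_2 2_3 3_4 4_5", where 0_1 means position 0, value 1. This is still a single indexed field, but searching it requires a query with multiple term clauses.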
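
The pre-indexing transformation could be sketched in Python as below. The function name encode_positions is hypothetical, and the sketch assumes values are separated by whitespace and positions start at 0, as in the example above.

```python
def encode_positions(text):
    """Prefix each whitespace-separated value with its zero-based position."""
    return " ".join(f"{i}_{value}" for i, value in enumerate(text.split()))


if __name__ == "__main__":
    print(encode_positions("1 2 3 4 5"))  # prints: 0_1 1_2 2_3 3_4 4_5
```

The encoded string is what gets stored in the hash field in place of the raw value.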

Query for '8 2 3 4 5':

// should and minimum_should_match belong inside a bool query
query: {
    bool: {
        should: [
            { term: { hash: '0_8' } },
            { term: { hash: '1_2' } },
            { term: { hash: '2_3' } },
            { term: { hash: '3_4' } },
            { term: { hash: '4_5' } }
        ],
        minimum_should_match: 4
    }
}
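
Writing those clauses by hand gets tedious, so here is a minimal Python sketch that builds the same kind of search body from a raw query string. The function name build_position_query and its default arguments are assumptions made for illustration; it only constructs a dictionary and does not send anything to Elasticsearch.

```python
import json


def build_position_query(text, field="hash", min_match=4):
    """Build a bool query with one term clause per position-tagged value."""
    clauses = [
        {"term": {field: f"{i}_{value}"}}
        for i, value in enumerate(text.split())
    ]
    return {
        "query": {
            "bool": {
                "should": clauses,
                # How many of the position-tagged terms must be present.
                "minimum_should_match": min_match,
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(build_position_query("8 2 3 4 5"), indent=2))
```

Because each term carries its position, a document only earns a clause when the same value sits at the same position, which is the 4/5 rule asked for in the question.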

Searching for exact term positions with Elasticsearch
