Question

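We are running a match_phrase query against nested objects whose values are all strings.

We want to find the occurrences of a string phrase.

Let's assume the following.

1) The mapping is as follows.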

"attr": {
    "type": "nested",
    "properties": {
        "attr": {
            "type": "multi_field",
            "fields": {
                "attr": { "type": "string", "index": "analyzed", "include_in_all": true, "analyzer": "keyword" },
                "untouched": { "type": "string", "index": "analyzed", "include_in_all": false, "analyzer": "not_analyzed" }
            }
        }
    }
}

2) The data looks like this.

Object A:

"attr": [
    {
        "attr": "beverage"
    },
    {
        "attr": "apple wine"
    }
]

Object B:

"attr": [
    {
        "attr": "beverage"
    },
    {
        "attr": "apple"
    },
    {
        "attr": "wine"
    }
]

3) So, for the query

{
    "query": {
        "match": {
            "_all": {
                "query": "apple wine",
                "type": "phrase"
            }
        }
    }
}

we expect only object A to be returned, but unfortunately object B is returned as well.

Looking forward to your suggestions.

Answer 1

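In your case, the separate array values need a large gap between their positions so that a phrase cannot match across them. There is a configurable gap between instances of the same field, but its default value is 0, which is why "apple" and "wine" from two different array entries end up adjacent.

You should change it in the field mapping: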

"attr": {
    "type": "string",
    "index": "analyzed",
    "include_in_all": true,
    "analyzer": "keyword",
    "position_offset_gap": 100
}
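
To illustrate why the gap matters, here is a minimal Python sketch of how token positions could be assigned across array values and how a phrase match then checks for consecutive positions. It is a simplified model, not Elasticsearch code: the function names `index_positions` and `phrase_matches` are hypothetical, and whitespace tokenization is an assumption standing in for the real analyzer.

```python
from typing import List, Tuple

def index_positions(values: List[str], gap: int) -> List[Tuple[str, int]]:
    """Assign a position to every token across all array values.

    Assumption: tokens are split on whitespace. `gap` is added between
    consecutive array values, mimicking a position gap setting.
    """
    positions = []
    next_pos = 0
    for value in values:
        for token in value.split():
            positions.append((token, next_pos))
            next_pos += 1
        next_pos += gap  # push the next array value further away
    return positions

def phrase_matches(positions: List[Tuple[str, int]], phrase: str) -> bool:
    """Return True if the phrase terms occur at consecutive positions."""
    terms = phrase.split()
    if not terms:
        return False
    by_pos = {pos: tok for tok, pos in positions}
    return any(
        all(by_pos.get(pos + i) == term for i, term in enumerate(terms))
        for tok, pos in positions
        if tok == terms[0]
    )

object_a = ["beverage", "apple wine"]
object_b = ["beverage", "apple", "wine"]

for gap in (0, 100):
    a = phrase_matches(index_positions(object_a, gap), "apple wine")
    b = phrase_matches(index_positions(object_b, gap), "apple wine")
    print(f"gap={gap}: object A matches={a}, object B matches={b}")
```

With a gap of 0, "apple" and "wine" in object B sit at adjacent positions, so the phrase matches both objects in this model. With a gap of 100 they are far apart, and only object A, where both words belong to the same value, still matches.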

Answer 2

You also need to tell the query to look for all of the terms within a single nested document:

"query": {
  "nested": {
    "path": "attr",
    "query": {
      "match": {
        "attr": {
          "query": "apple wine",
          "operator": "and"
        }
      }
    }
  }
}

A good source of information is http://www.spacevatican.org/2012/6/3/fun-with-elasticsearch-s-children-and-nested-documents/
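
The effect of matching inside a single nested document can be sketched in Python as follows. This is only a simplified model of the idea, not how Elasticsearch is implemented: `nested_and_match` is a hypothetical helper, and lowercased whitespace tokenization is an assumption (with the keyword analyzer from the mapping, each value would instead be indexed as one token).

```python
from typing import Dict, List

def nested_and_match(nested_docs: List[Dict[str, str]], field: str, query: str) -> bool:
    """Return True if at least one nested object contains every query term.

    Each nested object is checked on its own, so terms spread over
    different nested objects never combine into a match.
    """
    terms = query.lower().split()
    if not terms:
        return False
    for doc in nested_docs:
        tokens = set(doc.get(field, "").lower().split())
        if all(term in tokens for term in terms):
            return True
    return False

object_a = [{"attr": "beverage"}, {"attr": "apple wine"}]
object_b = [{"attr": "beverage"}, {"attr": "apple"}, {"attr": "wine"}]

print("object A:", nested_and_match(object_a, "attr", "apple wine"))
print("object B:", nested_and_match(object_b, "attr", "apple wine"))
```

In this model object A matches because one nested object holds both terms, while object B does not because "apple" and "wine" live in separate nested objects.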

Elasticsearch nested match_phrase problem
