Question

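I just discovered the "more_like_this" query type and tried to use it with my nested objects. Unfortunately, this query does not seem to search inside nested objects. Here is my mapping: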

"Presentation": {
    "properties": {
      "id": {
        "include_in_all": false,
        "type": "string"
      },
      "title": {
        "include_in_all": true,
        "type": "string"
      },
      "description": {
        "include_in_all": true,
        "type": "string"
      },
      "categories": {
        "properties": {
          "id": {
            "include_in_all": false,
            "type": "string"
          },
          "category": {
            "include_in_all": true,
            "type": "string"
          },
          "category_suggest": {
            "properties": {
              "input": {
                "type": "string"
              },
              "payload": {
                "properties": {
                  "id": {
                    "type": "long"
                  }
                }
              }
            }
          }
        },
        "type": "nested"
      }
    }
  }

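My goal is to find all presentations related to the one with ID "96", and to boost presentations that share categories with "96". However, when I run the query below, Elasticsearch only computes the score from the "title" and "description" fields (and not from "categories").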

{
  "size": 4,
  "query": {
    "more_like_this": {
      "like": [
        {
          "_index": "client",
          "_type": "Presentation",
          "_id": "96"
        }
      ],
      "min_term_freq": 1,
      "max_query_terms": 35,
      "min_word_length": 3,
      "minimum_should_match": "1%"
    }
  }
}

I also tried forcing the query onto the nested field, but that does not work either:

{
  "size": 4,
  "query": {
    "bool": {
      "should": [
        {
          "more_like_this": {
            "like": [
              {
                "_index": "client",
                "_type": "Presentation",
                "_id": "96"
              }
            ],
            "min_term_freq": 1,
            "max_query_terms": 35,
            "min_word_length": 3,
            "minimum_should_match": "1%"                   
          }
        },
        {
            "nested" : {
                "path":"categories",
                "query" : {
                    "more_like_this": {
                        "like": [
                          {
                            "_index": "client",
                            "_type": "Presentation",
                            "_id": "96"
                          }
                        ],
                        "min_term_freq": 1,
                        "max_query_terms": 35,
                        "min_word_length": 3,
                        "minimum_should_match": "1%"
                    }
                }
            }
        }
      ]
    }
  }
}

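I found someone with the same problem on an older version of Elasticsearch: "ElasticSearch More_Like_This API and Nested Object Properties". Unfortunately, none of the answers given there work with ES 2.x (other than flattening the whole index, which I cannot do).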

Does anyone have any idea about this (strange) problem? Thanks :)

Answer 1

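I believe you can specify which fields to search. Try pointing directly at the nested fields, something like this (extend the field list as needed):

{
  "size": 4,
  "query": {
    "more_like_this": {
      "fields": ["id", "title", "description", "categories.id", "categories.category"],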
      "like": [
        {
          "_index": "client",
          "_type": "Presentation",
          "_id": "96"
        }
      ],
      "min_term_freq": 1,
      "max_query_terms": 35,
      "min_word_length": 3,
      "minimum_should_match": "1%"
    }
  }
}

Answer 2

I'm on ES 5.3 with the same issue (I want MLT to be calculated from the document as well as nested documents).

Your bool/should approach was very helpful. I had been trying to do the join inside a single MLT query and couldn't figure out how to do so.

I was able to get this to work (or at least it seems to be working fine) by specifying fields within the nested MLT query. So for your case you would add:

"fields": ["categories.*"]

to the nested MLT query. I'm not sure whether this works with 2.x, but thought it would be worth mentioning.
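
Applied to the bool/should query from the question, that suggestion would look roughly like the sketch below. The index, type, and ID are the ones from the question, and the only change is the "fields" entry in the nested clause. The wildcard pattern is reported here as working on ES 5.3 only, so treat its behavior on 2.x as an untested assumption.

```json
{
  "size": 4,
  "query": {
    "bool": {
      "should": [
        {
          "more_like_this": {
            "like": [
              { "_index": "client", "_type": "Presentation", "_id": "96" }
            ],
            "min_term_freq": 1,
            "max_query_terms": 35,
            "min_word_length": 3,
            "minimum_should_match": "1%"
          }
        },
        {
          "nested": {
            "path": "categories",
            "query": {
              "more_like_this": {
                "fields": ["categories.*"],
                "like": [
                  { "_index": "client", "_type": "Presentation", "_id": "96" }
                ],
                "min_term_freq": 1,
                "max_query_terms": 35,
                "min_word_length": 3,
                "minimum_should_match": "1%"
              }
            }
          }
        }
      ]
    }
  }
}
```

The first "should" clause scores on the top-level fields and the second on the nested category fields, so a presentation matching both should rank higher than one matching only the title or description.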

Answer 3

Try adding the "term_vector": "yes" property to your mapping.

According to the documentation:

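The fields on which to perform MLT must be indexed and of type string. Additionally, when using like with documents, either _source must be enabled, or the fields must be stored, or term_vector must be stored. To speed up analysis, it can help to store term vectors at index time.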
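
As a rough sketch of that suggestion, the mapping from the question could enable term vectors on its analyzed text fields as shown below. The fragment is abridged to "title", "description", and "categories.category", and keeps the ES 2.x "string" type used in the question. Whether this alone makes MLT score the nested fields is not confirmed in this thread.

```json
{
  "Presentation": {
    "properties": {
      "title": {
        "type": "string",
        "include_in_all": true,
        "term_vector": "yes"
      },
      "description": {
        "type": "string",
        "include_in_all": true,
        "term_vector": "yes"
      },
      "categories": {
        "type": "nested",
        "properties": {
          "category": {
            "type": "string",
            "include_in_all": true,
            "term_vector": "yes"
          }
        }
      }
    }
  }
}
```

As far as I know, term_vector cannot be changed in place on a field that already exists, so applying this would likely mean creating a new index with the updated mapping and reindexing the documents.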

ElasticSearch 2.x: more_like_this query and nested objects
