Elasticsearch returns unexpected results when sorting on deeply nested properties

Date: 2016-11-11 14:18:31

Tags: sorting elasticsearch

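I'm trying to sort documents by a property of their deeply nested children.

Suppose our index contains publisher documents. A publisher contains a collection of books, and each book has a title (the name field), a published flag, and a collection of genre scores. A genre_score represents how well a particular book matches a particular genre, identified here by a genre_id.

First, let's define some mappings (for simplicity, we only spell out the nested types):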

curl -XPUT 'localhost:9200/book_index' -d '
  {
    "mappings": {
      "publisher": {
        "properties": {
          "books": {
            "type": "nested",
            "properties": {
              "genre_scores": {
                "type": "nested"
              }
            }
          }
        }
      }
    }
  }'

Here are our two publishers:

curl -XPUT 'localhost:9200/book_index/publisher/1' -d '
  {
    "name": "Best Books Publishing",
    "books": [
      {
        "name": "Published with medium genre_id of 1",
        "published": true,
        "genre_scores": [
          { "genre_id": 1, "score": 50 },
          { "genre_id": 2, "score": 15 }
        ]
      }
    ]
  }'

curl -XPUT 'localhost:9200/book_index/publisher/2' -d '
  {
    "name": "Puffin Publishers",
    "books": [
      {
        "name": "Published book with low genre_id of 1",
        "published": true,
        "genre_scores": [
          { "genre_id": 1, "score": 10 },
          { "genre_id": 4, "score": 10 }
        ]
      },
      {
        "name": "Unpublished book with high genre_id of 1",
        "published": false,
        "genre_scores": [
          { "genre_id": 1, "score": 100 },
          { "genre_id": 2, "score": 35 }
        ]
      }
    ]
  }'

Here is the resulting definition and mapping of our index:

curl -XGET 'localhost:9200/book_index/_mappings?pretty=true'
...
{
  "book_index": {
    "mappings": {
      "publisher": {
        "properties": {
          "books": {
            "type": "nested",
            "properties": {
              "genre_scores": {
                "type": "nested",
                "properties": {
                  "genre_id": {
                    "type": "long"
                  },
                  "score": {
                    "type": "long"
                  }
                }
              },
              "name": {
                "type": "text",
                "fields": {
                  "keyword": {
                    "type": "keyword",
                    "ignore_above": 256
                  }
                }
              },
              "published": {
                "type": "boolean"
              }
            }
          },
          "name": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above": 256
              }
            }
          }
        }
      }
    }
  }
}

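Now suppose we want to query for a list of publishers, sorted by how well their books score in a specific genre. In other words, sort the publishers by the genre_scores.score that one of their books has for the target genre_id.

We might write a search query like this: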

curl -XGET 'localhost:9200/book_index/_search?pretty=true' -d '
  {
    "size": 5,
    "from": 0,
    "sort": [
      {
        "books.genre_scores.score": {
          "order": "desc",
          "nested_path": "books.genre_scores",
          "nested_filter": {
            "term": {
              "books.genre_scores.genre_id": 1
            }
          }
        }
      }
    ],
    "_source": false,
    "query": {
      "nested": {
        "path": "books",
        "query": {
          "bool": {
            "must": []
          }
        },
        "inner_hits": {
          "size": 5,
          "sort": []
        }
      }
    }
  }'

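This correctly returns Puffin first (with a sort value of [100]) and Best Books second (with a sort value of [50]).

But suppose we only want to consider books whose published flag is true. That changes our expectation: Best Books should come first (with a sort value of [50]) and Puffin second (with a sort value of [10]).

Let's update our nested_filter and query as follows: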
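To make the two expectations concrete, here is a minimal plain-Python sketch that recomputes them from the sample data. It is not Elasticsearch code. The helper names `expected_sort_value` and `expected_ranking` are hypothetical, and the sketch assumes that a descending sort uses the highest matching score per publisher, which is what the sort values quoted above suggest.

```python
# Plain-Python model of the two publishers indexed above (not Elasticsearch code).
publishers = {
    "Best Books Publishing": [
        {"published": True,
         "genre_scores": [{"genre_id": 1, "score": 50}, {"genre_id": 2, "score": 15}]},
    ],
    "Puffin Publishers": [
        {"published": True,
         "genre_scores": [{"genre_id": 1, "score": 10}, {"genre_id": 4, "score": 10}]},
        {"published": False,
         "genre_scores": [{"genre_id": 1, "score": 100}, {"genre_id": 2, "score": 35}]},
    ],
}


def expected_sort_value(books, genre_id, published_only):
    """Highest score for genre_id across the books (hypothetical helper).

    Assumption: a descending sort picks the maximum matching value.
    Returns None when no genre score matches.
    """
    scores = [
        gs["score"]
        for book in books
        if book["published"] or not published_only
        for gs in book["genre_scores"]
        if gs["genre_id"] == genre_id
    ]
    return max(scores) if scores else None


def expected_ranking(genre_id, published_only):
    """Publishers with their expected sort value, best first (hypothetical helper)."""
    values = {
        name: expected_sort_value(books, genre_id, published_only)
        for name, books in publishers.items()
    }
    # Publishers without any matching score are ranked last.
    return sorted(
        values.items(),
        key=lambda item: item[1] if item[1] is not None else float("-inf"),
        reverse=True,
    )


if __name__ == "__main__":
    print(expected_ranking(genre_id=1, published_only=False))
    print(expected_ranking(genre_id=1, published_only=True))
```

The first call corresponds to the query that already works, the second to the published-only ordering we are hoping to get from Elasticsearch.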

curl -XGET 'localhost:9200/book_index/_search?pretty=true' -d '
  {
    "size": 5,
    "from": 0,
    "sort": [
      {
        "books.genre_scores.score": {
          "order": "desc",
          "nested_path": "books.genre_scores",
          "nested_filter": {
            "bool": {
              "must": [
                {
                  "term": {
                    "books.genre_scores.genre_id": 1
                  }
                }, {
                  "term": {
                    "books.published": true
                  }
                }
              ]
            }
          }
        }
      }
    ],
    "_source": false,
    "query": {
      "nested": {
        "path": "books",
        "query": {
          "term": {
            "books.published": true
          }
        },
        "inner_hits": {
          "size": 5,
          "sort": []
        }
      }
    }
  }'

Suddenly, the sort value for both of our publishers becomes [-9223372036854775808].

Why does adding an extra term to the nested_filter of the top-level sort clause have this effect?

Can anyone offer some insight into why this behavior occurs? Also, is there a viable solution for the proposed query/sort?

This happens in both ES 1.x and ES 5.
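One hypothesis worth testing, sketched here as a toy Python model and not as a confirmed diagnosis: the nested_filter may be evaluated only against the documents at nested_path (books.genre_scores). Those documents carry genre_id and score but no books.published field, so the combined filter would match nothing and the sort would fall back to a missing-value sentinel. The flattened field names, the `matches` and `sort_value` helpers, and the use of the minimum 64-bit long as the sentinel are all assumptions made for illustration.

```python
# Toy model only (an assumption, not Elasticsearch internals): every nested
# object is treated as its own hidden document that holds just the fields
# defined at its own nesting level.
LONG_MIN = -(2 ** 63)  # -9223372036854775808, the value seen in the question

# The genre_scores objects of Puffin Publishers, flattened as the model assumes.
# Note that none of them has a "books.published" field: that flag lives one
# level up, on the book.
puffin_genre_score_docs = [
    {"books.genre_scores.genre_id": 1, "books.genre_scores.score": 10},
    {"books.genre_scores.genre_id": 4, "books.genre_scores.score": 10},
    {"books.genre_scores.genre_id": 1, "books.genre_scores.score": 100},
    {"books.genre_scores.genre_id": 2, "books.genre_scores.score": 35},
]


def matches(doc, terms):
    """True when every term in the filter equals the field on this doc."""
    return all(doc.get(field) == value for field, value in terms.items())


def sort_value(docs, terms):
    """Max score among docs passing the filter, or LONG_MIN when none pass."""
    scores = [d["books.genre_scores.score"] for d in docs if matches(d, terms)]
    return max(scores) if scores else LONG_MIN


if __name__ == "__main__":
    # Filter on genre_id only: some genre_scores docs match.
    print(sort_value(puffin_genre_score_docs, {"books.genre_scores.genre_id": 1}))
    # Add the published term: in this model no genre_scores doc can match.
    print(sort_value(
        puffin_genre_score_docs,
        {"books.genre_scores.genre_id": 1, "books.published": True},
    ))
```

If this model is close to what actually happens, it would explain why both publishers end up with the same sentinel value regardless of their data, since a filter on a field of the parent book can never match a genre_scores document.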

Thanks!

0 answers:

No answers