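Boost Elasticsearch results by "age" where applicable

Date: 2019-06-26 13:29:32

Tags: elasticsearch

I want to search multiple indices in Elasticsearch (news items in search_news and documents in search_documents), and whenever an index has a PublicationDate field (news items only), I want to rank the results by age so that newer news items are boosted. I am using Elasticsearch 6.8.

I found a script scoring example at https://dzone.com/articles/23-useful-elasticsearch-example-queries (the last one), but it throws an error. Based on the documentation, I came up with this: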

GET /search_*/_search
{
  "query": {
    "function_score": {
      "query": {
        "bool": {
          "must": {
            "query_string": {
              "query": "Lorem Ipsum"
            }
          },
          "must_not": {
            "exists": {
              "field": "some_exlusion_field"
            }
          }
        }
      },
      "script_score": {
        "script": {
          "params": {
            "threshold": "2019-04-04"
          },
          "source": "publishDate = doc['publishDate'].value;  if (publishDate > Date.parse('yyyy-MM-dd', threshold).getTime()) { return log(2.5) } return log(1);"
        }
      }
    }
  }
}

This results in an error:

{
  "error": {
    "root_cause": [
      {
        "type": "script_exception",
        "reason": "compile error",
        "script_stack": [
          "publishDate = doc['publis ...",
          "^---- HERE"
        ],
        "script": "publishDate = doc['publishDate'].value;  if (publishDate > Date.parse('yyyy-MM-dd', threshold).getTime()) { return log(2.5) } return log(1);",
        "lang": "painless"
      }
}

I managed to narrow the source down to:
"source": "if (doc['publishDate'] > '2019-04-04') { return 5 } return 1;"
but without success:

"failures" : [
      {
        "shard" : 0,
        "index" : "search_document_page",
        "node" : "c0iLpxiJRqmgwS0KY8OybA",
        "reason" : {
          "type" : "script_exception",
          "reason" : "runtime error",
          "script_stack" : [
            "org.elasticsearch.search.lookup.LeafDocLookup.get(LeafDocLookup.java:81)",
            "org.elasticsearch.search.lookup.LeafDocLookup.get(LeafDocLookup.java:39)",
            "if (doc['publishDate'] > '2019-04-04') { ",
            "        ^---- HERE"
          ],
          "script" : "if (doc['publishDate'] > '2019-04-04') { return 5 } return 1;",
          "lang" : "painless",
          "caused_by" : {
            "type" : "illegal_argument_exception",
            "reason" : "No field found for [publishDate] in mapping with types []"
          }
        }
      },
     {
        "shard" : 0,
        "index" : "search_news",
        "node" : "c0iLpxiJRqmgwS0KY8OybA",
        "reason" : {
          "type" : "script_exception",
          "reason" : "runtime error",
          "script_stack" : [
            "if (doc['publishDate'] > '2019-04-04') { ",
            "       ^---- HERE"
          ],
          "script" : "if (doc['publishDate'] > '2019-04-04') { return 5 } return 1;",
          "lang" : "painless",
          "caused_by" : {
            "type" : "class_cast_exception",
            "reason" : "Cannot apply [>] operation to types [org.elasticsearch.index.fielddata.ScriptDocValues.Dates] and [java.lang.String]."
          }
        }
      }
    ] 
  }
}

Any suggestions on how to check whether a field exists in the doc, and how to compare the date correctly?

1 Answer:

Answer 0: (score: 0)

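For the existence check (see the scripting documentation), guard the access with doc.containsKey('publishDate') so that indices without the field in their mapping do not fail with "No field found", and check doc['publishDate'].size() != 0 for documents that have no value.

For the date comparison, you can try it this way:

// Parse the threshold parameter into epoch milliseconds (midnight UTC).
long threshold = LocalDate.parse(params.threshold).atStartOfDay(ZoneOffset.UTC).toInstant().toEpochMilli();
// Compare against the document's date, also in epoch milliseconds; newer items get the boost.
if (doc['publishDate'].value.toInstant().toEpochMilli() > threshold) {
    return 5;
} else {
    return 1;
}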
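
The existence check and the date comparison can be combined in the function_score request from the question. The following is a minimal sketch of a request body for GET /search_*/_search, not a tested solution. It assumes Elasticsearch 6.8, where doc['publishDate'].value exposes toInstant(), and it assumes publishDate is mapped as a date in search_news. The scores 5 and 1 and the threshold value are taken from the question.

```json
{
  "query": {
    "function_score": {
      "query": {
        "bool": {
          "must": {
            "query_string": { "query": "Lorem Ipsum" }
          },
          "must_not": {
            "exists": { "field": "some_exlusion_field" }
          }
        }
      },
      "script_score": {
        "script": {
          "lang": "painless",
          "params": { "threshold": "2019-04-04" },
          "source": "/* neutral score when the index or document has no publishDate */ if (!doc.containsKey('publishDate') || doc['publishDate'].size() == 0) { return 1; } /* assumes threshold is an ISO yyyy-MM-dd date, read as midnight UTC */ long threshold = LocalDate.parse(params.threshold).atStartOfDay(ZoneOffset.UTC).toInstant().toEpochMilli(); if (doc['publishDate'].value.toInstant().toEpochMilli() > threshold) { return 5; } return 1;"
        }
      }
    }
  }
}
```

With the default boost_mode of multiply, documents published after the threshold get five times their query score, while older news items and documents from indices without the field keep their original score. Since the script parses the threshold for every document, it may be cheaper to compute the epoch milliseconds on the client and pass that number as the parameter instead.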