Elasticsearch: searching an array and boosting scores on the same field

Time: 2018-06-15 08:02:17

Tags: elasticsearch

First of all, I don't use Elasticsearch very often, so I apologize in advance for any stupid questions ;-).

I'm currently working on a feature for our editors.

Our ES info is as follows:

{
  "name" : "Lockjaw",
  "cluster_name" : "elasticsearch",
  "cluster_uuid" : "UUID",
  "version" : {
    "number" : "2.4.6",
    "build_hash" : "5376dca9f70f3abef96a77f4bb22720ace8240fd",
    "build_timestamp" : "2017-07-18T12:17:44Z",
    "build_snapshot" : false,
    "lucene_version" : "5.5.4"
  },
  "tagline" : "You Know, for Search"
}

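Some background. Our backend runs on WordPress and we use tags. That is fairly standard, but some tags are flagged with "is_topic". Here is what I'm trying to achieve.

Question 1. When a user saves a post, the system should look up related posts based on tags and topics. Tags are more important than topics. So I tried the following query: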

"query":{  
     "filtered":{  
        "filter":{  
           "bool":{  
              "must":[  
                 {  
                    "term":{  
                       "post_type":"post"
                    }
                 },
                 {  
                    "range":{  
                       "post_date":{  
                          "gte":"2018-06-15 09:00:00"
                       }
                    }
                 }
              ],
              "should":[  
                 {  
                    "match":{  
                       "terms.post_tag.term_id":{  
                          "query":[  
                             38,
                             11642
                          ],
                          "boost":1
                       }
                    }
                 },
                 {  
                    "match":{  
                       "terms.post_tag.term_id":{  
                          "query":[  
                             1133,
                             8708,
                             27774
                          ],
                          "boost":2
                       }
                    }
                 }
              ]
           }
        }
     }
  },
  "size":5

In the query above, the first "should" clause holds my topics and the second "should" clause holds my tags. I get this error:

{  
   "error":{  
      "root_cause":[  
         {  
            "type":"query_parsing_exception",
            "reason":"[match] unknown token [START_ARRAY] after [query]",
            "index":"myindexname",
            "line":1,
            "col":199
         }
      ],
      "type":"search_phase_execution_exception",
      "reason":"all shards failed",
      "phase":"query",
      "grouped":true,
      "failed_shards":[  
         {  
            "shard":0,
            "index":"myindexname",
            "node":"oVegK7J9Tf6T-IXRUXGYvg",
            "reason":{  
               "type":"query_parsing_exception",
               "reason":"[match] unknown token [START_ARRAY] after [query]",
               "index":"myindexname",
               "line":1,
               "col":199
            }
         }
      ]
   },
   "status":400
} 
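
The `START_ARRAY` message suggests that `match` expects a single value for `query`, not an array. One possible workaround, sketched here in Python purely for illustration, is to emit one `term` clause per ID, each with its own boost, so that documents sharing more IDs tend to rank higher. The helper name `term_clauses` is hypothetical; the field name and IDs are taken from the query above.

```python
def term_clauses(field, ids, boost):
    """Build one `term` clause per ID so each matching ID can add to the score."""
    return [{"term": {field: {"value": term_id, "boost": boost}}} for term_id in ids]


# Topics get boost 1 and tags get boost 2, mirroring the original query.
should = (
    term_clauses("terms.post_tag.term_id", [38, 11642], 1)
    + term_clauses("terms.post_tag.term_id", [1133, 8708, 27774], 2)
)
```

The resulting list is meant to be placed in the `should` array of a `bool` query that runs in query context, since clauses inside a filter do not contribute to the score.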

Here is an example document that should be found:

"post_id" : 477398,
"post_date" : "2018-02-28 08:00:00",
"post_date_gmt" : "2018-02-28 07:00:00",
"post_title" : "Article Title",
"post_excerpt" : "",
"post_content" : "Content Here",
"post_status" : "publish",
"post_name" : "article-title",
"post_type" : "post",
"post_mime_type" : "",
"permalink" : "https://www.example.com/archive/2018/02/28/article-title/",
"terms" : {
  "category" : [ {
    "term_id" : 1,
    "slug" : "artikelen",
    "name" : "Alle artikelen",
    "parent" : 0
  }, {
    "term_id" : 15035,
    "slug" : "commerce",
    "name" : "Commerce",
    "parent" : 0
  } ],
  "post_tag" : [ {
    "term_id" : 29297,
    "slug" : "custom-labels",
    "name" : "Custom labels",
    "parent" : 0
  }, {
    "term_id" : 38,
    "slug" : "e-commerce",
    "name" : "E-commerce",
    "parent" : 0
  }, {
    "term_id" : 2345,
    "slug" : "google-adwords",
    "name" : "Google AdWords",
    "parent" : 0
  }, {
    "term_id" : 11642,
    "slug" : "google-shopping",
    "name" : "Google Shopping",
    "parent" : 0
  }, {
    "term_id" : 1133,
    "slug" : "webshops",
    "name" : "Webshops",
    "parent" : 0
  } ],
  "post-content-type" : [ {
    "term_id" : 8708,
    "slug" : "strategie",
    "name" : "Strategie",
    "parent" : 0
  } ],
  "sector" : [ {
    "term_id" : 27774,
    "slug" : "retail-webshops",
    "name" : "Retail & Webshops",
    "parent" : 0
  } ]
}

The mapping for this field is as follows:

"terms" : {
  "properties" : {
    "post_tag" : {
      "properties" : {
        "name" : {
          "type" : "string",
          "fields" : {
            "raw" : {
              "type" : "string",
              "index" : "not_analyzed"
            },
            "sortable" : {
              "type" : "string",
              "analyzer" : "ewp_lowercase"
            }
          }
        },
        "parent" : {
          "type" : "long"
        },
        "slug" : {
          "type" : "string",
          "index" : "not_analyzed"
        },
        "term_id" : {
          "type" : "long"
        }
      }
    }
  }
}

Can somebody help me format this query? Or is this not even possible?
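
A sketch of how the whole request might be restructured, assuming Elasticsearch 2.x: the `terms` query accepts an array of values, and the boosted clauses are moved out of `filtered.filter` into query context so that they can influence the score. The function name `build_related_query` and its parameters are hypothetical, and the body has not been run against the cluster described above.

```python
import json


def build_related_query(topic_ids, tag_ids, min_date, size=5):
    """Return a search body that filters without scoring and scores on should clauses."""
    field = "terms.post_tag.term_id"
    return {
        "query": {
            "bool": {
                # Filter context: restricts the results, does not affect the score.
                "filter": [
                    {"term": {"post_type": "post"}},
                    {"range": {"post_date": {"gte": min_date}}},
                ],
                # Query context: the boosts apply here.
                "should": [
                    {"terms": {field: topic_ids, "boost": 1}},
                    {"terms": {field: tag_ids, "boost": 2}},
                ],
                # Require at least one topic or tag clause to match.
                "minimum_should_match": 1,
            }
        },
        "size": size,
    }


body = build_related_query([38, 11642], [1133, 8708, 27774], "2018-06-15 09:00:00")
print(json.dumps(body, indent=2))
```

A `terms` clause may score a document the same regardless of how many of the listed IDs match. If the number of shared tags should matter, one `term` clause per ID is probably the better fit.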

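Question 2: Some documents should not be returned. We are setting up a blacklist for this, which is simple to do with a must_not query. But here is the tricky part. As you can see in my query, I filter on a range so that only documents whose post_date is less than 2 years older than the new post's date are returned. However, we also want a whitelist of documents older than 2 years. Is that even possible? How?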
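
One way the blacklist and whitelist could be combined, sketched under the assumption that both lists hold `post_id` values: a document passes if it is inside the date range or on the whitelist, and it is dropped if it is on the blacklist. The function name `build_date_filter`, its parameters, and the sample cutoff date and IDs are hypothetical.

```python
def build_date_filter(cutoff, whitelist_ids, blacklist_ids):
    """Return a bool filter: (recent OR whitelisted) AND NOT blacklisted."""
    return {
        "bool": {
            "must": [
                {"term": {"post_type": "post"}},
                {
                    # Nested bool with only should clauses: at least one must match.
                    "bool": {
                        "should": [
                            {"range": {"post_date": {"gte": cutoff}}},
                            {"terms": {"post_id": whitelist_ids}},
                        ]
                    }
                },
            ],
            # Blacklisted posts are excluded even if recent or whitelisted.
            "must_not": [{"terms": {"post_id": blacklist_ids}}],
        }
    }


# Hypothetical cutoff and IDs, for illustration only.
date_filter = build_date_filter("2016-06-15 09:00:00", [477398], [12345])
```

The resulting object could take the place of the `bool` filter in the query above.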

Thanks in advance!

  • Danny

0 answers