Ignoring TF-IDF in Elasticsearch

Time: 2018-10-12 16:14:08

Tags: elasticsearch tf-idf

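My use case is screening candidates' resumes against keywords from a job description. I cannot afford to have scores change every time a new candidate profile is added to the index (I believe the IDF component would change), so I want to leave TF-IDF out of the scoring.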

An indexed document looks like this:

{
  "_index": "crawler_profiles",
  "_type": "_doc",
  "_id": "81ebeb3ff52d90a488b7bce752a4a0cf",
  "_score": 1,
  "_source": {
    "content": "Peachtree MBA"
  }
}

Based on the documentation here, I created the query below, which returns an error:

 {
  "query": {
    "bool": {
      "should": [
        { "constant_score": {
          "query": { "match": { "content": "corporate strategy" }}
        }},
        { "constant_score": {
          "query": { "match": { "content": "strategy consulting" }}
        }},
        { "constant_score": {
          "query": { "match": { "content": "international strategy" }}
        }},
        { "constant_score": {
          "query": { "match": { "content": "MBA" }}
        }}
      ]
    }
  }
}

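What I want is a score of 1 for a term that occurs one or more times and a score of 0 if it does not occur at all (in effect, skipping TF-IDF). Any help is appreciated.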

ES version: 6.4.2

1 answer:

Answer 0 (score: 0)

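The documentation you linked is for ES 2.x. In 6.4.x there are some changes, most notably that constant_score wraps its inner clause in "filter" rather than "query"; see https://www.elastic.co/guide/en/elasticsearch/reference/6.4/query-dsl-constant-score-query.html

You should be able to update your query to the following:

Edit: updated the "term" filter to use "match".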

{
  "query": {
    "bool": {
      "should": [
        { "constant_score": {
          "filter": { "match": { "content": "corporate strategy" }}
        }},
        { "constant_score": {
          "filter": { "match": { "content": "strategy consulting" }}
        }},
        { "constant_score": {
          "filter": { "match": { "content": "international strategy" }}
        }},
        { "constant_score": {
          "filter": { "match": { "content": "MBA" }}
        }}
      ]
    }
  }
}
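
If the keyword list changes per job description, the query body above can be generated rather than written by hand. The following is only a minimal sketch in Python: the helper names `build_presence_query` and `expected_score` are hypothetical, the explicit `"boost": 1.0` and `"operator": "or"` just spell out what I understand to be the defaults, and `expected_score` is a rough local model that splits on whitespace instead of running the real analyzer. It assumes each matching clause contributes its constant score and that `bool`/`should` adds those contributions together.

```python
import json

# Keyword phrases taken from the query in the question.
PHRASES = [
    "corporate strategy",
    "strategy consulting",
    "international strategy",
    "MBA",
]


def build_presence_query(field, phrases, operator="or"):
    """Build a bool/should body with one constant_score clause per phrase.

    Hypothetical helper: each clause scores a fixed 1.0 when its filter
    matches, so term frequency and IDF do not influence the result.
    """
    clauses = []
    for phrase in phrases:
        clauses.append({
            "constant_score": {
                "filter": {
                    "match": {field: {"query": phrase, "operator": operator}}
                },
                "boost": 1.0,
            }
        })
    return {"query": {"bool": {"should": clauses}}}


def expected_score(text, phrases):
    """Rough local model of the score: the number of phrases that share
    at least one lowercased token with the text (mimics operator "or").
    Assumption: whitespace splitting stands in for the real analyzer.
    """
    tokens = set(text.lower().split())
    return sum(1 for p in phrases if tokens & set(p.lower().split()))


if __name__ == "__main__":
    # Print the request body that would be sent to the _search endpoint.
    print(json.dumps(build_presence_query("content", PHRASES), indent=2))
    # Local estimate for the sample document from the question.
    print(expected_score("Peachtree MBA", PHRASES))
```

Note that with the default "or" operator, a phrase such as "corporate strategy" matches any document containing either word. Passing `operator="and"` to the sketch would require both words to be present, and a `match_phrase` filter would additionally require them to be adjacent.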