Question

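I want to use a function score query that combines text proximity with weights. However, the query does not compute the score of the "match_phrase" clause inside "query.function_score.functions" correctly.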

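For example, say I run a curated media site and want to offer a banner link to "finance articles in 2017".

I want to filter and score as follows.

Filter
- The article must have been created in 2017.
- The category must be "finance".
Scoring
- The more "favorites" an article has, the higher its score.
- If the article received a comment within the past month, its score is higher.
- If the article has specific tags, its score is higher.
  - (The tags may amount to 100+ words)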

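The data has some preconditions.

Preconditions
- The dataset contains more than 2 million documents
- An article must have one "category"
- An article may have one or more "tags"
  - Tags within a single article
- "tags_text" is a string of the tags, sorted alphabetically and joined by spaces
  - ref: Finding most similar arrays of integers in elasticsearch
- "favorite" is the number of people who marked the article as a "favorite" (e.g. like Facebook's Like button)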
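
As an illustration of the "tags_text" precondition, here is a minimal Python sketch of how that field could be derived from "tags". The function name `build_tags_text` is hypothetical and not part of the question; it simply sorts the tags alphabetically and joins them with single spaces, which reproduces the values in the sample documents.

```python
def build_tags_text(tags):
    """Return the tags sorted alphabetically and joined by single spaces.

    Hypothetical helper: the question only states the rule, not how the
    field is actually produced.
    """
    return " ".join(sorted(tags))


if __name__ == "__main__":
    # Tags taken from sample articles 1 and 3.
    print(build_tags_text(["fintech", "uk", "london"]))
    print(build_tags_text(["bitcoin", "bubble", "btc", "mtgox", "wizsec"]))
```

Because the order is fixed, two articles with the same tag set always produce the same "tags_text" string.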

Sample data and query

// create index
$ curl -XPUT 'http://localhost:9200/blog'

Then post the articles,

// create articles
curl -XPUT http://localhost:9200/blog/article/1 -d '
{
  "article_id": 1,
  "title": "Fintech company list in London",
  "tags": ["fintech", "uk", "london"],
  "tags_text": "fintech london uk",
  "category": "finance",
  "created_at": "2016-12-01T00:00:00Z",
  "last_comment_at": null,
  "favorite": 100
}'

curl -XPUT http://localhost:9200/blog/article/2 -d '
{
  "article_id": 2,
  "title": "World economy",
  "tags": ["world", "economy", "regression", "war"],
  "tags_text": "economy regression war world",
  "category": "finance",
  "created_at": "2017-02-15T00:00:00Z",
  "last_comment_at": "2017-11-01T00:00:00Z",
  "favorite": 20
}'

curl -XPUT http://localhost:9200/blog/article/3 -d '
{
  "article_id": 3,
  "title": "Bitcoin bubble",
  "tags": ["bitcoin", "bubble", "btc", "mtgox", "wizsec"],
  "tags_text": "bitcoin btc bubble mtgox wizsec",
  "category": "finance",
  "created_at": "2017-08-03T00:00:00Z",
  "last_comment_at": null,
  "favorite": 50
}'

curl -XPUT http://localhost:9200/blog/article/4 -d '
{
  "article_id": 4,
  "title": "Virtual currency in China",
  "tags": ["bitcoin", "ico", "china"],
  "tags_text": "bitcoin china ico",
  "category": "finance",
  "created_at": "2017-09-03T00:00:00Z",
  "last_comment_at": null,
  "favorite": 10
}'

curl -XPUT http://localhost:9200/blog/article/5 -d '
{
  "article_id": 5,
  "title": "Average FX rate in 2017-10",
  "tags": ["fx", "currency", "doller"],
  "tags_text": "currency doller fx",
  "category": "finance",
  "created_at": "2017-11-01T00:00:00Z",
  "last_comment_at": null,
  "favorite": 10
}'

curl -XPUT http://localhost:9200/blog/article/6 -d '
{
  "article_id": 6,
  "title": "Cat and Dog",
  "tags": ["pet", "cat", "dog", "family"],
  "tags_text": "cat dog family pet",
  "category": "pet",
  "created_at": "2017-11-02T00:00:00Z",
  "last_comment_at": null,
  "favorite": 500
}'

Then execute the query,

curl -XGET 'http://localhost:9200/blog/article/_search' -d '
{
  "_source": {
    "includes": ["article_id", "title", "tags_text"]
  },
  "query": {
    "function_score": {
      "functions": [
        {
          "field_value_factor": {
            "factor": 1,
            "modifier": "log",
            "field": "favorite"
          },
          "weight": 0.3
        },
        {
          "filter": {
            "range": {
              "last_comment_at": {
                "from": "now-30d",
                "to": null,
                "include_lower": true,
                "include_upper": false
              }
            }
          },
          "weight": 0.3
        },
        {
          "filter": {
            "match_phrase": {
              "tags_text": {
                "query": "bitcoin fintech smartphone",
                "slop": 100
              }
            }
          },
          "weight": 0.4
        }
      ],
      "query": {
        "bool": {
          "filter": [
            {"term": {"category": "finance"} },
            {
              "range": {
                "created_at": {
                  "from": "2017-01-01T00:00:00",
                  "to": "2017-12-31T23:59:59",
                  "include_lower": true,
                  "include_upper": true
                }
              }
            }
          ],
          "must": {
            "match_all": {}
          }
        }
      },
      "score_mode": "sum"
    }
  }
}'

The result is as follows,

{
  "took": 7,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 4,
    "max_score": 0.69030905,
    "hits": [
      {
        "_index": "blog",
        "_type": "article",
        "_id": "2",
        "_score": 0.69030905,
        "_source": {
          "article_id": 2,
          "tags_text": "economy regression war world",
          "title": "World economy"
        }
      },
      {
        "_index": "blog",
        "_type": "article",
        "_id": "3",
        "_score": 0.509691,
        "_source": {
          "article_id": 3,
          "tags_text": "bitcoin btc bubble mtgox wizsec",
          "title": "Bitcoin bubble"
        }
      },
      {
        "_index": "blog",
        "_type": "article",
        "_id": "5",
        "_score": 0.3,
        "_source": {
          "article_id": 5,
          "tags_text": "currency doller fx",
          "title": "Average FX rate in 2017-10"
        }
      },
      {
        "_index": "blog",
        "_type": "article",
        "_id": "4",
        "_score": 0.3,
        "_source": {
          "article_id": 4,
          "tags_text": "bitcoin china ico",
          "title": "Virtual currency in China"
        }
      }
    ]
  }
}

I checked the result with "explain", but the "match_phrase" query on the "tags_text" field seems to have no effect on the score at all.
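
The returned scores can be reproduced from the first two functions alone, which is consistent with the "match_phrase" function contributing nothing. The sketch below is a rough check, not Elasticsearch's own code. It assumes that the "log" modifier is the base-10 logarithm, that the filtered "match_all" query scores 1.0 and is multiplied by the summed function values (the default boost mode), and that only article 2 matched the 30-day comment filter when the query ran. The helper `expected_score` is hypothetical.

```python
import math

FAVORITE_WEIGHT = 0.3        # weight of the field_value_factor function
RECENT_COMMENT_WEIGHT = 0.3  # weight of the last_comment_at filter function


def expected_score(favorite, has_recent_comment):
    """Sum of the two functions that appear to contribute (score_mode: sum).

    The match_phrase function (weight 0.4) is deliberately left out to test
    the hypothesis that it adds nothing to any hit.
    """
    score = FAVORITE_WEIGHT * math.log10(favorite)
    if has_recent_comment:
        score += RECENT_COMMENT_WEIGHT
    return score


if __name__ == "__main__":
    # article_id -> (favorite, matched the 30-day comment filter?)
    # The second value is an assumption inferred from the observed scores.
    hits = {2: (20, True), 3: (50, False), 5: (10, False), 4: (10, False)}
    for article_id, (favorite, recent) in hits.items():
        print(article_id, round(expected_score(favorite, recent), 6))
```

Under these assumptions the values agree with the "_score" fields in the response (0.69030905, 0.509691, 0.3, 0.3) to within float rounding, so none of the four hits received the 0.4 weight.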

How can I use weighted similarity scoring together with a function score query? (I tested with ES v2.4.0)

How to combine an Elasticsearch function score query with text proximity scoring
