Getting exact matches to the top in an Elasticsearch multi_match query

Date: 2014-06-24 19:13:29

Tags: lucene elasticsearch nest

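There doesn't seem to be an easy way to do this. How can I make sure that certain fields in my multi_match query are actually boosted correctly, so that exact matches show up at the top? I feel like I have tried a lot of approaches already, but maybe someone knows the answer.

In my movie and music database, I'm trying to search several fields at once while making sure that exact matches come out on top and that certain fields (such as the title and the artist name) get more of a boost.

Here is the main part of my query: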

"query": {
    "bool": {
      "should": [
        {
          "multi_match": {
            "type": "phrase_prefix",
            "query": "brave",
            "max_expansions": 10,
            "fields": [
              "title^3",
              "artists.name^2",
              "starring.name^2",
              "credits.name",
              "tracks^0.1"
            ]
          }
        }
      ],
      "minimum_number_should_match": 1
    }
}

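As you can see, the query is "brave". There happens to be a movie called Brave. Perfect, I want it at the top, because it is not only an exact match but the match is also in the title. However, a popular song called "Brave" by Sara Bareilles ends up at the top instead. Why?

I've tried every analyzer known to man, custom and otherwise, and I've tried changing the 'type' parameter to every other permutation (phrase, best_fields, cross_fields, most_fields). None of them seems to respect the fact that I'm effectively trying to PROMOTE 'title', 'artists.name', and 'starring.name' and DEMOTE 'tracks'.

Is there any way to make sure that all exact matches show up at the top (especially those in the title and similar fields), followed by the expansions and so on?

Any suggestions would be helpful.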
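
One possible direction, sketched here in Python as a plain request body (no client library involved): add a second, more heavily boosted clause that only matches the whole field value, alongside the existing phrase_prefix clause. This assumes the searched fields are indexed so that each value is a single lowercase token (for example with a keyword tokenizer and a lowercase filter), in which case a `phrase` match on "brave" should only hit values that are exactly "Brave". The function name `build_query`, the boost of 10, and the use of `minimum_should_match` are illustrative assumptions, not tested settings.

```python
import json

# Field boosts copied from the query in the question.
FIELDS = ["title^3", "artists.name^2", "starring.name^2", "credits.name", "tracks^0.1"]


def build_query(text, exact_boost=10.0):
    """Build a bool query with a boosted exact clause plus the original prefix clause.

    exact_boost is a hypothetical value; it would need tuning against real data.
    """
    exact_clause = {
        "multi_match": {
            "type": "phrase",
            "query": text,
            "fields": FIELDS,
            "boost": exact_boost,
        }
    }
    prefix_clause = {
        "multi_match": {
            "type": "phrase_prefix",
            "query": text,
            "max_expansions": 10,
            "fields": FIELDS,
        }
    }
    return {
        "query": {
            "bool": {
                "should": [exact_clause, prefix_clause],
                "minimum_should_match": 1,
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(build_query("brave"), indent=2))
```

Under these assumptions, a document whose title is exactly "Brave" would match both clauses, while "Braveheart" would match only the prefix clause. Whether that is enough to outrank an exact hit in 'tracks' depends on the actual scores.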

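Edit

The analyzer currently in use, which seems to work better than the others, is one I built myself and call my "custom analyzer". It consists of only a 'lowercase' filter and a 'keyword' tokenizer.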
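
For reference, an analyzer like the one described might be declared roughly as follows. This is a sketch written as a Python dict that would be sent as index settings; the analyzer name `custom_analyzer` is a placeholder, since the exact name is not given here. The helpers `simulate_analyzer` and `prefix_matches` are a pure-Python imitation (not Elasticsearch code) of what a keyword tokenizer plus lowercase filter does: the whole value becomes one lowercased token.

```python
# Index settings for a custom analyzer: keyword tokenizer + lowercase filter.
# "custom_analyzer" is a placeholder name, not taken from the original setup.
INDEX_SETTINGS = {
    "settings": {
        "analysis": {
            "analyzer": {
                "custom_analyzer": {
                    "type": "custom",
                    "tokenizer": "keyword",
                    "filter": ["lowercase"],
                }
            }
        }
    }
}


def simulate_analyzer(value):
    """Imitate the analyzer locally: one token, lowercased, no splitting."""
    return [value.lower()]


def prefix_matches(query, value):
    """True if any token of value starts with the analyzed query token."""
    (query_token,) = simulate_analyzer(query)
    return any(token.startswith(query_token) for token in simulate_analyzer(value))


if __name__ == "__main__":
    for title in ["Brave", "Brave Little Toaster", "Braveheart", "Say Love"]:
        print(title, simulate_analyzer(title), prefix_matches("brave", title))
```

With an analyzer of this shape, "Brave", "Brave Little Toaster", and "Braveheart" all satisfy a prefix match on "brave", so a prefix clause on its own cannot distinguish an exact title from a longer one.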

Here are some sample documents, in the order in which they appear in the results:

"fields": {
  "title": [
      "Brave"
  ],
  "credits.name": [
      "Kelly MacDonald",
      "Emma Thompson",
      "Billy Connolly",
      "Julie Walters",
      "Kevin McKidd",
      "Craig Ferguson",
      "Robbie Coltrane"
  ],
  "starring.name": [
      "Emma Thompson",
      "Julie Walters",
      "Billy Connolly",
      "Kevin Mckidd",
      "Kelly Macdonald"
  ]
},

"fields": {
  "credits.name": [
      "Hilary Weeks",
      "Scott Wiley",
      "Sarah Sample",
      "Debra Fotheringham",
      "Dustin Christensen",
      "Russ Dixon"
  ],
  "title": [
      "Say Love"
  ],
  "artists.name": [
      "Hilary Weeks"
  ],
  "tracks": [
      "Say Love",
      "Another Second Chance",
      "It's A Good Day",
      "Brave",
      "I Found Me",
      "Hero",
      "Tell Me",
      "Where I Am",
      "Better Promises",
      "Even When"
  ]
},
"fields": {
  "title": [
      "Brave Little Toaster"
  ],
  "credits.name": [
      "Randy Bennett",
      "Jim Jackman",
      "Randy Cook",
      "Judy Toll",
      "Jon Lovitz",
      "Tim Stack",
      "Timothy E. Day",
      "Thurl Ravenscroft",
      "Deanna Oliver",
      "Phil Hartman",
      "Jonathon Benair",
      "Joe Ranft"
  ],
  "starring.name": [
      "Jon Lovitz",
      "Thurl Ravenscroft",
      "Tim Stack",
      "Timothy E. Day",
      "Deanna Oliver"
  ]
},
"fields": {
   "title": [
      "Braveheart"
   ],
   "credits.name": [
      "Bernard Horsfall",
      "Martin Dempsey",
      "James Robinson",
      "Robert Paterson",
      "Alan Tall",
      "Rupert Vansittart",
      "Donal Gibson",
      "Malcolm Tierney",
      "Sandy Nelson",
      "Sean Lawlor"
   ],
   "starring.name": [
      "Brendan Gleeson",
      "Sophie Marceau",
      "Mel Gibson",
      "Patrick Mcgoohan",
      "Catherine Mccormack"
   ]
}

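Maybe someone knows why the second result (in this case not the one I mentioned earlier, but Hilary Weeks, who has a track called 'Brave') comes before the titles 'Braveheart' and 'Brave Little Toaster'?

Edit again

To complicate things further, what if I have a 'rank' field as part of my documents? I'm finding it very difficult to factor it into my _score using a script scoring function: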

"functions": [
    {
      "script_score": {
        "script": "_score * 1/ doc['rank'].value"
      }
    }
  ]
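
A detail that may matter here: `function_score` combines the function result with the query score according to `boost_mode`, which defaults to `multiply`. With the script above, that would apply `_score` twice (once inside the script, once in the combination). The sketch below, in Python, shows one way the full request might look with `boost_mode` set to `replace`, plus a small local calculation of both modes. The helper names and the sample numbers are made up for illustration, and the script syntax and language depend on the Elasticsearch version in use.

```python
import json


def build_ranked_query(inner_query):
    """Wrap a query in function_score so the script's value becomes the final score."""
    return {
        "query": {
            "function_score": {
                "query": inner_query,
                "functions": [
                    {"script_score": {"script": "_score * 1 / doc['rank'].value"}}
                ],
                # Default is "multiply", which would multiply by _score a second time.
                "boost_mode": "replace",
            }
        }
    }


def combined_score(query_score, rank, boost_mode):
    """Local arithmetic only: mimic how the script result is combined."""
    script_result = query_score * 1 / rank
    if boost_mode == "multiply":
        return query_score * script_result
    if boost_mode == "replace":
        return script_result
    raise ValueError("unsupported boost_mode: " + boost_mode)


if __name__ == "__main__":
    inner = {"multi_match": {"query": "brave", "fields": ["title^3", "tracks^0.1"]}}
    print(json.dumps(build_ranked_query(inner), indent=2))
    print(combined_score(2.0, 4, "multiply"))  # 2.0 * (2.0 / 4)
    print(combined_score(2.0, 4, "replace"))   # 2.0 / 4
```

An alternative with the same effect would be to keep the default `multiply` mode and reduce the script to `1 / doc['rank'].value`. Either way, a rank of 0 would need separate handling.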

0 answers:

No answers