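I am currently figuring out the tire gem (I am also new to elasticsearch and lucene) and trying out a few things. I need to do some (probably non-trivial) scoring, so I am trying to get a grip on how it works. I read everything I could find online about the scoring formula and tried to match it against the explained queries.
If I am reading the numbers correctly, the two entries titled "foo foo foo foo" get different scores, which is surely not intended. I guess I am missing a step during or after indexing, but I can't figure out which one.
Below is my code. I am not quite using the tire DSL as intended, because I want to figure things out - things may look more tire-ish at some later time.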
require 'tire'
require 'pp'
class Model
  INDEX = 'myindex'
  TYPE = 'company'
  class << self
    def delete_index
      Tire.index(INDEX) { delete }
    end
    def create_mapping
      Tire.index INDEX do
        create mappings: {
          TYPE => {
            properties: {
              title: { type: 'string' }
            }
          }
        }
      end
    end
    def refresh_index
      Tire.index INDEX do
        refresh
      end
    end
  end
  def initialize(attributes = {})
    @attributes = attributes.merge(:_id => object_id) #use oid as id, just for testing
  end
  def _type
    TYPE
  end
  def id
    object_id.to_s #convert to string because tire compares to object_id!
  end
  def index
    item = self
    Tire.index INDEX do
      store item
    end
  end
  def to_indexed_json
    @attributes.to_json
  end
  ENTITIES = [
    new(title: "foo foo foo foo"),
    new(title: "foo"),
    new(title: "bar"),
    new(title: "foo bar"),
    new(title: "xxx"),
    new(title: "foo foo foo foo"),
    new(title: "foo foo"),
    new(title: "foo bar baz")
  ]
  QUERIES = {
    :foo => { query_string: { query: "foo" } },
    :all => { match_all: {} }
  }
  def self.custom_explained_search(q)
    Tire.search(Model::INDEX, :wrapper => Model, :explain => true) do |search|
      search.query do |query|
        query.send :instance_variable_set, :@value, q
      end
    end
  end
end
class Tire::Results::Collection
  def explained
    @response["hits"]["hits"].map do |hit|
      {
        "_id" => hit["_id"],
        "_explanation" => hit["_explanation"],
        "title" => hit["_source"]["title"]
      }
    end
  end
end
Model.delete_index
Model.create_mapping
Model::ENTITIES.each &:index
Model.refresh_index
s = Model.custom_explained_search(Model::QUERIES[:foo])
pp s.results.explained
This is what gets printed:
[{"_id"=>"2169251840",
"_explanation"=>
{"value"=>0.54932046,
"description"=>"fieldWeight(_all:foo in 0), product of:",
"details"=>
[{"value"=>1.4142135,
"description"=>"btq, product of:",
"details"=>
[{"value"=>1.4142135, "description"=>"tf(phraseFreq=2.0)"},
{"value"=>1.0, "description"=>"allPayload(...)"}]},
{"value"=>0.7768564, "description"=>"idf(_all: foo=4)"},
{"value"=>0.5, "description"=>"fieldNorm(field=_all, doc=0)"}]},
"title"=>"foo foo foo foo"},
{"_id"=>"2169251720",
"_explanation"=>
{"value"=>0.54932046,
"description"=>"fieldWeight(_all:foo in 1), product of:",
"details"=>
[{"value"=>0.70710677,
"description"=>"btq, product of:",
"details"=>
[{"value"=>0.70710677, "description"=>"tf(phraseFreq=0.5)"},
{"value"=>1.0, "description"=>"allPayload(...)"}]},
{"value"=>0.7768564, "description"=>"idf(_all: foo=4)"},
{"value"=>1.0, "description"=>"fieldNorm(field=_all, doc=1)"}]},
"title"=>"foo"},
{"_id"=>"2169250520",
"_explanation"=>
{"value"=>0.48553526,
"description"=>"fieldWeight(_all:foo in 2), product of:",
"details"=>
[{"value"=>1.0,
"description"=>"btq, product of:",
"details"=>
[{"value"=>1.0, "description"=>"tf(phraseFreq=1.0)"},
{"value"=>1.0, "description"=>"allPayload(...)"}]},
{"value"=>0.7768564, "description"=>"idf(_all: foo=4)"},
{"value"=>0.625, "description"=>"fieldNorm(field=_all, doc=2)"}]},
"title"=>"foo foo"},
{"_id"=>"2169251320",
"_explanation"=>
{"value"=>0.44194174,
"description"=>"fieldWeight(_all:foo in 1), product of:",
"details"=>
[{"value"=>0.70710677,
"description"=>"btq, product of:",
"details"=>
[{"value"=>0.70710677, "description"=>"tf(phraseFreq=0.5)"},
{"value"=>1.0, "description"=>"allPayload(...)"}]},
{"value"=>1.0, "description"=>"idf(_all: foo=1)"},
{"value"=>0.625, "description"=>"fieldNorm(field=_all, doc=1)"}]},
"title"=>"foo bar"},
{"_id"=>"2169250380",
"_explanation"=>
{"value"=>0.27466023,
"description"=>"fieldWeight(_all:foo in 3), product of:",
"details"=>
[{"value"=>0.70710677,
"description"=>"btq, product of:",
"details"=>
[{"value"=>0.70710677, "description"=>"tf(phraseFreq=0.5)"},
{"value"=>1.0, "description"=>"allPayload(...)"}]},
{"value"=>0.7768564, "description"=>"idf(_all: foo=4)"},
{"value"=>0.5, "description"=>"fieldNorm(field=_all, doc=3)"}]},
"title"=>"foo bar baz"},
{"_id"=>"2169250660",
"_explanation"=>
{"value"=>0.2169777,
"description"=>"fieldWeight(_all:foo in 0), product of:",
"details"=>
[{"value"=>1.4142135,
"description"=>"btq, product of:",
"details"=>
[{"value"=>1.4142135, "description"=>"tf(phraseFreq=2.0)"},
{"value"=>1.0, "description"=>"allPayload(...)"}]},
{"value"=>0.30685282, "description"=>"idf(_all: foo=1)"},
{"value"=>0.5, "description"=>"fieldNorm(field=_all, doc=0)"}]},
"title"=>"foo foo foo foo"}]
Am I reading the numbers wrong? Or misusing tire? Or maybe I am just missing some "reindex the whole collection" step?
Answer 0 (score: 2)
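afaik, if no explicit sort field is defined, the default sort is by (a variant of) tf*idf (http://en.wikipedia.org/wiki/Tf*idf).
Literally: term frequency * inverse document frequency.
From Wikipedia:
Term frequency (term count): the term count in the given document is simply the number of times a given term appears in that document.
The inverse document frequency is a measure of whether the term is common or rare across all documents. It is obtained by dividing the total number of documents by the number of documents containing the term, and then taking the logarithm of that quotient.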
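To make the two definitions concrete, here is a minimal Ruby sketch of the textbook formula applied to the titles from the question. The method names (term_count, inverse_document_frequency, tf_idf) are made up for illustration, tokens are split on whitespace, and this is the plain Wikipedia variant, not whatever Lucene computes internally.

```ruby
# Titles copied from the ENTITIES list in the question.
TITLES = [
  "foo foo foo foo", "foo", "bar", "foo bar",
  "xxx", "foo foo foo foo", "foo foo", "foo bar baz"
]

# Term count: how often `term` occurs in one document (whitespace tokens).
def term_count(term, doc)
  doc.split.count(term)
end

# Inverse document frequency: log(total docs / docs containing the term).
# Assumes the term occurs in at least one document.
def inverse_document_frequency(term, docs)
  containing = docs.count { |doc| doc.split.include?(term) }
  Math.log(docs.size.to_f / containing)
end

# Textbook tf*idf weight of `term` in `doc`, relative to the collection `docs`.
def tf_idf(term, doc, docs)
  term_count(term, doc) * inverse_document_frequency(term, docs)
end

TITLES.each do |title|
  puts format("%-16s %.4f", title, tf_idf("foo", title, TITLES))
end
```

With a single collection-wide document count, two documents with the same text always get the same weight under this formula.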
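In this case, the "term frequency" component of the sort is most likely what makes "foo foo foo foo" score higher than the other documents when searching for "foo".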
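The explain output in the question can also be checked by hand. Each fieldWeight there is described as a product of a tf value, an idf value and a fieldNorm, and the printed tf values equal the square root of phraseFreq. The sketch below only re-multiplies numbers copied from that output; the names EXPLAINED and field_weight are hypothetical.

```ruby
# Factors copied from the explain output in the question, in printed order.
EXPLAINED = [
  { title: "foo foo foo foo", freq: 2.0, idf: 0.7768564,  norm: 0.5,   score: 0.54932046 },
  { title: "foo",             freq: 0.5, idf: 0.7768564,  norm: 1.0,   score: 0.54932046 },
  { title: "foo foo",         freq: 1.0, idf: 0.7768564,  norm: 0.625, score: 0.48553526 },
  { title: "foo bar",         freq: 0.5, idf: 1.0,        norm: 0.625, score: 0.44194174 },
  { title: "foo bar baz",     freq: 0.5, idf: 0.7768564,  norm: 0.5,   score: 0.27466023 },
  { title: "foo foo foo foo", freq: 2.0, idf: 0.30685282, norm: 0.5,   score: 0.2169777 }
]

# tf (square root of the frequency) * idf * fieldNorm.
def field_weight(hit)
  Math.sqrt(hit[:freq]) * hit[:idf] * hit[:norm]
end

EXPLAINED.each do |hit|
  puts format("%-16s computed=%.7f reported=%.7f",
              hit[:title], field_weight(hit), hit[:score])
end
```

In the printed output, the two "foo foo foo foo" hits have the same tf and fieldNorm; the only factor that differs between them is the idf value (0.7768564 versus 0.30685282).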
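Also, about the effect you see when changing the ids: I am not sure, but my guess is that it has to do with ES storing documents internally ordered by id (I am not sure about that...).
If that is the case, then 2 documents with the same sort score would be ordered using the id as a tie-breaker. You can of course define multiple sorts to change this behavior (e.g.: sort=sorta+desc,sortb+desc. In that case sortb is used as a tie-breaker for all documents that score the same on sorta).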
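The tie-breaker idea itself can be shown in plain Ruby, without touching Elasticsearch. This is only an illustration of sorting on two keys, using ids and scores copied from the question's output; whether ES really falls back to id order is, as said, a guess. SCORED_HITS and order_hits are made-up names.

```ruby
# Ids and scores copied from the explain output in the question.
SCORED_HITS = [
  { id: "2169251840", score: 0.54932046 },
  { id: "2169250520", score: 0.48553526 },
  { id: "2169251720", score: 0.54932046 }
]

# Primary key: score, descending (hence the negation).
# Secondary key: id, ascending; it only matters when scores are equal.
def order_hits(hits)
  hits.sort_by { |hit| [-hit[:score], hit[:id]] }
end

order_hits(SCORED_HITS).each do |hit|
  puts "#{hit[:id]} #{hit[:score]}"
end
```

The two hits with score 0.54932046 come first, ordered by id, followed by the lower-scoring hit.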