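I have a PHP script that runs two queries against Elasticsearch and echoes the results on a PHP/HTML page. Both queries search the same fields for the same text, but one uses the AND operator and the other uses the OR operator.
The results from the AND operator are the ones I want to appear first. The results from the OR operator should appear too, but after the first set. That does not seem to be what happens with the script in its current state.
The script: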
<?php
require_once 'vendor/autoload.php';
use Elasticsearch\ClientBuilder;
$client = ClientBuilder::create()->setHosts(['REDACTED:9200'])->build();
$es = $client;
if (isset($_GET['q'])) {
    $q = $_GET['q'];
    $query = $es->search([
        'index' => 'rss',
        'size' => '30',
        'body' => [
            'query' => [
                'simple_query_string' => [
                    'fields' => ["message","title"],
                    'query' => "$q",
                    'default_operator' => 'and',
                    'minimum_should_match' => '100%'
                ],
                'simple_query_string' => [
                    'fields' => ["message","title"],
                    'query' => "$q",
                    'default_operator' => 'or',
                    'minimum_should_match' => '80%'
                ]
            ]
        ]
    ]);
}
if($query['hits']['max_score'] >=1 ) {
    $results = $query['hits']['hits'];
}
?>
<!doctype html>
<html>
<head>
<meta charset="utf-8">
<title>Søkemotor</title>
<link rel="stylesheet" href="css/main.css">
</head>
<body>
<div class="img">
<img src="img/DigRevLogo3.png" alt="Logo" width="200" height="50" class="img">
</div>
<div class="search">
<form action="index.php" method="get" autocomplete="off" class="search_form">
<label><input type="text" name="q" placeholder="Søk her"></label>
<label><input type="submit" value="Søk" name="s"></label>
</form>
</div>
<?php
$noresult = "Ingen resultat på søket av $q.";
$i = 0;
if(isset($results)) {
foreach($results as $r) { ?>
<div class="result">
<div class="title">
<a href="<?php echo $r['_source']['link']; ?>"><?php echo $r['_source']['title'];?></a>
</div>
<div class="message">
<br>
<?php echo $r['_source']['message'];?>
</div>
<div class="published">
<br>
<?php echo $r['_source']['published'];?>
</div>
</div>
<div class="noresult">
<?php
}
}
else echo "<CENTER>$noresult</CENTER>"; ?>
</div>
</body>
</html>
If my query is "Apple Orange", my results currently come out like this:
RESULT 1: Apple Apple
RESULT 2: Apple Orange
RESULT 3: Apple Apple Apple
RESULT 4: Orange
What I would like to see is this:
RESULT 1: Apple Orange
RESULT 2: Apple Apple Apple
RESULT 3: Apple Apple
RESULT 4: Orange
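How can I do this? I am using Elasticsearch 6.3 installed on Debian 9. The PHP version is 7.2. I can provide any other information that would help, but I am not sure what is needed.
Answer 0 (score: 0)
To keep things simple, let's reduce this to the Elasticsearch queries and switch to a match query, which is usually the query to start with before digging deeper as needed: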
DELETE fruit
PUT fruit
{
  "settings": {
    "number_of_shards": 1
  }
}
POST fruit/_doc
{
  "fruit": "Apple Apple"
}
POST fruit/_doc
{
  "fruit": "Apple Orange"
}
POST fruit/_doc
{
  "fruit": "Apple Apple Apple"
}
POST fruit/_doc
{
  "fruit": "Orange"
}
GET fruit/_search
{
  "query": {
    "match": {
      "fruit": "Apple Orange"
    }
  }
}
The result is:
{
  "took": 4,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 4,
    "max_score": 1.0498221,
    "hits": [
      {
        "_index": "fruit",
        "_type": "_doc",
        "_id": "oRg6HmUBs4EUCKS4dujJ",
        "_score": 1.0498221,
        "_source": {
          "fruit": "Apple Orange"
        }
      },
      {
        "_index": "fruit",
        "_type": "_doc",
        "_id": "oxg6HmUBs4EUCKS4d-hu",
        "_score": 0.87138504,
        "_source": {
          "fruit": "Orange"
        }
      },
      {
        "_index": "fruit",
        "_type": "_doc",
        "_id": "ohg6HmUBs4EUCKS4d-ga",
        "_score": 0.5062483,
        "_source": {
          "fruit": "Apple Apple Apple"
        }
      },
      {
        "_index": "fruit",
        "_type": "_doc",
        "_id": "oBg6HmUBs4EUCKS4duh-",
        "_score": 0.49042806,
        "_source": {
          "fruit": "Apple Apple"
        }
      }
    ]
  }
}
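For general background: the scores are calculated with BM25 (which is very similar to the older TF/IDF). Why do we get this particular order?
If you add explain to the query, it will show you exactly how each score is calculated: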
GET fruit/_search
{
  "explain": true,
  "query": {
    "match": {
      "fruit": "Apple Orange"
    }
  }
}
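As a rough cross-check of what explain reports, the four scores in the result above can be reproduced by hand. This is only a sketch: it assumes Lucene's BM25 defaults (k1 = 1.2, b = 0.75) and a single shard holding just these four documents, so N = 4 and the average field length is (2 + 2 + 3 + 1) / 4 = 2 terms.

```latex
\begin{align*}
\mathrm{idf}(t) &= \ln\!\left(1 + \frac{N - n_t + 0.5}{n_t + 0.5}\right),
\qquad
\mathrm{tfNorm}(t,d) = \frac{f_{t,d}\,(k_1 + 1)}{f_{t,d} + k_1\left(1 - b + b\,\frac{|d|}{\mathrm{avgdl}}\right)} \\[6pt]
\mathrm{idf}(\text{Orange}) &= \ln\!\left(1 + \tfrac{2.5}{2.5}\right) \approx 0.6931
\quad (n_t = 2), \\
\mathrm{idf}(\text{Apple}) &= \ln\!\left(1 + \tfrac{1.5}{3.5}\right) \approx 0.3567
\quad (n_t = 3) \\[6pt]
\text{``Apple Orange''}: \quad & 0.6931 \cdot \tfrac{2.2}{1 + 1.2 \cdot 1} + 0.3567 \cdot \tfrac{2.2}{1 + 1.2 \cdot 1} \approx 1.0498 \\
\text{``Orange''}: \quad & 0.6931 \cdot \tfrac{2.2}{1 + 1.2 \cdot 0.625} \approx 0.8714 \\
\text{``Apple Apple Apple''}: \quad & 0.3567 \cdot \tfrac{6.6}{3 + 1.2 \cdot 1.375} \approx 0.5062 \\
\text{``Apple Apple''}: \quad & 0.3567 \cdot \tfrac{4.4}{2 + 1.2 \cdot 1} \approx 0.4904
\end{align*}
```

Orange is the rarer term (2 of 4 documents, against 3 of 4 for Apple), so a single match on it in a short document outweighs repeated Apple matches, whose contribution saturates as the term frequency grows.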
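How can you change the default behavior? You can tune some of the BM25 parameters. Read the blog post series on BM25, which explains many of the concepts. Be aware, though, that this is already a fairly advanced topic.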
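Tuning BM25 is not the only route. Coming back to the PHP script in the question: a PHP array literal keeps only the last value for a repeated key, so the second 'simple_query_string' entry overwrites the first and only the OR query is ever sent. One possible alternative, sketched here under assumptions, is to send both clauses in a single bool query: the OR clause in must decides which documents are returned, and the AND clause in should adds extra score to documents that contain every term. The helper name buildSearchBody and the boost value of 10 are hypothetical choices for illustration, not something taken from the question or from the answer above.

```php
<?php
// Hypothetical helper (not part of the original script): builds a search
// body with an OR clause that selects documents and an AND clause that
// only adds score to documents containing all terms.
function buildSearchBody(string $q, float $andBoost = 10.0): array
{
    $fields = ['message', 'title'];

    return [
        'query' => [
            'bool' => [
                // Required clause: any document matching at least one term.
                'must' => [
                    [
                        'simple_query_string' => [
                            'fields' => $fields,
                            'query' => $q,
                            'default_operator' => 'or',
                        ],
                    ],
                ],
                // Optional clause: its score is added on top when every
                // term matches. The boost is an assumed, arbitrary value.
                'should' => [
                    [
                        'simple_query_string' => [
                            'fields' => $fields,
                            'query' => $q,
                            'default_operator' => 'and',
                            'boost' => $andBoost,
                        ],
                    ],
                ],
            ],
        ],
    ];
}

// Print the request body so it can be inspected or pasted into a console.
echo json_encode(buildSearchBody('Apple Orange'), JSON_PRETTY_PRINT), PHP_EOL;
```

In the script from the question this would be passed as $es->search(['index' => 'rss', 'size' => 30, 'body' => buildSearchBody($q)]). Documents matching all terms satisfy both clauses, so with a large enough boost they should tend to rank ahead of partial matches; the order among the partial matches is still decided by BM25.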