My search string is "Resta", and the current search results include:
"Save at any restaurant!",
"Save at any gas station!"
This happens because of my index:
{
"rewards": {
"aliases": {},
"mappings": {
"_doc": {
"properties": {
"name": {
"type": "text",
"fields": {
"name": {
"type": "text",
"analyzer": "ngram_analyzer"
}
}
}
}
}
},
"settings": {
"index": {
"number_of_shards": "5",
"provided_name": "rewards",
"creation_date": "1555542654894",
"analysis": {
"filter": {
"ngram_filter": {
"type": "ngram",
"min_gram": "2",
"max_gram": "20"
}
},
"analyzer": {
"ngram_analyzer": {
"filter": [
"lowercase",
"ngram_filter"
],
"type": "custom",
"tokenizer": "standard"
}
}
},
"number_of_replicas": "1",
"uuid": "Nzf6KNHkQIeKP0HbVFK1lw",
"version": {
"created": "6060299"
}
}
}
}
}
When I inspect the document with the term vectors API to confirm, I can see that "Save at any gas station!" is indexed with the ngram "sta":
{
"_index": "rewards",
"_type": "_doc",
"_id": "6",
"_version": 1,
"found": true,
"took": 0,
"term_vectors": {
"name": {
"field_statistics": {
"sum_doc_freq": 73,
"doc_count": 3,
"sum_ttf": 73
},
"terms": {
"any": {
"term_freq": 1,
"tokens": [
{
"position": 2,
"start_offset": 8,
"end_offset": 11
}
]
},
"save": {
"term_freq": 1,
"tokens": [
{
"position": 0,
"start_offset": 0,
"end_offset": 4
}
]
},
"sta": {
"term_freq": 1,
"tokens": [
{
"position": 4,
"start_offset": 16,
"end_offset": 23
}
]
}
}
}
}
}
(I've omitted many other terms for brevity.)
The query used:
{
"bool": {
"should": [
{
"multi_match": {
"query": "restaurant",
"fields": [
"name",
"category"
],
"operator": "and"
}
}
]
}
}
When searching, I get these scores:
["Save at any restaurant!", 1.1967528]
["Save at any gas station!", 0.7141209]
The user here is actually looking for "Restaurant", so I'd like to know how to filter out or exclude results by score. I can't find a good definition of the score (it seems to be relative), but how can I (eventually) avoid showing "Save at any gas station!"?
Even when I give it the complete search term "restaurant", the score is only slightly better.
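To see why "Resta" also hits the gas station document, here is a rough Python simulation of the ngram_analyzer from the mapping above (standard tokenizer, lowercase, ngram filter with min_gram 2 and max_gram 20). It is only a sketch: it assumes the query text goes through the same ngram analyzer as the indexed field, the standard tokenizer is approximated with a regex split, and the function names are made up for illustration.

```python
import re


def standard_tokens(text: str) -> list[str]:
    # Rough stand-in for Elasticsearch's standard tokenizer: split on anything
    # that is not a letter or digit (the real tokenizer uses Unicode word rules).
    return [t for t in re.split(r"[^0-9A-Za-z]+", text) if t]


def ngrams(token: str, min_gram: int = 2, max_gram: int = 20) -> list[str]:
    # Every substring whose length is between min_gram and max_gram,
    # mirroring the "ngram" token filter settings of the rewards index.
    grams = []
    for size in range(min_gram, max_gram + 1):
        for start in range(len(token) - size + 1):
            grams.append(token[start:start + size])
    return grams


def ngram_analyzer(text: str) -> set[str]:
    # standard tokenizer -> lowercase filter -> ngram filter
    return {g for tok in standard_tokens(text) for g in ngrams(tok.lower())}


query_grams = ngram_analyzer("Resta")
for doc in ["Save at any restaurant!", "Save at any gas station!"]:
    shared = sorted(query_grams & ngram_analyzer(doc))
    print(doc, shared)
# Save at any restaurant! ['es', 'est', 'esta', 're', 'res', 'rest', 'resta', 'st', 'sta', 'ta']
# Save at any gas station! ['st', 'sta', 'ta']
```

The gas station document shares only "st", "ta" and "sta" with the query (the "sta" term visible in the term vectors), while the restaurant document shares all ten grams, which is consistent with it scoring higher without the other one dropping out.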
Answer:
You just need to create an edge n-gram analyzer in your mapping and use only that analyzer in your search request.
An edge n-gram creates tokens using only the leading letters of a word, for example:
re, res, rest, resta, restau, restaur, restaura, restauran, restaurant
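A minimal Python sketch of how those prefixes are produced, assuming the edge_ngram filter settings used below (min_gram 2, max_gram 20) and a lowercase filter running first; the function name is hypothetical.

```python
def edge_ngrams(token: str, min_gram: int = 2, max_gram: int = 20) -> list[str]:
    # Lowercase first (the analyzer runs the lowercase filter before edge_ngram),
    # then keep every prefix whose length is between min_gram and max_gram.
    token = token.lower()
    longest = min(max_gram, len(token))
    return [token[:size] for size in range(min_gram, longest + 1)]


print(edge_ngrams("Restaurant"))
# ['re', 'res', 'rest', 'resta', 'restau', 'restaur', 'restaura', 'restauran', 'restaurant']
```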
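I've added an edge n-gram analyzer; note that I don't apply it to any field. I only use it at query time.
This means the search looks up only the above tokens of "restaurant" in the inverted index. Because the query uses "operator": "and", a document has to contain every one of those tokens, and "Save at any gas station!" contains none of them (its ngrams include "sta", but not "re" or "res").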
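A rough Python simulation of this setup, with ngrams on the index side and edge n-grams on the search side. It assumes the query targets the ngram-indexed field and treats "operator": "and" as "every search-time token must appear among the document's indexed terms"; scoring, multiple fields and the real tokenizer rules are ignored, and the helper names are hypothetical.

```python
import re


def standard_tokens(text: str) -> list[str]:
    # Rough approximation of the standard tokenizer plus lowercase filter.
    return [t.lower() for t in re.split(r"[^0-9A-Za-z]+", text) if t]


def index_terms(text: str, min_gram: int = 2, max_gram: int = 20) -> set[str]:
    # Index side: ngram_analyzer (standard tokenizer -> lowercase -> ngram filter).
    terms = set()
    for tok in standard_tokens(text):
        for size in range(min_gram, min(max_gram, len(tok)) + 1):
            for start in range(len(tok) - size + 1):
                terms.add(tok[start:start + size])
    return terms


def search_terms(text: str, min_gram: int = 2, max_gram: int = 20) -> list[str]:
    # Search side: edgengram_analyzer (standard tokenizer -> lowercase -> edge_ngram).
    return [tok[:size]
            for tok in standard_tokens(text)
            for size in range(min_gram, min(max_gram, len(tok)) + 1)]


def matches_all(query: str, doc: str) -> bool:
    # "operator": "and": every search token must be present in the indexed terms.
    # (A query that yields no tokens would trivially match; not handled here.)
    indexed = index_terms(doc)
    return all(term in indexed for term in search_terms(query))


docs = ["Save at any restaurant!", "Save at any gas station!"]
for q in ["Resta", "restaurant"]:
    print(q, [d for d in docs if matches_all(q, d)])
# Resta ['Save at any restaurant!']
# restaurant ['Save at any restaurant!']
```

The design point is that the index keeps its inner ngrams (so partial input still matches), while the query side only emits prefixes, so a shared inner fragment such as "sta" can no longer satisfy the query on its own.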
Below is a sample mapping, followed by the query.
PUT <your_index_name>
{
"mappings":{
"mydocs":{
"properties":{
"name":{
"type":"text",
"fields":{
"name":{
"type":"text",
"analyzer":"ngram_analyzer"
}
}
}
}
}
},
"settings":{
"index":{
"number_of_shards":"5",
"analysis":{
"filter":{
"ngram_filter":{
"type":"ngram",
"min_gram":"2",
"max_gram":"20"
},
"edgengram_filter":{
"type":"edge_ngram",
"min_gram":"2",
"max_gram":"20"
}
},
"analyzer":{
"ngram_analyzer":{
"filter":[
"lowercase",
"ngram_filter"
],
"type":"custom",
"tokenizer":"standard"
},
"edgengram_analyzer":{
"filter":[
"lowercase",
"edgengram_filter"
],
"type":"custom",
"tokenizer":"standard"
}
}
},
"number_of_replicas":"1"
}
}
}
Here is what the query looks like; the only change is the "analyzer" parameter added to the multi_match:
POST <your_index_name>/_search
{
"query":{
"bool":{
"should":[
{
"multi_match":{
"query":"restaurant",
"fields":[
"name",
"category"
],
"operator":"and",
"analyzer":"edgengram_analyzer"
}
}
]
}
}
}
You should then see the results you expect.
Hope this helps!