Question

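I am indexing my data with the Python file below.
Based on the actual data, 7 documents should match the query, but the search still returns 10 results, because the default size parameter is 10. Is there a way to get all matching hits regardless of size? Or do I always have to guess the size in advance and put it in the query?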

Result:

Could this be related to how I indexed the data? I don't know why the total hit count is 26639. It should match 7.

from elasticsearch import helpers, Elasticsearch
import csv

es = Elasticsearch()

# Bulk-index every CSV row as one document (rows get auto-generated IDs,
# so running this again indexes the same rows a second time)
with open('result.csv', encoding='utf-8') as f:
    reader = csv.DictReader(f)
    helpers.bulk(es, reader, index='hscate', doc_type='my-type')

res = es.search(index='hscate',
                doc_type='my-type',
                # size=1000,
                # from_=0,
                body={
                    'query': {
                        'match': {
                            'name': '추성훈의 코몽트 기모본딩바지 3+1종_총 4종'
                        }
                    }
                })
print(res['hits']['total'])       # total number of matching documents
print(len(res['hits']['hits']))   # hits returned in this page (at most size, default 10)

with open('mycsvfile.csv', 'w', encoding='utf-8', newline='') as f:  # Just use 'w' mode in 3.x
    header_present = False
    for doc in res['hits']['hits']:
        my_dict = doc['_source']
        if not header_present:
            w = csv.DictWriter(f, my_dict.keys())
            w.writeheader()
            header_present = True
        w.writerow(my_dict)

Answer 1

As you suspect, I think Elasticsearch is simply returning the top 10 results, ranked by how well each document matches your query.

Try this:

body = {
    'from': 0,
    'size': 1,
    'query': {
        'bool': {
            'must': [
                {
                    'match': {
                        'Category' : 'category name',
                    }
                },
                {
                    'match' : {
                        'name' : 'product name'
                    }
                }
            ]
        }
    }
}

Source: https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-from-size.html
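The body above caps each request at `size`. If you want every match rather than only the top `size`, one option is to page through the results with `from`/`size`. Below is a minimal pure-Python sketch that only plans the pages; `plan_pages` and `page_body` are hypothetical helpers. It assumes the default `index.max_result_window` of 10,000: Elasticsearch rejects requests where `from + size` exceeds that window, so with 26639 total hits you would need the scroll API (for example `elasticsearch.helpers.scan`) or `search_after` instead.

```python
# Assumption: index.max_result_window is left at its default of 10,000.
MAX_RESULT_WINDOW = 10_000

def plan_pages(total, page_size):
    """Return (from_, size) pairs that together cover `total` hits."""
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    pages = []
    for start in range(0, total, page_size):
        size = min(page_size, total - start)
        # Elasticsearch requires from + size <= max_result_window
        if start + size > MAX_RESULT_WINDOW:
            raise ValueError(
                f"from + size = {start + size} exceeds max_result_window; "
                "use scroll or search_after instead")
        pages.append((start, size))
    return pages

def page_body(query, from_, size):
    """Build the search body for one page, in the same shape as the answer's body."""
    return {'from': from_, 'size': size, 'query': query}

print(plan_pages(25, 10))  # → [(0, 10), (10, 10), (20, 5)]
```

Each `(from_, size)` pair can then be passed to `es.search` through the `from_` and `size` arguments that the question's script already has commented out.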

Answer 2

Based on our discussion in the comments, I now understand what you mean and what the actual problem is.

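When you rely on the defaults, Elasticsearch analyzes text fields with the standard analyzer, which essentially splits your text into tokens. When you search that field with a match query, the same analysis is applied, so your query text is split into tokens as well. The match query then combines all of the resulting tokens with OR, so a document that matches any single token counts as a hit.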
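To make the OR behaviour concrete, here is a rough pure-Python simulation. It is a sketch under stated assumptions, not Elasticsearch's implementation: `analyze` only approximates the standard analyzer with a lowercase-and-split regex, and the third and fourth documents are hypothetical. The real match query also accepts `"operator": "and"`, which requires every query token to be present; the `operator` argument of the hypothetical `simulate_match` mimics that.

```python
import re

def analyze(text):
    # Rough stand-in for the standard analyzer (assumption): lowercase the
    # text, then keep runs of Unicode word characters as tokens.
    return re.findall(r"\w+", text.lower())

def simulate_match(docs, query, operator="or"):
    """Return the docs a match query would hit under the given operator.

    "or" keeps docs sharing at least one token with the query;
    "and" keeps docs that contain every query token.
    """
    query_tokens = set(analyze(query))
    hits = []
    for doc in docs:
        doc_tokens = set(analyze(doc))
        if operator == "and":
            matched = query_tokens <= doc_tokens
        else:
            matched = bool(query_tokens & doc_tokens)
        if matched:
            hits.append(doc)
    return hits

docs = [
    "추성훈의 코몽트 기모본딩바지 3+1종_총 4종",
    "추성훈의 기모본딩바지 4종",
    "코몽트 양말 5켤레",   # hypothetical: shares only one token with the query
    "unrelated item",     # hypothetical: shares no token with the query
]
query = docs[0]
print(len(simulate_match(docs, query)))         # → 3
print(len(simulate_match(docs, query, "and")))  # → 1
```

The third document is a hit in OR mode only because it shares the single token `코몽트`. In a real index, many products that share one common token with the query is one plausible way to reach a total like 26639.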

You can copy and paste the following example into the Kibana Dev Tools console to see this in action:

DELETE test
PUT test 
PUT test/_doc/1
{
  "name": "추성훈의 코몽트 기모본딩바지 3+1종_총 4종"
}
PUT test/_doc/2
{
  "name": "추성훈의 기모본딩바지 4종"
}
GET test/_search
{
  "query": {
    "match": {
      "name": "추성훈의 코몽트 기모본딩바지 3+1종_총 4종"
    }
  }
}

It returns the following result:

{
  "took" : 12,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 2,
    "max_score" : 1.7260926,
    "hits" : [
      {
        "_index" : "test",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 1.7260926,
        "_source" : {
          "name" : "추성훈의 코몽트 기모본딩바지 3+1종_총 4종"
        }
      },
      {
        "_index" : "test",
        "_type" : "_doc",
        "_id" : "2",
        "_score" : 0.8630463,
        "_source" : {
          "name" : "추성훈의 기모본딩바지 4종"
        }
      }
    ]
  }
}
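The response above contains two separate numbers: `hits.total` counts every matching document, while the `hits.hits` array holds at most `size` of them (10 by default). That is why the question sees 10 returned hits alongside a total of 26639. Below is a small sketch for reading both from a response dict; `total_hits` and `returned_hits` are hypothetical helpers. It also handles the 7.x format, where `hits.total` became an object with `value` and `relation` (by default 7.x counts exactly only up to 10,000 and then reports `"relation": "gte"`).

```python
def total_hits(response):
    """Total number of matching documents reported by Elasticsearch."""
    total = response["hits"]["total"]
    # 6.x: a plain integer; 7.x and later: {"value": ..., "relation": "eq" | "gte"}
    return total["value"] if isinstance(total, dict) else total

def returned_hits(response):
    """Number of documents actually included in this response (capped by size)."""
    return len(response["hits"]["hits"])

# Trimmed copy of the response shown above (6.x format).
response = {"hits": {"total": 2, "hits": [{"_id": "1"}, {"_id": "2"}]}}
print(total_hits(response), returned_hits(response))  # → 2 2
```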

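If you did not define any mappings or analyzers for the index, Elasticsearch's dynamic mapping has probably created a .keyword subfield, which is not analyzed. You can query it like this: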

GET test/_search
{
  "query": {
    "term": {
      "name.keyword": "추성훈의 코몽트 기모본딩바지 3+1종_총 4종"
    }
  }
}

Now only the exact match is returned:

{
  "took" : 3,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 1,
    "max_score" : 0.2876821,
    "hits" : [
      {
        "_index" : "test",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 0.2876821,
        "_source" : {
          "name" : "추성훈의 코몽트 기모본딩바지 3+1종_총 4종"
        }
      }
    ]
  }
}
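From the question's Python script, the same exact-match query can be sent as a body dict. A minimal sketch, assuming the dynamically mapped `name.keyword` subfield exists in the `hscate` index; `term_query_body` and `simulate_term` are hypothetical helpers, and the list filter only simulates what a term query does on an unanalyzed field. Note that dynamic mapping sets `ignore_above: 256` on `.keyword` subfields, so longer strings are not indexed there and cannot be found this way.

```python
def term_query_body(field, value):
    """Body for a term query; the value is not analyzed, so it must match exactly."""
    return {"query": {"term": {field: value}}}

def simulate_term(docs, value):
    # An unanalyzed keyword field stores each whole string as a single term,
    # so a term query behaves like exact string equality.
    return [doc for doc in docs if doc == value]

docs = ["추성훈의 코몽트 기모본딩바지 3+1종_총 4종", "추성훈의 기모본딩바지 4종"]
body = term_query_body("name.keyword", docs[0])
print(simulate_term(docs, docs[0]))  # only the first document
```

With the real client, the body would be sent as `es.search(index='hscate', body=body)`.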

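If you know you will never run full-text searches on the name field, only need exact matches, and don't need to aggregate or sort on it, you can define the index as follows: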

DELETE test
PUT test 
{
  "mappings": {
    "_doc": {
      "properties": {
        "name": {
          "type": "text",
          "analyzer": "keyword"
        }
      }
    }
  }
}
PUT test/_doc/1
{
  "name": "추성훈의 코몽트 기모본딩바지 3+1종_총 4종"
}
PUT test/_doc/2
{
  "name": "추성훈의 기모본딩바지 4종"
}
GET test/_search
{
  "query": {
    "term": {
      "name": "추성훈의 코몽트 기모본딩바지 3+1종_총 4종"
    }
  }
}

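This also returns a single result, and it needs less disk space than the default behavior (a text field plus a .keyword subfield).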
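The mapping above uses the 6.x layout with a `_doc` type level. From 7.x on, mappings are typeless by default, so `properties` sits directly under `mappings`. Below is a small sketch that builds either form as a Python dict, for example to pass to `es.indices.create(index=..., body=...)` with a 6.x or 7.x client; `exact_match_mappings` is a hypothetical helper. Keep in mind that the analyzer of an existing field cannot be changed in place, so the data has to be indexed again into a newly created index.

```python
def exact_match_mappings(field, es_major_version=7):
    """Mappings that index `field` as one unanalyzed term (text + keyword analyzer)."""
    properties = {field: {"type": "text", "analyzer": "keyword"}}
    if es_major_version >= 7:
        # 7.x and later: typeless mappings
        return {"mappings": {"properties": properties}}
    # 6.x: the "_doc" type level used in the answer
    return {"mappings": {"_doc": {"properties": properties}}}

print(exact_match_mappings("name"))
```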
