Need Elasticsearch index sharding explanation

Question

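I'm trying to understand the concept of an Elasticsearch index and I don't quite get it. A few points up front: I understand how an inverted index works (mapping terms to document IDs), and I understand how document ranking works based on TF-IDF. What I don't understand is the data structure of the actual index. The Elasticsearch documentation describes the index as a "table with mappings to the documents".

So, here comes sharding. When you look at a typical picture of an Elasticsearch index, it is represented as follows: [image]

The picture doesn't show how the actual partitioning happens and how the [table -> document] link is split across multiple shards. For example, does each shard split the table vertically, meaning the inverted index table on a shard contains only the terms present on that shard? Let's assume we have 3 shards: the first shard contains document1, the second shard contains only document2, and the third shard contains document3. Would the first shard's index contain only the terms present in document1, in this case [blue, bright, butterfly, breeze, hangs]? If so, when someone searches for [forget], how does Elasticsearch "know" not to search shard 1, or does it search all shards every time?

When you look at the cluster image: [image]

It is not clear what exactly is in shard1, shard2, and shard3. We go from Term -> DocumentId -> Document to a "rectangular" shard, but what exactly does the shard contain?

I would appreciate it if someone could explain this using the pictures above.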

Answer 1

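Theory

Elasticsearch is built on top of Lucene. Every shard is simply a Lucene index. A Lucene index, simplified, is an inverted index. Every Elasticsearch index is a bunch of shards, that is, Lucene indices. Because each shard is a self-contained Lucene index, its inverted index holds only the terms of the documents stored on that shard. When you query for a document, Elasticsearch sub-queries all shards, merges the results, and returns them to you. When you index a document, Elasticsearch calculates which shard the document should be written to using the formula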

shard = hash(routing) % number_of_primary_shards

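By default Elasticsearch uses the document id as the routing value. If you specify the routing param, it is used instead of the id. You can use the routing param in search queries and in requests that index, delete, or update documents. The default hash function is MurmurHash3.

Example

Create an index with 3 shards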
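To make the formula concrete, here is a minimal Python sketch of the same idea. It is only an illustration: `pick_shard` is a hypothetical helper, and `zlib.crc32` is a stand-in hash chosen because it ships with the standard library, so the shard numbers it produces will not match what a real cluster computes with MurmurHash3.

```python
import zlib


def pick_shard(routing: str, number_of_primary_shards: int) -> int:
    """Illustrates shard = hash(routing) % number_of_primary_shards."""
    # zlib.crc32 is NOT the hash Elasticsearch uses (that is MurmurHash3);
    # it is just a deterministic hash from the standard library.
    return zlib.crc32(routing.encode("utf-8")) % number_of_primary_shards


# The same routing value always maps to the same shard.
assert pick_shard("1", 3) == pick_shard("1", 3)

# Spread a few hypothetical document ids over 3 primary shards.
for doc_id in ["1", "2", "3", "4", "5", "6"]:
    print(doc_id, "->", pick_shard(doc_id, 3))
```

Since the result depends on `number_of_primary_shards`, the same routing value can map to a different shard once that number changes, which is why the primary shard count matters at index creation time.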

$ curl -XPUT localhost:9200/so -H 'Content-Type: application/json' -d '
{ 
    "settings" : { 
        "index" : { 
            "number_of_shards" : 3, 
            "number_of_replicas" : 0 
        } 
    } 
}'

Index a document

$ curl -XPUT localhost:9200/so/question/1 -H 'Content-Type: application/json' -d '
{ 
    "number" : 47011047, 
    "title" : "need elasticsearch index sharding explanation" 
}'

Query without routing

$ curl "localhost:9200/so/question/_search?pretty"

Response

Look at _shards.total: this is the number of shards that were queried. Also note that we found the document.

{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 3,
    "successful" : 3,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 1,
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "so",
        "_type" : "question",
        "_id" : "1",
        "_score" : 1.0,
        "_source" : {
          "number" : 47011047,
          "title" : "need elasticsearch index sharding explanation"
        }
      }
    ]
  }
}
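To connect this back to the question, here is a toy Python model of a sharded index. It is a sketch under simplifying assumptions, not how Lucene stores data: every shard is a plain dict acting as its own inverted index, `zlib.crc32` stands in for the real hash, and the document ids and texts are made up for illustration. It shows that a term lives only on the shard that holds its document, and that an unrouted search simply asks every shard and merges the answers.

```python
import zlib
from collections import defaultdict

NUM_SHARDS = 3


def pick_shard(routing, num_shards=NUM_SHARDS):
    # Stand-in hash; Elasticsearch itself uses MurmurHash3.
    return zlib.crc32(routing.encode("utf-8")) % num_shards


# One independent inverted index (term -> set of doc ids) per shard.
shards = [defaultdict(set) for _ in range(NUM_SHARDS)]


def index_document(doc_id, text):
    # The document id is the routing value, as in the default case.
    shard = shards[pick_shard(doc_id)]
    for term in text.lower().split():
        shard[term].add(doc_id)


def search(term):
    # Fan out to every shard, then merge the partial results.
    hits = set()
    for shard in shards:
        hits |= shard.get(term, set())
    return sorted(hits)


# Hypothetical documents, loosely based on the terms in the question.
index_document("doc1", "blue bright butterfly")
index_document("doc2", "forget me not")
index_document("doc3", "bright breeze")

print(search("bright"))  # ['doc1', 'doc3']
print(search("forget"))  # ['doc2']

# "forget" appears in exactly one shard's term dictionary.
print(sum("forget" in shard for shard in shards))  # 1
```

In this model a shard never needs to know what the other shards contain; a shard that lacks the term just contributes an empty result to the merge.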

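Query with correct routing

$ curl "localhost:9200/so/question/_search?explain=true&routing=1&pretty"

Response

_shards.total is now 1, because we specified routing and Elasticsearch knows which shard to ask for the document. With the param explain=true I asked Elasticsearch to give me additional information about the query. Note the _shard field of the hit: it is set to [so][2]. This means our document is stored in shard 2 of the so index (shard numbers start at 0).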

{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 1,
    "max_score" : 1.0,
    "hits" : [
      {
        "_shard" : "[so][2]",
        "_node" : "2skA6yiPSVOInMX0ZsD91Q",
        "_index" : "so",
        "_type" : "question",
        "_id" : "1",
        "_score" : 1.0,
        "_source" : {
          "number" : 47011047,
          "title" : "need elasticsearch index sharding explanation"
        },
        ...
}

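Query with incorrect routing

$ curl "localhost:9200/so/question/_search?explain=true&routing=2&pretty"

Response

_shards.total is 1 again. But Elasticsearch returned nothing for our query, because we specified the wrong routing and Elasticsearch queried a shard that does not hold the document.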

{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "failed" : 0
  },
  "hits" : {
    "total" : 0,
    "max_score" : null,
    "hits" : [ ]
  }
}
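The three queries above can be mimicked with a small Python model. This is only a sketch: the `shards` dict, `index_document`, and `search` are hypothetical names, the returned dict imitates just the `_shards.total` and hits parts of a real response, and `zlib.crc32` again stands in for MurmurHash3.

```python
import zlib

NUM_SHARDS = 3


def pick_shard(routing):
    # Stand-in hash; Elasticsearch itself uses MurmurHash3.
    return zlib.crc32(routing.encode("utf-8")) % NUM_SHARDS


# shard number -> {doc_id: document}
shards = {n: {} for n in range(NUM_SHARDS)}


def index_document(doc_id, doc, routing=None):
    # The document id is the routing value unless one is given.
    shards[pick_shard(routing or doc_id)][doc_id] = doc


def search(routing=None):
    # Without routing every shard is queried; with routing only one is.
    targets = [pick_shard(routing)] if routing is not None else list(shards)
    hits = [doc for n in targets for doc in shards[n].values()]
    return {"_shards": {"total": len(targets)}, "hits": hits}


index_document("1", {"number": 47011047})

# Find a routing value that hashes to a different shard than "1".
home = pick_shard("1")
wrong = next(str(i) for i in range(2, 100) if pick_shard(str(i)) != home)

print(search())               # all 3 shards queried, document found
print(search(routing="1"))    # 1 shard queried, document found
print(search(routing=wrong))  # 1 shard queried, no hits
```

The last call fails to find the document for the same reason as the real query: the routing value selects a single shard, and the document was written to a different one.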
