Question

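I have an index named test that can be associated with n document types named sub_text_1 through sub_text_n. All of them will have the same mapping.

Is there any way to set up the index so that all document types share the same mapping? That is, test\sub_text1\_mapping should be the same as test\sub_text2\_mapping.

Otherwise, if I have 1000 document types, I will have 1000 copies of the same mapping, one per document type.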

Update:

Expected mapping:

PUT /test_index/
{
  "settings": {
    "index.store.type": "default",
    "index": {
        "number_of_shards": 5,
        "number_of_replicas": 1,
        "refresh_interval": "60s"
    },
    "analysis": {
        "filter": {
            "porter_stemmer_en_EN": {
                "type": "stemmer",
                "name": "porter"
            },
            "default_stop_name_en_EN": {
                "type": "stop",
                "name": "_english_"
            },
            "snowball_stop_words_en_EN": {
                "type": "stop",
                "stopwords_path": "snowball.stop"
            },
            "smart_stop_words_en_EN": {
                "type": "stop",
                "stopwords_path": "smart.stop"
            },
            "shingle_filter_en_EN": {
                "type": "shingle",
                "min_shingle_size": "2",
                "max_shingle_size": "2",
                "output_unigrams": true
            }
        }
    }
  }
}

I want the following mapping to be a separate mapping for every sub_text I create, so that I can change it for one sub_text without affecting the others:

{
  "sub_text": {
    "properties": {
      "_id": {
        "include_in_all": false,
        "type": "string",
        "store": true,
        "index": "not_analyzed"
      },
      "alternate_id": {
        "include_in_all": false,
        "type": "string",
        "store": true,
        "index": "not_analyzed"
      },
      "text": {
        "type": "multi_field",
        "fields": {
          "text": {
            "type": "string",
            "store": true,
            "index": "analyzed"
          },
          "pdf": {
            "type": "attachment",
            "fields": {
              "pdf": {
                "type": "string",
                "store": true,
                "index": "analyzed"
              }
            }
          }
        }
      }
    }
  }
}

I might want to add two custom analyzers to sub_text1 and three analyzers to sub_text3, while leaving the rest unchanged.

Update:

PUT /my-index/document_set/_mapping
{
  "properties": {
    "type": {
      "type": "string",
      "index": "not_analyzed"
    },
    "doc_id": {
      "type": "string",
      "index": "not_analyzed"
    },
    "plain_text": {
      "type": "string",
      "store": true,
      "index": "analyzed"
    },
    "pdf_text": {
      "type": "attachment",
      "fields": {
        "pdf_text": {
          "type": "string",
          "store": true,
          "index": "analyzed"
        }
      }
    }
  }
}

POST /my-index/document_set/1
{
  "type": "d1",
  "doc_id": "1",
  "plain_text": "simple text for doc1."
}

POST /my-index/document_set/2
{
  "type": "d1",
  "doc_id": "2",
  "pdf_text": "cGRmIHRleHQgaXMgaGVyZS4="
}

POST /my-index/document_set/3
{
  "type": "d2",
  "doc_id": "3",
  "plain_text": "simple text for doc3 in d2."
}

POST /my-index/document_set/4
{
  "type": "d2",
  "doc_id": "4",
  "pdf_text": "cGRmIHRleHQgaXMgaGVyZSBpbiBkMi4="
}

GET /my-index/document_set/_search
{
  "query" : {
    "filtered" : {
      "filter" : {
        "term" : {
          "type" : "d1"
        }
      }
    }
  }
}

This gives me the documents related to type "d1". How can I add analyzers only to documents of type "d1"?

Answer 1

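Currently, a possible solution is to use index templates or dynamic mapping. However, neither allows wildcard matching on type names, so you have to use the _default_ root type to apply the mapping to every type in the index. That way you can make sure the same dynamic mapping is applied to all types. This template example might work for you: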

curl -XPUT localhost:9200/_template/template_1 -d '
{
    "template" : "test",
    "mappings" : {
        "_default_" : {
            "dynamic": true,
            "properties": {
                "field1": {
                   "type": "string",
                   "index": "not_analyzed"
                }
            }
        }
    }
}
'

Answer 2

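Don't do this.

"Otherwise, if I have 1000 document types, I will have 1000 copies of the same mapping, one per document type."

You are exactly right. With each additional _type that has the same mapping, you needlessly add to the size of the index's mapping. The mappings are not merged, and no compression is applied.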
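To get a feel for that growth, here is a rough sketch in Python. It assumes a small hypothetical per-type mapping (FIELD_MAPPING, loosely modelled on the one in the question) and uses the length of the serialized JSON as a stand-in for mapping size. Elasticsearch's actual cluster-state encoding differs, so the numbers only illustrate the trend.

```python
import json

# Hypothetical per-type mapping, loosely modelled on the one in the question.
FIELD_MAPPING = {
    "properties": {
        "alternate_id": {"type": "string", "index": "not_analyzed"},
        "text": {"type": "string", "store": True, "index": "analyzed"},
    }
}


def per_type_mappings(n):
    """Return one identical mapping per type: sub_text_1 .. sub_text_n."""
    return {"sub_text_%d" % i: FIELD_MAPPING for i in range(1, n + 1)}


def serialized_size(mappings):
    """Length of the JSON text, used only as a rough proxy for mapping size."""
    return len(json.dumps(mappings))


if __name__ == "__main__":
    # Print the proxy size for a growing number of identical types.
    for n in (1, 10, 1000):
        print(n, serialized_size(per_type_mappings(n)))
```

Because every type carries its own copy of the mapping, the serialized size grows roughly in proportion to the number of types.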

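A better solution is to create a single shared _type and add a field that represents the intended type. This completely avoids the wasted mappings and all the negatives associated with them, including an unnecessary increase in the size of the cluster state.

From there, you can mimic what Elasticsearch does for you and filter on your custom type without bloating your mappings.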

$ curl -XPUT localhost:9200/my-index -d '{
  "mappings" : {
    "my-type" : {
      "properties" : {
        "type" : {
          "type" : "string",
          "index" : "not_analyzed"
        },
        # ... whatever other mappings exist ...
      }
    }
  }
}'

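Then, for any search that targets sub_text1 (and so on), you can use a term filter (for one type) or a terms filter (for several) to mimic the _type filter that Elasticsearch would otherwise apply for you.

$ curl -XGET localhost:9200/my-index/my-type/_search -d '{
  "query" : {
    "filtered" : {
      "filter" : {
        "term" : {
          "type" : "sub_text1"
        }
      }
    }
  }
}'

This does the same thing as the _type filter. If you want higher-level search capability without exposing the filtering logic to clients, you can create aliases that contain the filter.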
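One way to keep the filter out of client code is an alias that carries it. The sketch below is a minimal illustration in Python, not part of the original answer: it only builds the JSON request bodies, using the same `filtered` query syntax as the rest of this thread. The function names and the alias name sub_text1 are hypothetical, and the field name type matches the mapping above.

```python
import json


def type_filter(types, field="type"):
    """Build a term filter for one value, or a terms filter for several."""
    values = [types] if isinstance(types, str) else list(types)
    if not values:
        raise ValueError("at least one type value is required")
    if len(values) == 1:
        return {"term": {field: values[0]}}
    return {"terms": {field: values}}


def search_body(types, field="type"):
    """Wrap the filter in the `filtered` query used elsewhere in this thread."""
    return {"query": {"filtered": {"filter": type_filter(types, field)}}}


def filtered_alias_body(index, alias, types, field="type"):
    """Build a body for the _aliases endpoint that adds a filtered alias."""
    return {
        "actions": [
            {
                "add": {
                    "index": index,
                    "alias": alias,
                    "filter": type_filter(types, field),
                }
            }
        ]
    }


if __name__ == "__main__":
    # Search body for a single custom type.
    print(json.dumps(search_body("sub_text1"), indent=2))
    # Alias body: the alias name "sub_text1" is a hypothetical choice.
    print(json.dumps(filtered_alias_body("my-index", "sub_text1", "sub_text1"), indent=2))
```

The alias body would be sent with a POST to localhost:9200/_aliases. Searches against the alias name should then apply the filter automatically, so clients no longer need to include it themselves.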

Multiple document types with the same mapping in Elasticsearch
