Question

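I've set up analyzers for my ES index, and I'm trying to understand multi-fields that analyze the text once per language, because my documents can be in several languages.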

"properties": {
    "body" : {
        "type": "string",
        "fields": {
            "fr": {
                "type": "string",
                "analyzer": "french"
            },
            "en": {
                "type": "string",
                "analyzer": "english"
            },
            "es": {
                "type": "string",
                "analyzer": "spanish"
            },
            "de": {
                "type": "string",
                "analyzer": "german"
            },
            "pt": {
                "type": "string",
                "analyzer": "portuguese"
            },
            "nl": {
                "type": "string",
                "analyzer": "dutch"
            },
            "dk": {
                "type": "string",
                "analyzer": "danish"
            }
        }
    }
}

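How does this work? There is no further information in the ES docs.

For example, if I index the string Hello, my car is red, does the analyzer try each language until it detects that the text is English, and then build the inverted index with the english analyzer?

Thanks in advance.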

Answer 1

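Analyzers only analyze; they can't detect anything.

The multi field type lets the same value be analyzed in different ways. In your example, the value you pass in the body field is run through the analyzer of every sub-field you specified, and the resulting terms are indexed under that sub-field.

As a simple example, imagine you have two analyzers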

  "analyzer":{
    "standard":{
      "type":"custom",
      "tokenizer":"standard",
      "filter":[
        "lowercase"
      ]
    },
    "low-keyword": {
      "type": "custom",
      "tokenizer": "keyword"
    }
  }

You can then specify a mapping similar to the one you provided

"properties": {
    "body" : {
        "type": "string",
        "analyzer": "standard",
        "fields": {
            "keyword": {
                "type": "string",
                "analyzer": "keyword"
            }
        }
    }
}

That way, when you index "This is a Test", the text is stored as

body: "this", "is", "a", "test"
body.keyword: "This is a Test"
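
The same fan-out applies to the language sub-fields in the question. Below is a minimal Python sketch of the idea, not of Elasticsearch's actual implementation: `toy_analyze`, `index_multi_field` and the tiny stop-word lists are hypothetical stand-ins (the real french and english analyzers also stem and use much larger lists). It only illustrates that every sub-field's analyzer runs on the same input and none of them is "chosen".

```python
import re

# Toy stop-word lists, assumed for illustration only.
TOY_STOPWORDS = {
    "en": {"my", "is", "the", "a"},
    "fr": {"le", "la", "est", "ma"},
}


def toy_analyze(text, stopwords):
    """Lowercase the text, split it into word tokens, drop stop words."""
    tokens = re.findall(r"\w+", text.lower())
    return [token for token in tokens if token not in stopwords]


def index_multi_field(value):
    """Run every sub-field's analyzer on the same value; no language is detected."""
    return {
        "body." + lang: toy_analyze(value, stopwords)
        for lang, stopwords in TOY_STOPWORDS.items()
    }


terms = index_multi_field("Hello, my car is red")
for field, tokens in terms.items():
    print(field, tokens)
```

Both sub-fields receive terms for the English sentence. The French one simply keeps words such as "my" and "is" because its stop-word list does not know them.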

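Elasticsearch has no feature that detects the language and analyzes the text accordingly. If you haven't seen it yet, the Definitive Guide has a whole chapter on dealing with languages.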
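
Since nothing picks a language at index time, one option is to search all the language sub-fields together with a multi_match query. The sketch below only builds the request body, assuming the sub-field names from the mapping in the question. `build_language_query` is a hypothetical helper, and how you send the body to the search endpoint depends on your client.

```python
import json

# Sub-field names taken from the mapping in the question.
LANGUAGE_SUBFIELDS = ["fr", "en", "es", "de", "pt", "nl", "dk"]


def build_language_query(text, base_field="body"):
    """Build a multi_match request body that searches every language sub-field."""
    fields = [f"{base_field}.{lang}" for lang in LANGUAGE_SUBFIELDS]
    return {
        "query": {
            "multi_match": {
                "query": text,
                "fields": fields,
                # most_fields combines the scores of all matching sub-fields.
                "type": "most_fields",
            }
        }
    }


query_body = build_language_query("red car")
print(json.dumps(query_body, indent=2))
```

By default the query text is analyzed per field with that field's own analyzer, so a document tends to score best on the sub-field whose language matches it.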
