Question

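We are trying to build a word cloud over four text fields. Each field has its own stop analyzer: for example, TextFr uses a French stop analyzer and TextDe uses a German stop analyzer. The analyzed output should be copied to another field named WordCloudText, on which we run the aggregation. Do you have any suggestions on how to do this? Is it even possible?

Thanks for your help.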

Answer 1

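I don't think there is a way to copy the analyzed output of a field; only the field's raw (unanalyzed) value can be copied. The simplest way to achieve this is probably to define your own analyzer that filters out the stop words of all four languages. Something like this (shown here with English and Dutch only):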

PUT stackoverflow
{
  "settings": {
    "analysis": {
      "filter": {
        "english_stop": {
          "type": "stop",
          "stopwords": "_english_" 
        },
        "dutch_stop": {
          "type": "stop",
          "stopwords": "_dutch_"
        }
      },
      "analyzer": {
        "eng_stop": {
          "type": "stop",
          "stopwords": "_english_"
        },
        "dutch_stop": {
          "type": "stop",
          "stopwords": "_dutch_"
        },
        "all_lang_stop": {
          "tokenizer": "lowercase",
          "filter": [
            "english_stop",
            "dutch_stop"
          ]
        }
      }
    }
  },
  "mappings": {
    "record": {
      "properties": {
        "field": {
          "type": "keyword",
          "fields": {
            "english": {"type": "text", "analyzer": "eng_stop" },
            "dutch": {"type": "text", "analyzer": "dutch_stop" },
            "word_cloud": {"type": "text", "analyzer": "all_lang_stop"}
          }
        }
      }
    }
  }
}

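The key is the custom analyzer named all_lang_stop, which combines multiple stop filters. You can then use multi-fields to index the same data automatically with each type of stop analyzer.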
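
One way to check that the combined analyzer behaves as intended is the _analyze API. The request below is a minimal sketch that assumes the stackoverflow index from the example above has been created; the sample text is made up for illustration.

```json
POST stackoverflow/_analyze
{
  "analyzer": "all_lang_stop",
  "text": "The dog and de hond en de kat"
}
```

The response lists the tokens that survive analysis. Stop words from both the English and the Dutch list should be missing from it, and the remaining words come back lowercased because of the lowercase tokenizer.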

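Alternatively, if your text is already split into separate fields by language, you can use the copy_to directive on each individual language field to copy it into the word_cloud field. Note that copy_to copies the analyzer's input value, not its output, so you still need the combined analyzer. Something like this: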

"mappings": {
  "record": {
    "properties": {
      "english": {"type": "text", "analyzer": "eng_stop", "copy_to": "word_cloud"},
      "dutch": {"type": "text", "analyzer": "dutch_stop", "copy_to": "word_cloud"},
      "word_cloud": {"type": "text", "analyzer": "all_lang_stop"}
    }
  }
}
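
For the aggregation step mentioned in the question, a terms aggregation on word_cloud is one option. This is only a sketch, and it relies on an assumption that is not part of the mapping above: a terms aggregation on a text field requires "fielddata": true on that field, which can use a lot of heap memory. The aggregation name cloud_terms and the size of 50 are arbitrary choices for illustration.

```json
GET stackoverflow/_search
{
  "size": 0,
  "aggs": {
    "cloud_terms": {
      "terms": {
        "field": "word_cloud",
        "size": 50
      }
    }
  }
}
```

Each bucket key in the response is a term that survived the all_lang_stop analyzer, and doc_count is the number of documents containing that term, not the total number of occurrences.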
