Question

I'm facing a performance problem. My application is a chat application.

I designed an index mapping with nested objects, as shown below.

{
  "conversation_id-v1": {
    "mappings": {
      "stream": {
        "properties": {
          "id": {
            "type": "keyword"
          },
          "message": {
            "type": "text",
            "fields": {
              "analyzerName": {
                "type": "text",
                "term_vector": "with_positions_offsets",
                "analyzer": "analyzerName"
              },
              "language": {
                "type": "langdetect",
                "analyzer": "_keyword",
                "languages": ["en", "ko", "ja"]
              }
            }
          },
          "comments": {
            "type": "nested",
            "properties": {
              "id": {
                "type": "keyword"
              },
              "message": {
                "type": "text",
                "fields": {
                  "analyzerName": {
                    "type": "text",
                    "term_vector": "with_positions_offsets",
                    "analyzer": "analyzerName"
                  },
                  "language": {
                    "type": "langdetect",
                    "analyzer": "_keyword",
                    "languages": ["en", "ko", "ja"]
                  }
                }
              }
            }
          }
        }
      }
    }
  }
}

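** There are actually many more fields.

One document contains approximately 4,000 nested objects. When I upsert data into a document, CPU usage peaks at 100%, and disk I/O peaks as well on writes. The input rate is around 1000/s.

How can I tune this to improve performance?

Hardware

3 nodes on GCP, each with 2 vCPUs and 13 GB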

Answer 1

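4,000 nested fields sounds like a lot. If I were you, I would take a long, hard look at your mapping design to make sure you really need that many nested fields.

Quoting the docs: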

Internally, nested objects index each object in the array as a separate hidden document.

Because a document has to be fully reindexed when it is updated, a single update means indexing 4,000 documents.
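To make the cost concrete, here is a rough back-of-the-envelope estimate in Python. It assumes each nested object becomes one hidden document plus one for the parent, and that the 1000/s input rate means 1,000 updates per second each hitting a parent of this size. That second assumption is a worst case and may not match the real workload; the function names are made up for illustration.

```python
# Rough write-amplification estimate for updating a parent document
# that carries many nested objects.
# Assumption: one hidden document per nested object, plus the parent itself.


def docs_reindexed_per_update(nested_objects: int) -> int:
    """Documents rewritten when a single parent document is updated."""
    return nested_objects + 1  # all nested children + the parent


def docs_reindexed_per_second(nested_objects: int, updates_per_second: int) -> int:
    """Worst case: every update touches a parent with `nested_objects` children."""
    return docs_reindexed_per_update(nested_objects) * updates_per_second


if __name__ == "__main__":
    print(docs_reindexed_per_update(4000))        # 4001
    print(docs_reindexed_per_second(4000, 1000))  # 4001000
```

Even if only a fraction of the incoming writes touch such large parents, the multiplier per update stays at roughly 4,000, which would be consistent with the CPU and disk pressure described in the question.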

Why so many fields?

The reason you gave in the comments for needing so many fields,

I'd like to search comments in nested and come with their parent stream for display.

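makes me think that you may be mixing up two concerns here.

Elasticsearch is meant for search, and your mapping should be optimized for search. If the shape of your mapping is dictated by the way you want to display information, then something is wrong.

Design your index around search

Note that by "search" I mean both indexing and querying.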

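For your use case, it seems you could:

Index only the comments, with a reference to the parent stream in each indexed comment document.
After getting the search results (a list of comments) from the search index, retrieve each comment along with its parent stream from another data source (e.g. a relational database).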
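As a minimal sketch of the first step, the snippet below flattens one stream with embedded comments into standalone comment documents. The field name `stream_id`, the helper `to_comment_docs`, and the simplified mapping are assumptions for illustration, not part of the original mapping. The analyzer and `langdetect` sub-fields are omitted for brevity, and the exact shape under `mappings` depends on the Elasticsearch version in use.

```python
import json
from typing import Any, Dict, List

# Hypothetical flat mapping: one document per comment, no "nested" type.
# "stream_id" is an assumed name for the reference to the parent stream.
COMMENT_MAPPING: Dict[str, Any] = {
    "properties": {
        "id": {"type": "keyword"},
        "stream_id": {"type": "keyword"},
        "message": {"type": "text"},
    }
}


def to_comment_docs(stream: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Turn one stream with embedded comments into standalone comment documents."""
    return [
        {"id": c["id"], "stream_id": stream["id"], "message": c["message"]}
        for c in stream.get("comments", [])
    ]


if __name__ == "__main__":
    stream = {
        "id": "s1",
        "message": "release plans",
        "comments": [
            {"id": "c1", "message": "ship it on Friday"},
            {"id": "c2", "message": "needs more testing"},
        ],
    }
    print(json.dumps(to_comment_docs(stream), indent=2))
```

With this layout, adding or editing a comment writes one small document instead of rewriting the parent and all of its nested children.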

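The point is that it may be much more efficient to re-retrieve the comments, along with anything else you want, from another source that is better than Elasticsearch at joining data.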
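A sketch of that second step, under stated assumptions: the search index returns only comment ids, and the full comment and its parent stream are then looked up in the primary data store. Here two in-memory dictionaries stand in for relational tables; `STREAMS`, `COMMENTS`, and `hydrate` are hypothetical names, and the hits are hard-coded rather than fetched from a real index.

```python
from typing import Any, Dict, List

# Stand-ins for the primary data store (e.g. two tables in a relational database).
STREAMS: Dict[str, Dict[str, Any]] = {
    "s1": {"id": "s1", "message": "release plans"},
}
COMMENTS: Dict[str, Dict[str, Any]] = {
    "c1": {"id": "c1", "stream_id": "s1", "message": "ship it on Friday"},
    "c2": {"id": "c2", "stream_id": "s1", "message": "needs more testing"},
}


def hydrate(hits: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Attach the full comment and its parent stream to each search hit."""
    results = []
    for hit in hits:
        comment = COMMENTS[hit["id"]]           # look up the comment by id
        stream = STREAMS[comment["stream_id"]]  # follow the reference to its parent
        results.append({"comment": comment, "stream": stream})
    return results


if __name__ == "__main__":
    # Pretend these ids came back from the search index.
    hits = [{"id": "c1"}]
    for row in hydrate(hits):
        print(row["stream"]["message"], "<-", row["comment"]["message"])
```

In a real system the two lookups would typically be batched into a single query (for example a join filtered by the returned ids) rather than done one hit at a time.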
