Question

I am aggregating on the tweet text field for only 5 documents, and I get an out-of-memory error.

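The tweet text field is a nested field in my data model, and it is analyzed. The index is 150 GB, and the total number of tweet text values is very large. However, the query I use matches only 5 documents.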

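I have read on some other Stack Overflow pages that Elasticsearch loads all of the data for a field in order to aggregate on it. Is that still the case?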

The query I use returns only 5 documents; it is a filter on another nested field.

The aggregation looks like this:

"aggs": {
    "nestedAgg": {
        "nested": {
            "path": "t_tweets"
        },
        "aggs": {
            "tweetTermsAgg": {
                "terms": {
                    "field": "t_tweets.text",
                    "size": 200,
                    "exclude": ["t.co", "https", "rt", "to", "a", "for", "of", "in", "http", "and", "is", "on", "your", "you", "via", "with", "how", "by", "at", "this", "are", "from", "i", "that", "be", "about", "what", "about", "can", "more", "my", "have", "an", "out", "our", "will", "we", "why", "do", "get", "up", "as", "just", "if", "or", "so", "the", "has", "it's", "need", "but", "great", "w", "best", "its", "no", "when", "not", "been", "than", "there", "was", "some", "don't", "most", "these", "."]
                }
            }
        }
    }
}
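
For context, here is a minimal sketch in Python of how a full request of this shape could be put together: a filter on another nested field, combined with the nested terms aggregation. The filter part is an assumption made only for illustration. `t_profile` and `t_profile.country` are hypothetical names standing in for the other nested field, the `bool`/`filter` wrapper is one possible way to express the filter, and the exclude list is shortened.

```python
import json


def build_search_body(filter_path, filter_field, filter_value):
    """Combine a nested filter with the nested terms aggregation on tweet text."""
    return {
        # Only the aggregation result is of interest, so no hits are requested.
        "size": 0,
        "query": {
            "bool": {
                "filter": {
                    "nested": {
                        # Hypothetical path and field; not the real ones.
                        "path": filter_path,
                        "query": {"term": {filter_field: filter_value}},
                    }
                }
            }
        },
        "aggs": {
            "nestedAgg": {
                "nested": {"path": "t_tweets"},
                "aggs": {
                    "tweetTermsAgg": {
                        "terms": {
                            "field": "t_tweets.text",
                            "size": 200,
                            # Shortened version of the exclude list above.
                            "exclude": ["t.co", "https", "rt"],
                        }
                    }
                },
            }
        },
    }


# Placeholder arguments, chosen only to make the sketch runnable.
body = build_search_body("t_profile", "t_profile.country", "us")
print(json.dumps(body, indent=2))
```

The printed JSON is what would be sent as the body of a `_search` request. The point of the sketch is that the query part narrows the result to a handful of documents, while the terms aggregation still targets the analyzed `t_tweets.text` field.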

The mapping for the nested tweets object is:

"t_tweets": {
  "type": "nested",
  "properties": {
    "created_utc": {
      "type": "date",
      "format": "strict_date_optional_time||epoch_millis"
    },
    "hashtags": {
      "type": "string"
    },
    "id": {
      "type": "double",
      "doc_values": false
    },
    "keep_forever": {
      "type": "boolean",
      "doc_values": false
    },
    "language": {
      "type": "string",
      "index": "not_analyzed",
      "doc_values": false
    },
    "mentions": {
      "type": "string"
    },
    "text": {
      "type": "string"
    }
  }
},

Is there something wrong with my aggregation? Or is there some way I can work around this?

Elasticsearch nested aggregation memory error even with only 5 documents

0 answers: