Elasticsearch does not respond with large amounts of data

Time: 2014-05-24 11:11:04

Tags: elasticsearch

I am using CentOS 5 and running Elasticsearch 1.0.0 with the -Xms808m -Xmx808m -Xss256k parameters. There are 17 indices with 30,200,583 documents in total; each index holds between 1,000,000 and 2,000,000 documents. I send the following query (every index has a date field):

{
  "query": {
    "bool": {
      "must": [
        {
          "range": {
            "date": {
              "to": "2014-06-01 14:14:00",
              "from": "2014-04-01 00:00:00"
            }
          }
        }
      ],
      "should": [],
      "must_not": [],
      "minimum_number_should_match": 1
    }
  },
  "from": 0,
  "size": "50"
}

It gives this response:

{
   took: 5903
   timed_out: false
   _shards: {
      total: 17
      successful: 17
      failed: 0
   },
   hits: {
   total: 30200583
...
...
...}

However, when I send the query for the last 50 rows from the elasticsearch-head tool, like this:

{
  ...
  ...
  ...
  "from": 30200533,
  "size": "50"
}

it does not respond and throws an exception like this:

java.lang.OutOfMemoryError: Java heap space
        at org.apache.lucene.store.DataOutput.copyBytes(DataOutput.java:247)
        at org.apache.lucene.store.Directory.copy(Directory.java:186)
        at org.elasticsearch.index.store.Store$StoreDirectory.copy(Store.java:348)
        at org.apache.lucene.store.TrackingDirectoryWrapper.copy(TrackingDirectoryWrapper.java:50)
        at org.apache.lucene.index.IndexWriter.createCompoundFile(IndexWriter.java:4596)
        at org.apache.lucene.index.DocumentsWriterPerThread.sealFlushedSegment(DocumentsWriterPerThread.java:535)
        at org.apache.lucene.index.DocumentsWriterPerThread.flush(DocumentsWriterPerThread.java:502)
        at org.apache.lucene.index.DocumentsWriter.doFlush(DocumentsWriter.java:506)
        at org.apache.lucene.index.DocumentsWriter.flushAllThreads(DocumentsWriter.java:616)
        at org.apache.lucene.index.IndexWriter.getReader(IndexWriter.java:370)
        at org.apache.lucene.index.StandardDirectoryReader.doOpenFromWriter(StandardDirectoryReader.java:285)
        at org.apache.lucene.index.StandardDirectoryReader.doOpenIfChanged(StandardDirectoryReader.java:260)
        at org.apache.lucene.index.StandardDirectoryReader.doOpenIfChanged(StandardDirectoryReader.java:250)
        at org.apache.lucene.index.DirectoryReader.openIfChanged(DirectoryReader.java:170)
        at org.apache.lucene.search.XSearcherManager.refreshIfNeeded(XSearcherManager.java:123)
        at org.apache.lucene.search.XSearcherManager.refreshIfNeeded(XSearcherManager.java:59)
        at org.apache.lucene.search.XReferenceManager.doMaybeRefresh(XReferenceManager.java:180)
        at org.apache.lucene.search.XReferenceManager.maybeRefresh(XReferenceManager.java:229)
        at org.elasticsearch.index.engine.internal.InternalEngine.refresh(InternalEngine.java:730)
        at org.elasticsearch.index.shard.service.InternalIndexShard.refresh(InternalIndexShard.java:477)
        at org.elasticsearch.index.shard.service.InternalIndexShard$EngineRefresher$1.run(InternalIndexShard.java:924)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:619)

What is the problem? Is there not enough Java heap space, or is my query causing this heap space error?

1 answer:

Answer 0 (score: 2):

The answer to both questions is "yes": you do not have enough heap space, as the error shows, and your query is causing the error because it needs more heap space than you have.

The reason is that, because of sorting, deep pagination is very expensive. To retrieve the 20th element, elements 1-20 have to be kept in memory and sorted. To retrieve the 1,000,000th element, elements 1-999,999 have to be kept in memory and sorted.

This usually takes quite a lot of memory.
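
To get a feel for the scale with the numbers from this question: Elasticsearch builds a per-shard priority queue of from + size entries and then merges the results of every shard on the node coordinating the search. A rough back-of-the-envelope estimate (the ~30 bytes per entry is an assumed figure for illustration only):

  from + size        = 30,200,533 + 50 ≈ 30.2 million sorted entries per shard
  merge of 17 shards ≈ 17 × 30.2 million ≈ 513 million entries on the coordinating node
  at ~30 bytes/entry ≈ on the order of 15 GB, against an 808 MB heap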

There are a few options:

  • Get more memory. Problem solved.
  • Use scan/scroll instead of a normal search. Scan/scroll does not score documents, so it does not have to maintain a sort order, which makes it very memory-efficient (see the first sketch after this list).
  • Use different sort criteria (e.g. a reverse sort) or a smaller window (e.g. a narrower date range, so that you can page through to the end); a reverse-sort example is also sketched below.
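
A minimal sketch of how the scan/scroll approach could look on Elasticsearch 1.x (the host localhost:9200 and index name myindex are placeholders, not taken from the question):

# open the scroll context; with search_type=scan hits are neither scored nor sorted,
# and size is applied per shard, so each batch returns up to 50 hits per shard
curl -XGET 'localhost:9200/myindex/_search?search_type=scan&scroll=1m&size=50' -d '{
  "query": {
    "range": {
      "date": {
        "from": "2014-04-01 00:00:00",
        "to": "2014-06-01 14:14:00"
      }
    }
  }
}'

# keep the context alive and fetch the next batch by posting the _scroll_id
# returned by the previous response (left as a placeholder here)
curl -XGET 'localhost:9200/_search/scroll?scroll=1m' -d '<scroll_id from the previous response>'

If only the most recent 50 documents are wanted, a reverse sort on the date field avoids the deep from entirely (again just a sketch):

{
  "query": {
    "range": {
      "date": {
        "from": "2014-04-01 00:00:00",
        "to": "2014-06-01 14:14:00"
      }
    }
  },
  "sort": [ { "date": { "order": "desc" } } ],
  "from": 0,
  "size": 50
}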