Elasticsearch bulk processor failures

Time: 2017-11-15 09:13:22

Tags: java elasticsearch bigdata

I want to index a large batch of index requests (about 10 million) into 60 different indices, with names like boss-log-yyyy-MM-dd.
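For illustration, here is a minimal sketch of how daily index names in the boss-log-yyyy-MM-dd pattern could be generated. The class and method names (`DailyIndexNames`, `indexNameFor`, `lastDays`) are hypothetical and are not part of my project or of the Elasticsearch client.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.List;

public class DailyIndexNames {
    // Prefix and date pattern follow the naming scheme in the question.
    private static final String PREFIX = "boss-log-";
    private static final DateTimeFormatter SUFFIX = DateTimeFormatter.ofPattern("yyyy-MM-dd");

    /** Returns the index name for one day, for example boss-log-2017-11-15. */
    public static String indexNameFor(LocalDate day) {
        return PREFIX + day.format(SUFFIX);
    }

    /** Returns the names for the given number of consecutive days ending at lastDay, oldest first. */
    public static List<String> lastDays(LocalDate lastDay, int days) {
        List<String> names = new ArrayList<>();
        for (int i = days - 1; i >= 0; i--) {
            names.add(indexNameFor(lastDay.minusDays(i)));
        }
        return names;
    }

    public static void main(String[] args) {
        // Prints the 60 daily index names ending on 2017-11-15.
        lastDays(LocalDate.of(2017, 11, 15), 60).forEach(System.out::println);
    }
}
```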

Here is my Java code:

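List<IndexRequest> indexRequestList = bossMockDataService.indexRequestGenerator(batch); // generate random mock data
indexRequestList.forEach(indexRequest -> {
    bulkProcessor.add(indexRequest);
});
try {
    // Flush whatever is still buffered and wait up to 30 s for in-flight bulks;
    // awaitClose returns false if the timeout elapses first.
    return bulkProcessor.awaitClose(30L, TimeUnit.SECONDS);
} catch (InterruptedException e) {
    Thread.currentThread().interrupt(); // restore the interrupt flag for callers
    e.printStackTrace();
    return false;
}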

Here is my bulk processor:

@Override
public BulkProcessor initESbulkProcessor(RestHighLevelClient client) {
    Settings settings = Settings.EMPTY;
    ThreadPool threadPool = new ThreadPool(settings); // I have no idea what the 'settings' object is for.
    BulkProcessor.Listener listener = new BulkProcessor.Listener() {
        @Override
        public void beforeBulk(long executionId, BulkRequest request) {
            int numberOfActions = request.numberOfActions();
            logger.debug("Executing bulk [{}] with {} requests", executionId, numberOfActions);
        }

        @Override
        public void afterBulk(long executionId, BulkRequest request, BulkResponse response) {

            if (response.hasFailures()) {
                logger.warn("Bulk [{}] executed with failures", executionId);
            } else {
                logger.debug("Bulk [{}] completed in {} milliseconds", executionId, response.getTook().getMillis());
            }
        }

        @Override
        public void afterBulk(long executionId, BulkRequest request, Throwable failure) {

            logger.error("Failed to execute bulk", failure);
        }

    };
    // Configure the builder before calling build(): build() uses the values set so far,
    // so setters called afterwards do not affect the processor that is returned.
    BulkProcessor.Builder builder = new BulkProcessor.Builder(client::bulkAsync, listener, threadPool);
    builder.setBulkActions(2000);                                  // flush after 2000 requests
    builder.setBulkSize(new ByteSizeValue(10L, ByteSizeUnit.MB));  // or after 10 MB of data
    builder.setConcurrentRequests(10);
    builder.setFlushInterval(TimeValue.timeValueSeconds(10L));
    builder.setBackoffPolicy(BackoffPolicy.constantBackoff(TimeValue.timeValueSeconds(1L), 3));
    BulkProcessor bulkProcessor = builder.build();
    logger.info("bulk processor build complete!");
    return bulkProcessor;

}
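To make the two size settings concrete (2000 actions, 10 MB), here is a simplified, single-threaded model of count-based and byte-based flushing. `ThresholdBatcher` is a hypothetical class written only for illustration. It is not the Elasticsearch implementation, and it leaves out the flush interval, concurrent requests, and retries.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

public class ThresholdBatcher {
    private final int maxActions;
    private final long maxBytes;
    private final Consumer<List<String>> flusher;
    private List<String> pending = new ArrayList<>();
    private long pendingBytes = 0;

    public ThresholdBatcher(int maxActions, long maxBytes, Consumer<List<String>> flusher) {
        this.maxActions = maxActions;
        this.maxBytes = maxBytes;
        this.flusher = flusher;
    }

    /** Buffers one document and flushes as soon as either threshold is reached. */
    public void add(String doc) {
        pending.add(doc);
        pendingBytes += doc.getBytes(StandardCharsets.UTF_8).length;
        if (pending.size() >= maxActions || pendingBytes >= maxBytes) {
            flush();
        }
    }

    /** Hands the buffered documents to the flusher and starts a new, empty buffer. */
    public void flush() {
        if (pending.isEmpty()) {
            return;
        }
        List<String> batch = pending;
        pending = new ArrayList<>();
        pendingBytes = 0;
        flusher.accept(batch);
    }

    public static void main(String[] args) {
        ThresholdBatcher batcher =
                new ThresholdBatcher(3, 1024, batch -> System.out.println("flushing " + batch.size()));
        for (int i = 0; i < 7; i++) {
            batcher.add("doc-" + i);
        }
        batcher.flush(); // flush the remainder, similar in spirit to closing the processor
    }
}
```

In this model, whichever threshold is hit first triggers the flush, and anything left in the buffer is only sent by the final explicit `flush()`.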

It seems that some index requests fail. The log looks like this:

2017-11-15 16:42:24 [I/O dispatcher 1] WARN  BossMockDataServiceImpl:220 - Bulk [96] executed with failures
2017-11-15 16:42:24 [I/O dispatcher 1] WARN  BossMockDataServiceImpl:220 - Bulk [97] executed with failures
2017-11-15 16:42:24 [I/O dispatcher 1] WARN  BossMockDataServiceImpl:220 - Bulk [98] executed with failures
2017-11-15 16:42:24 [I/O dispatcher 1] WARN  BossMockDataServiceImpl:220 - Bulk [99] executed with failures
2017-11-15 16:42:24 [I/O dispatcher 1] WARN  BossMockDataServiceImpl:220 - Bulk [100] executed with failures
2017-11-15 16:42:24 [I/O dispatcher 1] WARN  BossMockDataServiceImpl:220 - Bulk [101] executed with failures
2017-11-15 16:42:24 [I/O dispatcher 1] WARN  BossMockDataServiceImpl:220 - Bulk [102] executed with failures
2017-11-15 16:42:24 [I/O dispatcher 1] WARN  BossMockDataServiceImpl:220 - Bulk [103] executed with failures
2017-11-15 16:42:25 [I/O dispatcher 1] WARN  BossMockDataServiceImpl:220 - Bulk [104] executed with failures
2017-11-15 16:42:25 [I/O dispatcher 1] WARN  BossMockDataServiceImpl:220 - Bulk [105] executed with failures
2017-11-15 16:42:25 [I/O dispatcher 1] WARN  BossMockDataServiceImpl:220 - Bulk [106] executed with failures
2017-11-15 16:42:25 [I/O dispatcher 1] WARN  BossMockDataServiceImpl:220 - Bulk [107] executed with failures
2017-11-15 16:42:25 [I/O dispatcher 1] WARN  BossMockDataServiceImpl:220 - Bulk [108] executed with failures
2017-11-15 16:42:25 [I/O dispatcher 1] WARN  BossMockDataServiceImpl:220 - Bulk [109] executed with failures
2017-11-15 16:42:25 [I/O dispatcher 1] WARN  BossMockDataServiceImpl:220 - Bulk [110] executed with failures
2017-11-15 16:42:26 [I/O dispatcher 1] WARN  BossMockDataServiceImpl:220 - Bulk [111] executed with failures
2017-11-15 16:42:26 [I/O dispatcher 1] WARN  BossMockDataServiceImpl:220 - Bulk [112] executed with failures
2017-11-15 16:42:26 [I/O dispatcher 1] WARN  BossMockDataServiceImpl:220 - Bulk [113] executed with failures
2017-11-15 16:42:26 [I/O dispatcher 1] WARN  BossMockDataServiceImpl:220 - Bulk [114] executed with failures
2017-11-15 16:42:26 [I/O dispatcher 1] WARN  BossMockDataServiceImpl:220 - Bulk [115] executed with failures
2017-11-15 16:42:26 [I/O dispatcher 1] WARN  BossMockDataServiceImpl:220 - Bulk [116] executed with failures
2017-11-15 16:42:26 [I/O dispatcher 1] WARN  BossMockDataServiceImpl:220 - Bulk [117] executed with failures
2017-11-15 16:42:27 [I/O dispatcher 1] WARN  BossMockDataServiceImpl:220 - Bulk [118] executed with failures
2017-11-15 16:42:27 [I/O dispatcher 1] WARN  BossMockDataServiceImpl:220 - Bulk [119] executed with failures
2017-11-15 16:42:27 [I/O dispatcher 1] WARN  BossMockDataServiceImpl:220 - Bulk [120] executed with failures

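My situation:

  • I am new to ES.
  • I need to store and query 1 million records every day.
  • I have not configured my shards/replicas/nodes/cluster yet; they are all at their defaults.
  • When I used MySQL, I created a new partition for each day's data.
  • MySQL's limit is about 60 days of data; beyond that, I/O operations get slower and slower.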
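The per-day partitioning above maps onto per-day index names. As a rough sketch of what retention could look like under that scheme, the code below picks the daily indices that fall outside a retention window, much like dropping an old partition. `IndexRetention` and `expired` are hypothetical names, the boss-log- prefix is taken from the question, and the index list is assumed to be supplied by the caller rather than fetched from a cluster.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.ArrayList;
import java.util.List;

public class IndexRetention {
    private static final String PREFIX = "boss-log-";
    // ISO_LOCAL_DATE parses dates written as yyyy-MM-dd.
    private static final DateTimeFormatter SUFFIX = DateTimeFormatter.ISO_LOCAL_DATE;

    /**
     * Returns the daily indices older than the retention window.
     * The window covers keepDays days, counting today as the most recent one.
     */
    public static List<String> expired(List<String> indexNames, LocalDate today, int keepDays) {
        LocalDate oldestKept = today.minusDays(keepDays - 1);
        List<String> result = new ArrayList<>();
        for (String name : indexNames) {
            if (!name.startsWith(PREFIX)) {
                continue; // not one of the daily log indices
            }
            try {
                LocalDate day = LocalDate.parse(name.substring(PREFIX.length()), SUFFIX);
                if (day.isBefore(oldestKept)) {
                    result.add(name);
                }
            } catch (DateTimeParseException e) {
                // The suffix is not a valid date, so leave this index alone.
            }
        }
        return result;
    }
}
```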

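My goals:

  • Maintain my ES setup easily.
  • Query, update, and aggregate quickly.
  • Store more than 60 days of data.
  • No bulk request is allowed to fail. (It looks like some data was lost above.)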
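To show what I mean by not allowing a failure, here is a generic sketch of retrying with a constant delay, using the same idea as the `constantBackoff(1 s, 3)` setting in my configuration. `ConstantBackoffRetry` is a hypothetical helper, not the retry logic of the Elasticsearch client, and the task is assumed to signal failure by throwing a `RuntimeException`.

```java
import java.util.function.Supplier;

public class ConstantBackoffRetry {
    /**
     * Runs the task once and, if it throws, retries up to maxRetries more times,
     * sleeping delayMillis between attempts. Rethrows the last failure if all attempts fail.
     */
    public static <T> T call(Supplier<T> task, int maxRetries, long delayMillis) {
        if (maxRetries < 0) {
            throw new IllegalArgumentException("maxRetries must not be negative");
        }
        RuntimeException last = null;
        for (int attempt = 0; attempt <= maxRetries; attempt++) {
            try {
                return task.get();
            } catch (RuntimeException e) {
                last = e; // remember the failure and fall through to the wait
            }
            if (attempt < maxRetries) {
                try {
                    Thread.sleep(delayMillis);
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("interrupted while waiting to retry", ie);
                }
            }
        }
        throw last;
    }
}
```

With `maxRetries = 3` the task runs at most four times in total, so a request that keeps failing is still reported as failed and has to be handled by the caller.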

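What I want to know:

  • Is it a good idea to create a new index for each day's data, or should I put everything in one index?

  • Why do the bulk processor callbacks report failures, and how can I fix that?

  • Do I need to create more clusters/nodes? Where should I start?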

0 answers:

No answers