Missing documents in Elasticsearch when using BulkProcessor

Time: 2016-06-06 12:28:28

Tags: elasticsearch apache-kafka

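I use a [java] kafka-producer to push data to Kafka topic x, and a [java] high level consumer/bulkProcessor to read the data from topic x and index it into Elasticsearch. The producer pushes 10 documents each time. When I start the bulkProcessor Java code for the first time after running the producer, I see only 9 records in ES, all with "_version": 1. The 10th record is not in ES.

Yet the beforeBulk() and afterBulk() methods print the following:

Going to execute new bulk composed of 10 actions
Executed bulk composed of 10 actions

From then on, if I delete the Elasticsearch index and run the producer again, I always see all 10 records. I have no idea why this happens. Any help is appreciated.

Note: ES version 2.2.0
Kafka: 0.9.0.0

Edit [relevant code added]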

The consumer code follows; the documents sent to ES are listed after it.

public Consumer(KafkaStream a_stream, int a_threadNumber, String esHost, String esCluster, int bulkSize, String topic) {

    /* Create transport client */
    BulkProcessor bulkProcessor;

    this.bulkProcessor = BulkProcessor.builder(client, new BulkProcessor.Listener() {
        // Called just before a bulk request is sent.
        public void beforeBulk(long executionId, BulkRequest request) {
            System.out.format("Going to execute new bulk composed of %d actions\n", request.numberOfActions());
        }

        // Called when the bulk request completes and a response was received.
        public void afterBulk(long executionId, BulkRequest request, BulkResponse response) {
            System.out.format("Executed bulk composed of %d actions\n", response.getItems().length);
        }

        // Called when the bulk request as a whole fails with an exception.
        public void afterBulk(long executionId, BulkRequest request, Throwable failure) {
            System.out.format("Error executing bulk: %s\n", failure);
        }
    }).setBulkActions(bulkSize)
      .setBulkSize(new ByteSizeValue(200, ByteSizeUnit.MB))
      .setFlushInterval(TimeValue.timeValueSeconds(1))
      .build();
}

public void run() {
    ConsumerIterator<byte[], byte[]> it = m_stream.iterator();
    while (it.hasNext()) {
        byte[] x = it.next().message();
        try {
            bulkProcessor.add(new IndexRequest(index, type, id.toString()).source(modifyMsg(x).toString()));
        } catch (Exception e) {
            logger.warn("bulkProcessor failed: " + m_threadNumber + e.getMessage());
        }
    }
    logger.info("Shutting down Thread: " + m_threadNumber);
}

The documents sent to ES have the following form:

{"index":"temp1","type":"temp2","id":"0","event":"we're doomed"}
{"index":"temp1","type":"temp2","id":"1","event":"we're doomed"}
{"index":"temp1","type":"temp2","id":"2","event":"we're doomed"}
...
{"index":"temp1","type":"temp2","id":"9","event":"we're doomed"}

[Edit]

If I add a few extra lines to the run() method, the problem goes away.

1 Answer:

Answer 0: (score: 0)

I feel like a complete fool. In the line bulkProcessor.add(new IndexRequest(index, type, id.toString()).source(modifyMsg(x).toString())); it is the modifyMsg() method that initializes index and type, which are set to empty strings in the constructor, but it runs only after new IndexRequest(index, type, ...) has already read them. That is why my first index request failed: its index name was invalid.
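The ordering problem described in the answer can be reproduced without Elasticsearch. The sketch below is only an illustration under stated assumptions: Request is a hypothetical stand-in for IndexRequest, the modifyMsg() here merely mimics the side effect the answer describes, and fixed() is one possible reordering, not necessarily what the author ended up doing. It relies on the fact that Java evaluates the receiver expression (new Request(index)) before the argument (modifyMsg(msg)).

```java
public class EvaluationOrderDemo {
    // Hypothetical stand-in for IndexRequest: remembers the index name it was built with.
    public static final class Request {
        public final String index;
        public String source;

        Request(String index) {
            this.index = index;
        }

        Request source(String source) {
            this.source = source;
            return this;
        }
    }

    // Shared state: empty until the first message has been parsed,
    // as with the fields set to "" in the question's constructor.
    private String index = "";

    // Stand-in for modifyMsg(): sets the index as a side effect of parsing.
    // The value "temp1" is taken from the sample documents in the question.
    private String modifyMsg(String msg) {
        index = "temp1";
        return msg;
    }

    // Same shape as the line in the question: the request is constructed
    // from the current field value before modifyMsg() gets to run.
    public Request buggy(String msg) {
        return new Request(index).source(modifyMsg(msg));
    }

    // One possible fix: parse the message first, then build the request.
    public Request fixed(String msg) {
        String source = modifyMsg(msg);
        return new Request(index).source(source);
    }

    public static void main(String[] args) {
        EvaluationOrderDemo demo = new EvaluationOrderDemo();
        System.out.println("buggy, first message:  \"" + demo.buggy("doc 0").index + "\"");
        System.out.println("buggy, second message: \"" + demo.buggy("doc 1").index + "\"");

        EvaluationOrderDemo fresh = new EvaluationOrderDemo();
        System.out.println("fixed, first message:  \"" + fresh.fixed("doc 0").index + "\"");
    }
}
```

In the buggy variant only the first request carries the empty index name; every later request picks up the value left behind by the previous message. This also suggests why the listener still reported 10 executed actions: afterBulk(long, BulkRequest, BulkResponse) receives one item per action whether it succeeded or failed, so checking response.hasFailures() there is one way to surface such errors.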