Question

I have a batch job that writes about 300,000 rows to Cassandra. I split the rows into small batches of 50 rows each.

The pseudocode is below.

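@Override
public void executeQuery(List<BatchStatement> batches) {
    // Must be a mutable list: List.of() is immutable and add() would throw
    // UnsupportedOperationException. Requires java.util.ArrayList.
    List<ResultSetFuture> futures = new ArrayList<>(batches.size());
    // Fire off every batch without waiting for any of them
    for (BatchStatement batch : batches) {
        futures.add(session.executeAsync(batch));
    }

    // Wait for each write to finish
    for (ResultSetFuture rsf : futures) {
        rsf.getUninterruptibly();
        /* I have to add the following code to avoid WriteTimeoutException
        try {
            Thread.sleep(100);
        } catch (InterruptedException e) {
            logger.error("Thread.sleep", e);
        }
        */

    }
}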

I don't understand why, without the Thread.sleep, it always throws a WriteTimeoutException. How can I avoid this?

Answer 1

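By using batch statements for data that most likely belongs to different partitions, you are actually overloading the system, because the coordinator node has to send requests to other nodes and wait for their answers. Batches should be used only for specific use cases, not the way they are used in relational databases to speed up execution. The documentation describes the misuse of batches.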

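Sending a single asynchronous request per row will improve the situation, but you need to take care not to send too many requests at the same time (use a semaphore), and you can raise the number of in-flight requests per connection through the pooling options.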
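To illustrate the semaphore idea, here is a minimal sketch that uses only the JDK. It is not driver code: `ThrottledWriter`, `writeAll`, `asyncWrite`, and `maxInFlight` are hypothetical names, and `CompletableFuture` stands in for whatever future type your driver returns. With the driver used in the question you would wrap `session.executeAsync(...)` and release the permit in a completion callback on the returned future instead.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Semaphore;
import java.util.function.Function;

public class ThrottledWriter {

    /**
     * Sends every item through asyncWrite while keeping at most
     * maxInFlight requests outstanding at any moment.
     * asyncWrite is a stand-in for a per-row async write call (assumption).
     */
    public static <T, R> List<R> writeAll(List<T> items,
                                          Function<T, CompletableFuture<R>> asyncWrite,
                                          int maxInFlight) {
        Semaphore permits = new Semaphore(maxInFlight);
        List<CompletableFuture<R>> futures = new ArrayList<>(items.size());

        for (T item : items) {
            // Blocks the producer once maxInFlight requests are pending
            permits.acquireUninterruptibly();
            CompletableFuture<R> future;
            try {
                future = asyncWrite.apply(item);
            } catch (RuntimeException e) {
                permits.release(); // request never started, give the permit back
                throw e;
            }
            // Free the permit when the request completes, successfully or not
            future.whenComplete((result, error) -> permits.release());
            futures.add(future);
        }

        // Wait for the remaining requests; join() rethrows failures
        // wrapped in CompletionException
        List<R> results = new ArrayList<>(futures.size());
        for (CompletableFuture<R> future : futures) {
            results.add(future.join());
        }
        return results;
    }
}
```

The permit is acquired before the request is issued and released only on completion, so the producer loop slows down to the rate the cluster can absorb, which is what the fixed `Thread.sleep(100)` was approximating. A sensible value for `maxInFlight` depends on your cluster and connection settings, so treat any number as something to tune.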

Cassandra throws WriteTimeoutException without Thread.sleep