Inserting bulk records from Db2 into Cassandra using Apache NiFi

Time: 2019-04-02 03:41:58

Tags: cassandra apache-nifi

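I am using Apache NiFi to load data from Db2 into Cassandra. My Db2 table has more than 200k records, but only 400 of them were inserted into the Cassandra target table. The rest fail with an error saying that a Cassandra batch statement cannot contain more than 65535 statements. I tried changing some settings in the yaml file, but that did not help and I still get the same error. Can I switch from batch to bulk loading in NiFi? Or, if I need to load all the records from Db2, what do I have to change in NiFi or in Cassandra?

Log file output: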

2019-04-02 13:50:26,786 INFO [Write-Ahead Local State Provider Maintenance] org.wali.MinimalLockingWriteAheadLog org.wali.MinimalLockingWriteAheadLog@623cd12e checkpointed with 28 Records and 0 Swap Files in 53 milliseconds (Stop-the-world time = 18 milliseconds, Clear Edit Logs time = 28 millis), max Transaction ID 83
2019-04-02 13:50:30,590 ERROR [Timer-Driven Process Thread-10] o.a.n.p.cassandra.PutCassandraRecord PutCassandraRecord[id=993740ce-0169-1000-7471-e9ff7f0272f6] Unable to write the records into Cassandra table due to java.lang.IllegalStateException: Batch statement cannot contain more than 65535 statements.: java.lang.IllegalStateException: Batch statement cannot contain more than 65535 statements.
java.lang.IllegalStateException: Batch statement cannot contain more than 65535 statements.
    at com.datastax.driver.core.BatchStatement.add(BatchStatement.java:154)
    at org.apache.nifi.processors.cassandra.PutCassandraRecord.onTrigger(PutCassandraRecord.java:165)
    at org.apache.nifi.processor.AbstractProcessor.onTrigger(AbstractProcessor.java:27)
    at org.apache.nifi.controller.StandardProcessorNode.onTrigger(StandardProcessorNode.java:1162)
    at org.apache.nifi.controller.tasks.ConnectableTask.invoke(ConnectableTask.java:205)
    at org.apache.nifi.controller.scheduling.TimerDrivenSchedulingAgent$1.run(TimerDrivenSchedulingAgent.java:117)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)

2 answers:

Answer 0 (score: 0)

This is a known issue that was fixed in 1.10.0. Please consider upgrading Apache NiFi.

https://issues.apache.org/jira/browse/NIFI-6016
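
To make the limit in the error message concrete: the exception is raised when a single batch collects more than 65535 statements, so a 200k-row load has to be spread over several smaller batches. The following is only a minimal Python sketch of that general idea, not NiFi's or the DataStax driver's actual code; `chunked` and `MAX_STATEMENTS_PER_BATCH` are hypothetical names introduced for illustration, and the batch size of 1000 is an arbitrary assumption.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")

# Limit quoted in the error message; the constant name is hypothetical.
MAX_STATEMENTS_PER_BATCH = 65535


def chunked(records: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Yield successive lists holding at most batch_size records each."""
    if not 0 < batch_size <= MAX_STATEMENTS_PER_BATCH:
        raise ValueError("batch_size must be between 1 and 65535")
    iterator = iter(records)
    while True:
        # Pull the next batch_size items; an empty list means we are done.
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


if __name__ == "__main__":
    rows = range(200_000)  # stand-in for the rows read from Db2
    batch_sizes = [len(batch) for batch in chunked(rows, 1000)]
    # Report how many batches were produced and the largest one.
    print(len(batch_sizes), max(batch_sizes))
```

Each yielded list stays under the limit, so executing one batch per list would never reach the 65535-statement ceiling regardless of how many rows the source table holds.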

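Answer 1 (score: -1)

Is this a batch job? If so, you can use Cassandra's sstableloader, see https://docs.datastax.com/en/cassandra/3.0/cassandra/tools/toolsBulkloader.html

You have to create SSTables first and then use sstableloader to stream them to your Cassandra cluster. 200k records should be no problem.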