I have a Camel route that reads a file from S3 and processes the input file as follows: it unmarshals each row into a POJO, splits the resulting body into individual records, and aggregates the records by a body attribute (.semester) with a batch size of 2. The problem is that with a batch size of 2 and an odd number of records, there is always one record that does not get saved.
The code provided is Kotlin, but it should not differ much from the equivalent Java (apart from the backslash in front of "\${simple expression}", or the absence of semicolons to terminate statements).
If I set the batch size to 1, every record is saved; otherwise, the last record is never saved.
I have gone through the message-processor documentation a few times, but it does not seem to cover this particular case.
In addition to completionSize, I have also set completionTimeout and completionInterval, but it made no difference (the variants I tried are sketched after the route code below).
Has anyone run into this problem before?
val csvDataFormat = BindyCsvDataFormat(Student::class.java)

// Main route: pick up the input file from S3 and drive the workflow.
from("aws-s3://student-12-bucket?amazonS3Client=#amazonS3&delay=5000")
    .log("A new Student input file has been received in S3: '\${header.CamelAwsS3BucketName}/\${header.CamelAwsS3Key}'")
    .to("direct:move-input-s3-object-to-in-progress")
    .to("direct:process-s3-file")
    .to("direct:move-input-s3-object-to-completed")
    .end()

// Processing route: unmarshal the CSV, split it into records, and
// aggregate the records by semester in batches of two before persisting.
from("direct:process-s3-file")
    .unmarshal(csvDataFormat)
    .split(body())
    .streaming()
    .parallelProcessing()
    .aggregate(simple("\${body.semester}"), GroupedBodyAggregationStrategy())
    .completionSize(2)
    .bean(persistenceService)
    .end()
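For reference, the completionTimeout / completionInterval variants mentioned above looked roughly like this (the values are illustrative, not the exact ones I used; as far as I can tell, only one of completionTimeout and completionInterval can be used on an aggregator at a time):

from("direct:process-s3-file")
    .unmarshal(csvDataFormat)
    .split(body())
    .streaming()
    .parallelProcessing()
    .aggregate(simple("\${body.semester}"), GroupedBodyAggregationStrategy())
    .completionSize(2)
    .completionTimeout(5000)      // flush a pending group after 5 s with no new records
    // .completionInterval(5000)  // alternative: flush all pending groups on a fixed schedule
    .bean(persistenceService)
    .end()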
For an input CSV file containing seven (7) records, this is the output produced (with some debug logging added). Note that only six of the seven records are saved; the record with id=4 never reaches the persistence service:
WARN 19540 --- [student-12-move] c.a.s.s.internal.S3AbortableInputStream : Not all bytes were read from the S3ObjectInputStream, aborting HTTP connection. This is likely an error and may result in sub-optimal behavior. Request only the bytes you need via a ranged GET or drain the input stream after use.
INFO 19540 --- [student-12-move] student-workflow-main : A new Student input file has been received in S3: 'student-12-bucket/inbox/foo.csv'
INFO 19540 --- [student-12-move] move-input-s3-object-to-in-progress : Moving S3 file 'inbox/foo.csv' to 'in-progress' folder...
INFO 19540 --- [student-12-move] student-workflow-main : Moved input S3 file 'in-progress/foo.csv' to 'in-progress' folder...
INFO 19540 --- [student-12-move] pre-process-s3-file-records : Start saving to database...
DEBUG 19540 --- [read #7 - Split] c.b.i.d.s.StudentPersistenceServiceImpl : Saving record to database: Student(id=7, name=Student 7, semester=2nd, javaMarks=25)
DEBUG 19540 --- [read #7 - Split] c.b.i.d.s.StudentPersistenceServiceImpl : Saving record to database: Student(id=5, name=Student 5, semester=2nd, javaMarks=81)
DEBUG 19540 --- [read #3 - Split] c.b.i.d.s.StudentPersistenceServiceImpl : Saving record to database: Student(id=6, name=Student 6, semester=1st, javaMarks=15)
DEBUG 19540 --- [read #3 - Split] c.b.i.d.s.StudentPersistenceServiceImpl : Saving record to database: Student(id=2, name=Student 2, semester=1st, javaMarks=62)
DEBUG 19540 --- [read #2 - Split] c.b.i.d.s.StudentPersistenceServiceImpl : Saving record to database: Student(id=3, name=Student 3, semester=2nd, javaMarks=72)
DEBUG 19540 --- [read #2 - Split] c.b.i.d.s.StudentPersistenceServiceImpl : Saving record to database: Student(id=1, name=Student 1, semester=2nd, javaMarks=87)
INFO 19540 --- [student-12-move] device-group-workflow-main : End pre-processing S3 CSV file records...
INFO 19540 --- [student-12-move] move-input-s3-object-to-completed : Moving S3 file 'in-progress/foo.csv' to 'completed' folder...
INFO 19540 --- [student-12-move] device-group-workflow-main : Moved S3 file 'in-progress/foo.csv' to 'completed' folder...
Answer 0: (score 0)
If you need the messages to complete immediately, you can specify a completion predicate based on the exchange property set by the splitter. I have not tried it, but I think
.completionPredicate( simple( "${exchangeProperty.CamelSplitComplete}" ) )
would handle the final message.
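Applied to your route, a minimal (untested) sketch would be the following; note the backslash escaping of \$ for Kotlin, and the eagerCheckCompletion() call, which makes the aggregator evaluate the predicate against each incoming record (where the splitter sets the property) rather than against the already-aggregated group:

from("direct:process-s3-file")
    .unmarshal(csvDataFormat)
    .split(body())
    .streaming()
    .parallelProcessing()
    .aggregate(simple("\${body.semester}"), GroupedBodyAggregationStrategy())
    .completionSize(2)
    // complete the open group as soon as the splitter flags the last record
    .completionPredicate(simple("\${exchangeProperty.CamelSplitComplete}"))
    .eagerCheckCompletion()
    .bean(persistenceService)
    .end()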
My other concern is that you have parallelProcessing set on the splitter, which may mean the messages are not processed in order. Did you intend the parallel processing to apply to the splitter, or to the aggregator? Since you do not appear to do anything with the split records other than aggregate them and then process the groups, it may be better to move the parallelProcessing directive to the aggregator, as in the sketch below.
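An untested sketch of that rearrangement, with the directive moved so that the completed groups (rather than the individual records) are handed to the bean concurrently:

from("direct:process-s3-file")
    .unmarshal(csvDataFormat)
    .split(body())
    .streaming()                  // records are still streamed, but split in order
    .aggregate(simple("\${body.semester}"), GroupedBodyAggregationStrategy())
    .completionSize(2)
    .parallelProcessing()         // process completed groups concurrently
    .bean(persistenceService)
    .end()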