Camel: how to split and then aggregate when the number of items is less than the batch size

Date: 2019-01-03 07:04:47

Tags: kotlin apache-camel spring-camel

I have a Camel route that reads a file from S3 and processes the input file as follows:

  1. Parse each row into a POJO (Student) using Bindy
  2. Split the output via body()
  3. Aggregate by a field on the body (.semester) with a batch size of 2
  4. Invoke the persistence service to upload the records to the database in the given batches

The problem is that with a batch size of 2 and an odd number of records, there is always one record that does not get saved.

The code provided is Kotlin, but it should not differ much from the equivalent Java (apart from the backslash in front of "${simple expression}", or the absence of semicolons to terminate statements).

If I set the batch size to 1 then every record is saved; otherwise the last record never gets saved.

I have gone through the message-processor documentation a few times, but it does not seem to cover this particular scenario.

In addition to completionTimeout, I have also set [completionInterval|completionSize], but it made no difference.

Has anyone run into this issue before?

val csvDataFormat = BindyCsvDataFormat(Student::class.java)

from("aws-s3://student-12-bucket?amazonS3Client=#amazonS3&delay=5000")
    .log("A new Student input file has been received in S3: '\${header.CamelAwsS3BucketName}/\${header.CamelAwsS3Key}'")
    .to("direct:move-input-s3-object-to-in-progress")
    .to("direct:process-s3-file")
    .to("direct:move-input-s3-object-to-completed")
    .end()

from("direct:process-s3-file")
    .unmarshal(csvDataFormat)
    .split(body())
    .streaming()
    .parallelProcessing()
    .aggregate(simple("\${body.semester}"), GroupedBodyAggregationStrategy())
    .completionSize(2)
    .bean(persistenceService)
    .end()

For an input CSV file containing seven (7) records, this is the output generated (with some debug logging added):

WARN 19540 --- [student-12-move] c.a.s.s.internal.S3AbortableInputStream  : Not all bytes were read from the S3ObjectInputStream, aborting HTTP connection. This is likely an error and may result in sub-optimal behavior. Request only the bytes you need via a ranged GET or drain the input stream after use.
 INFO 19540 --- [student-12-move] student-workflow-main                    : A new Student input file has been received in S3: 'student-12-bucket/inbox/foo.csv'
 INFO 19540 --- [student-12-move] move-input-s3-object-to-in-progress      : Moving S3 file 'inbox/foo.csv' to 'in-progress' folder...
 INFO 19540 --- [student-12-move] student-workflow-main                    : Moved input S3 file 'in-progress/foo.csv' to 'in-progress' folder...
 INFO 19540 --- [student-12-move] pre-process-s3-file-records              : Start saving to database...
DEBUG 19540 --- [read #7 - Split] c.b.i.d.s.StudentPersistenceServiceImpl  : Saving record to database: Student(id=7, name=Student 7, semester=2nd, javaMarks=25)
DEBUG 19540 --- [read #7 - Split] c.b.i.d.s.StudentPersistenceServiceImpl  : Saving record to database: Student(id=5, name=Student 5, semester=2nd, javaMarks=81)
DEBUG 19540 --- [read #3 - Split] c.b.i.d.s.StudentPersistenceServiceImpl  : Saving record to database: Student(id=6, name=Student 6, semester=1st, javaMarks=15)
DEBUG 19540 --- [read #3 - Split] c.b.i.d.s.StudentPersistenceServiceImpl  : Saving record to database: Student(id=2, name=Student 2, semester=1st, javaMarks=62)
DEBUG 19540 --- [read #2 - Split] c.b.i.d.s.StudentPersistenceServiceImpl  : Saving record to database: Student(id=3, name=Student 3, semester=2nd, javaMarks=72)
DEBUG 19540 --- [read #2 - Split] c.b.i.d.s.StudentPersistenceServiceImpl  : Saving record to database: Student(id=1, name=Student 1, semester=2nd, javaMarks=87)
 INFO 19540 --- [student-12-move] device-group-workflow-main               : End pre-processing S3 CSV file records...
 INFO 19540 --- [student-12-move] move-input-s3-object-to-completed        : Moving S3 file 'in-progress/foo.csv' to 'completed' folder...
 INFO 19540 --- [student-12-move] device-group-workflow-main               : Moved S3 file 'in-progress/foo.csv' to 'completed' folder...
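The behaviour in the logs can be reproduced without Camel at all. Below is a plain-Kotlin sketch of what a size-only completion strategy does: records are correlated by semester, and a group is only flushed to the persistence step once it reaches completionSize, so the remainder of any odd-sized group has no completion trigger left. The two-field Student class and the semester assignment (even ids "1st", odd ids "2nd") are assumptions for illustration, not the poster's actual data.

```kotlin
// Plain-Kotlin sketch (no Camel) of a size-only completion strategy.
data class Student(val id: Int, val semester: String)

fun batchBySemester(
    records: List<Student>,
    completionSize: Int
): Pair<List<List<Student>>, List<Student>> {
    val flushed = mutableListOf<List<Student>>()          // batches handed to the persistence step
    val pending = mutableMapOf<String, MutableList<Student>>()
    for (r in records) {
        val group = pending.getOrPut(r.semester) { mutableListOf() }
        group.add(r)
        if (group.size == completionSize) {               // the only completion condition
            flushed.add(group.toList())
            group.clear()
        }
    }
    // Anything still pending has no trigger left to complete it: it is stranded.
    return flushed to pending.values.flatten()
}

fun main() {
    // Seven records, alternating semesters (an assumption for the demo).
    val seven = (1..7).map { Student(it, if (it % 2 == 0) "1st" else "2nd") }
    val (flushed, stranded) = batchBySemester(seven, 2)
    println("flushed=${flushed.map { b -> b.map { it.id } }} stranded=${stranded.map { it.id }}")
    // With batch size 2 and seven records, exactly one record is left unsaved.
}
```

Running this shows three complete batches flushed and one record stranded, which matches the six "Saving record to database" lines above for a seven-record file.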

1 Answer:

Answer 0 (score: 0)

If you need the message to complete immediately, you can specify a completion predicate based on an exchange property set by the splitter. I have not tried this, but I think

.completionPredicate( simple( "${exchangeProperty.CamelSplitComplete}" ) )

will process the last message.
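Applied to the route in the question, the predicate would sit alongside completionSize, so a batch completes either when it reaches two records or when the splitter signals the last split exchange. This is an untested sketch, not a verified fix:

```kotlin
from("direct:process-s3-file")
    .unmarshal(csvDataFormat)
    .split(body())
    .streaming()
    .aggregate(simple("\${body.semester}"), GroupedBodyAggregationStrategy())
        .completionSize(2)
        // Flush whatever has been aggregated once the splitter marks
        // the final split exchange.
        .completionPredicate(simple("\${exchangeProperty.CamelSplitComplete}"))
        .bean(persistenceService)
    .end()
```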

My other concern is that you have parallelProcessing set on the splitter, which may mean the messages are not processed in order. Is it the splitter you want parallel processing applied to, or the aggregator? You do not appear to be doing anything with the split records other than aggregating them, so it may be better to move the parallelProcessing instruction to the aggregator.
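If the intent is to parallelize the database writes rather than the splitting, that suggestion could look like the following. This is a sketch under the same assumptions as the question's route (the completionTimeout value is illustrative, not something from the original post):

```kotlin
from("direct:process-s3-file")
    .unmarshal(csvDataFormat)
    .split(body())
    .streaming()                    // splitter stays sequential, in record order
    .aggregate(simple("\${body.semester}"), GroupedBodyAggregationStrategy())
        .completionSize(2)
        .completionTimeout(5000)    // safety net for an incomplete final batch
        .parallelProcessing()       // completed batches are persisted concurrently
        .bean(persistenceService)
    .end()
```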