Camel: how to split and then aggregate when the number of items is less than the batch size

Date: 2019-01-03 07:04:47

Tags: kotlin apache-camel spring-camel

I have a Camel route that reads a file from S3 and processes the input file as follows:

  1. Parse each row into a POJO (Student) using Bindy
  2. Split the output via body()
  3. Aggregate by a field on the body (.semester) with a batch size of 2
  4. Invoke the persistence service to upload the records to the database in the given batches

The problem is that with a batch size of 2 and an odd number of records, there is always one record that does not get saved.

The code provided is Kotlin, but it should not differ much from the equivalent Java (apart from the backslash in front of "${simple expression}", or the absence of semicolons to terminate statements).

If I set the batch size to 1 then every record is saved; otherwise the last record never gets saved.

I have gone through the message-processor documentation a few times, but it does not seem to cover this particular scenario.

In addition to completionTimeout, I have also set [completionInterval|completionSize], but it made no difference.

Has anyone run into this issue before?

val csvDataFormat = BindyCsvDataFormat(Student::class.java)

from("aws-s3://student-12-bucket?amazonS3Client=#amazonS3&delay=5000")
    .log("A new Student input file has been received in S3: '\${header.CamelAwsS3BucketName}/\${header.CamelAwsS3Key}'")
    .to("direct:move-input-s3-object-to-in-progress")
    .to("direct:process-s3-file")
    .to("direct:move-input-s3-object-to-completed")
    .end()

from("direct:process-s3-file")
    .unmarshal(csvDataFormat)
    .split(body())
    .streaming()
    .parallelProcessing()
    .aggregate(simple("\${body.semester}"), GroupedBodyAggregationStrategy())
    .completionSize(2)
    .bean(persistenceService)
    .end()

For an input CSV file containing seven (7) records, this is the output generated (with some debug logging added):

WARN 19540 --- [student-12-move] c.a.s.s.internal.S3AbortableInputStream  : Not all bytes were read from the S3ObjectInputStream, aborting HTTP connection. This is likely an error and may result in sub-optimal behavior. Request only the bytes you need via a ranged GET or drain the input stream after use.
 INFO 19540 --- [student-12-move] student-workflow-main                    : A new Student input file has been received in S3: 'student-12-bucket/inbox/foo.csv'
 INFO 19540 --- [student-12-move] move-input-s3-object-to-in-progress      : Moving S3 file 'inbox/foo.csv' to 'in-progress' folder...
 INFO 19540 --- [student-12-move] student-workflow-main                    : Moved input S3 file 'in-progress/foo.csv' to 'in-progress' folder...
 INFO 19540 --- [student-12-move] pre-process-s3-file-records              : Start saving to database...
DEBUG 19540 --- [read #7 - Split] c.b.i.d.s.StudentPersistenceServiceImpl  : Saving record to database: Student(id=7, name=Student 7, semester=2nd, javaMarks=25)
DEBUG 19540 --- [read #7 - Split] c.b.i.d.s.StudentPersistenceServiceImpl  : Saving record to database: Student(id=5, name=Student 5, semester=2nd, javaMarks=81)
DEBUG 19540 --- [read #3 - Split] c.b.i.d.s.StudentPersistenceServiceImpl  : Saving record to database: Student(id=6, name=Student 6, semester=1st, javaMarks=15)
DEBUG 19540 --- [read #3 - Split] c.b.i.d.s.StudentPersistenceServiceImpl  : Saving record to database: Student(id=2, name=Student 2, semester=1st, javaMarks=62)
DEBUG 19540 --- [read #2 - Split] c.b.i.d.s.StudentPersistenceServiceImpl  : Saving record to database: Student(id=3, name=Student 3, semester=2nd, javaMarks=72)
DEBUG 19540 --- [read #2 - Split] c.b.i.d.s.StudentPersistenceServiceImpl  : Saving record to database: Student(id=1, name=Student 1, semester=2nd, javaMarks=87)
 INFO 19540 --- [student-12-move] device-group-workflow-main               : End pre-processing S3 CSV file records...
 INFO 19540 --- [student-12-move] move-input-s3-object-to-completed        : Moving S3 file 'in-progress/foo.csv' to 'completed' folder...
 INFO 19540 --- [student-12-move] device-group-workflow-main               : Moved S3 file 'in-progress/foo.csv' to 'completed' folder...
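The behaviour in the logs can be reproduced without Camel at all. Below is a plain-Kotlin sketch of what a size-only completion strategy does: records are correlated by semester, and a group is only flushed to the persistence step once it reaches completionSize, so the remainder of any odd-sized group has no completion trigger left. The two-field Student class and the semester assignment (even ids "1st", odd ids "2nd") are assumptions for illustration, not the poster's actual data.

```kotlin
// Plain-Kotlin sketch (no Camel) of a size-only completion strategy.
data class Student(val id: Int, val semester: String)

fun batchBySemester(
    records: List<Student>,
    completionSize: Int
): Pair<List<List<Student>>, List<Student>> {
    val flushed = mutableListOf<List<Student>>()          // batches handed to the persistence step
    val pending = mutableMapOf<String, MutableList<Student>>()
    for (r in records) {
        val group = pending.getOrPut(r.semester) { mutableListOf() }
        group.add(r)
        if (group.size == completionSize) {               // the only completion condition
            flushed.add(group.toList())
            group.clear()
        }
    }
    // Anything still pending has no trigger left to complete it: it is stranded.
    return flushed to pending.values.flatten()
}

fun main() {
    // Seven records, alternating semesters (an assumption for the demo).
    val seven = (1..7).map { Student(it, if (it % 2 == 0) "1st" else "2nd") }
    val (flushed, stranded) = batchBySemester(seven, 2)
    println("flushed=${flushed.map { b -> b.map { it.id } }} stranded=${stranded.map { it.id }}")
    // With batch size 2 and seven records, exactly one record is left unsaved.
}
```

Running this shows three complete batches flushed and one record stranded, which matches the six "Saving record to database" lines above for a seven-record file.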

1 Answer:

Answer 0 (score: 0)

If you need the message to complete immediately, you can specify a completion predicate based on an exchange property set by the splitter. I have not tried this, but I think

.completionPredicate( simple( "${exchangeProperty.CamelSplitComplete}" ) )

will process the last message.
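Applied to the route in the question, the predicate would sit alongside completionSize, so a batch completes either when it reaches two records or when the splitter signals the last split exchange. This is an untested sketch, not a verified fix:

```kotlin
from("direct:process-s3-file")
    .unmarshal(csvDataFormat)
    .split(body())
    .streaming()
    .aggregate(simple("\${body.semester}"), GroupedBodyAggregationStrategy())
        .completionSize(2)
        // Flush whatever has been aggregated once the splitter marks
        // the final split exchange.
        .completionPredicate(simple("\${exchangeProperty.CamelSplitComplete}"))
        .bean(persistenceService)
    .end()
```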

My other concern is that you have parallelProcessing set on the splitter, which may mean the messages are not processed in order. Is it the splitter you want parallel processing applied to, or the aggregator? You do not appear to be doing anything with the split records other than aggregating them, so it may be better to move the parallelProcessing instruction to the aggregator.
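If the intent is to parallelize the database writes rather than the splitting, that suggestion could look like the following. This is a sketch under the same assumptions as the question's route (the completionTimeout value is illustrative, not something from the original post):

```kotlin
from("direct:process-s3-file")
    .unmarshal(csvDataFormat)
    .split(body())
    .streaming()                    // splitter stays sequential, in record order
    .aggregate(simple("\${body.semester}"), GroupedBodyAggregationStrategy())
        .completionSize(2)
        .completionTimeout(5000)    // safety net for an incomplete final batch
        .parallelProcessing()       // completed batches are persisted concurrently
        .bean(persistenceService)
    .end()
```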