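I have a huge file that I need to read and load into a database. Any invalid records (invalid length, duplicate keys, and so on), if present, must be written to an error report. Because of the file's size, we tried chunk sizes (commit intervals) of 1000, 5000, and 10000. In the process I found that chunking causes records to be processed redundantly, so my error report is incorrect: it contains not only the genuinely invalid records from the input file but also duplicate records from the chunks.
Code snippet: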
@Bean
public Step readAndWriteStudentInfo() {
    return stepBuilderFactory.get("readAndWriteStudentInfo")
            .<Student, Student>chunk(5000)
            .reader(studentFileReader())
            .faultTolerant()
            .skipPolicy(skipper)
            .listener(listener)
            .processor(new ItemProcessor<Student, Student>() {
                @Override
                public Student process(Student student) throws Exception {
                    // Returning null filters the item out, so it never reaches the writer.
                    if (processedRecords.contains(student)) {
                        return null;
                    } else {
                        processedRecords.add(student);
                        return student;
                    }
                }
            })
            .writer(studentDBWriter())
            .build();
}
@Bean
public ItemReader<Student> studentFileReader() {
    FlatFileItemReader<Student> reader = new FlatFileItemReader<>();
    reader.setResource(new FileSystemResource(studentInfoFileName));
    reader.setLineMapper(new DefaultLineMapper<Student>() {
        {
            setLineTokenizer(new FixedLengthTokenizer() {
                {
                    setNames(classProperties50);
                    setColumns(range50);
                }
            });
            setFieldSetMapper(new BeanWrapperFieldSetMapper<Student>() {
                {
                    setTargetType(Student.class);
                }
            });
        }
    });
    reader.setSaveState(false);
    reader.setLinesToSkip(1);
    reader.setRecordSeparatorPolicy(new TrailerSkipper());
    return reader;
}

@Bean
public ItemWriter<Student> studentDBWriter() {
    JdbcBatchItemWriter<Student> writer = new JdbcBatchItemWriter<>();
    writer.setSql(insertQuery);
    writer.setDataSource(datSource);
    writer.setItemSqlParameterSourceProvider(new BeanPropertyItemSqlParameterSourceProvider<Student>());
    return writer;
}
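The duplicate check in the processor depends on `processedRecords.contains(...)`. If `processedRecords` is a hash-based `Set` (an assumption, since its declaration is not shown), the check only works when `Student` defines `equals` and `hashCode` on the field that makes a record unique. A minimal sketch of that idea follows; `StudentKeyExample` and its `studentId` field are hypothetical stand-ins for the real `Student` bean, which would also need a no-argument constructor and setters to work with `BeanWrapperFieldSetMapper`.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

// Hypothetical stand-in for the Student bean; "studentId" is an assumed key field.
class StudentKeyExample {
    private final String studentId;
    private final String name;

    StudentKeyExample(String studentId, String name) {
        this.studentId = studentId;
        this.name = name;
    }

    String getName() {
        return name;
    }

    // Two records are "the same" when their keys match, regardless of other fields.
    @Override
    public boolean equals(Object other) {
        if (this == other) {
            return true;
        }
        if (!(other instanceof StudentKeyExample)) {
            return false;
        }
        return Objects.equals(studentId, ((StudentKeyExample) other).studentId);
    }

    // Must be consistent with equals, otherwise HashSet lookups miss duplicates.
    @Override
    public int hashCode() {
        return Objects.hash(studentId);
    }

    // Returns true the first time a key is seen and false for a repeat.
    static boolean firstSighting(Set<StudentKeyExample> seen, StudentKeyExample s) {
        return seen.add(s);
    }

    public static void main(String[] args) {
        Set<StudentKeyExample> seen = new HashSet<>();
        System.out.println(firstSighting(seen, new StudentKeyExample("42", "Ann")));
        System.out.println(firstSighting(seen, new StudentKeyExample("42", "Ann B.")));
    }
}
```

Without these two overrides, `contains` falls back to object identity, and two separately parsed lines with the same key would never be treated as duplicates.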
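I have tried various chunk sizes: 10, 100, 1000, and 5000. The accuracy of my error report gets worse as the chunk size increases. The error report is written by the skip policy I implemented; please let me know if you need that code as well.
How can I make sure my writer picks up a unique set of records in each chunk?
Skipper implementation: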
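One possible reading of the symptom, offered as an assumption and not as a diagnosis: when a batched write fails in a fault-tolerant step, the chunk is rolled back and its items are retried one at a time to isolate the bad one, so any code that logs on every failure can report the same record more than once. The sketch below is a simplified plain-Java model of that idea, not Spring Batch code; `ChunkRescanModel`, `writeChunk`, and the `BAD` prefix are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Simplified model: a failed batch write is followed by an item-by-item rescan.
class ChunkRescanModel {
    // Every failure that the "policy" is asked about, like writeToRecon would record.
    final List<String> report = new ArrayList<>();
    // Items that were written successfully.
    final List<String> written = new ArrayList<>();

    // Pretend writer: the whole batch fails if any item is marked BAD.
    private void writeBatch(List<String> items) {
        for (String item : items) {
            if (item.startsWith("BAD")) {
                throw new IllegalStateException("write failed on " + item);
            }
        }
        written.addAll(items);
    }

    void writeChunk(List<String> chunk) {
        try {
            writeBatch(chunk);
        } catch (IllegalStateException batchFailure) {
            // First report: the failure of the whole batch.
            report.add(batchFailure.getMessage());
            // Rescan: retry each item on its own to find the culprit.
            for (String item : chunk) {
                try {
                    writeBatch(Collections.singletonList(item));
                } catch (IllegalStateException itemFailure) {
                    // Second report for the very same record.
                    report.add(itemFailure.getMessage());
                }
            }
        }
    }
}
```

In this model a single bad record ends up in `report` twice, while the good records are still written exactly once. If something similar is happening in the real step, writing the report from a callback that fires once per skipped item would avoid the duplicates.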
@Override
public boolean shouldSkip(Throwable t, int skipCount) throws SkipLimitExceededException {
    String exception = t.getClass().getSimpleName();
    // A missing input file is fatal: do not skip, let the step fail.
    if (t instanceof FileNotFoundException) {
        return false;
    }
    switch (exception) {
    case "FlatFileParseException":
        // Malformed input line: report the line number and the raw record.
        FlatFileParseException ffpe = (FlatFileParseException) t;
        String errorMessage = "Line no = " + ffpe.getLineNumber() + " " + ffpe.getMessage() + " Record is ["
                + ffpe.getInput() + "].\n";
        writeToRecon(errorMessage);
        return true;
    case "SQLException":
        SQLException sE = (SQLException) t;
        String sqlErrorMessage = sE.getErrorCode() + " Record is [" + sE.getCause() + "].\n";
        writeToRecon(sqlErrorMessage);
        return true;
    case "BatchUpdateException":
        BatchUpdateException batchUpdateException = (BatchUpdateException) t;
        String btchUpdtExceptionMsg = batchUpdateException.getMessage() + " " + batchUpdateException.getCause();
        writeToRecon(btchUpdtExceptionMsg);
        return true;
    default:
        return false;
    }
}
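Because `writeToRecon` is called directly from `shouldSkip`, every invocation of the policy adds a line to the report. One way to make the report tolerant of repeated invocations is to record each distinct message only once. The sketch below is a plain-Java illustration of that idea under stated assumptions: `DedupingReconReport`, `record`, and `lines` are hypothetical names, and the real `writeToRecon` (not shown in the question) may write to a file instead.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical report buffer that ignores a message it has already recorded.
class DedupingReconReport {
    // LinkedHashSet keeps first-seen order; the synchronized wrapper allows
    // calls from more than one thread.
    private final Set<String> seen = Collections.synchronizedSet(new LinkedHashSet<>());

    // Returns true if the message was new, false if it was a repeat.
    boolean record(String errorMessage) {
        return seen.add(errorMessage);
    }

    // Snapshot of the unique messages in the order they first arrived.
    List<String> lines() {
        synchronized (seen) {
            return new ArrayList<>(seen);
        }
    }

    public static void main(String[] args) {
        DedupingReconReport report = new DedupingReconReport();
        report.record("Line no = 7 invalid length");
        report.record("Line no = 7 invalid length"); // repeat, ignored
        report.record("Line no = 9 duplicate key");
        for (String line : report.lines()) {
            System.out.println(line);
        }
    }
}
```

This only works if each message identifies its record uniquely, for example by including the line number or the key. Two different bad records that produce identical messages would be collapsed into one line.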