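I have a simple CSV file with about 400,000 lines (a single column). Reading the records and processing them takes a very long time.
The processor validates each record against Couchbase.
The writer writes to a remote topic. The whole run took about 30 minutes, which is crazy.
I read that FlatFileItemReader is not thread-safe, so my chunk size is 1.
I read that asynchronous processing could help, but I don't see any improvement.
Here is my code: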
@Configuration
@EnableBatchProcessing
public class NotificationFileProcessUploadedFileJob {
@Value("${expected.snid.header}")
public String snidHeader;
@Value("${num.of.processing.chunks.per.file}")
public int numOfProcessingChunksPerFile;
@Autowired
private InfrastructureConfigurationConfig infrastructureConfigurationConfig;
private static final String OVERRIDDEN_BY_EXPRESSION = null;
@Inject
private JobBuilderFactory jobs;
@Inject
private StepBuilderFactory stepBuilderFactory;
@Inject
ExecutionContextPromotionListener executionContextPromotionListener;
@Bean
public Job processUploadedFileJob() throws Exception {
return this.jobs.get("processUploadedFileJob").start((processSnidUploadedFileStep())).build();
}
@Bean
public Step processSnidUploadedFileStep() {
return stepBuilderFactory.get("processSnidFileStep")
.<PushItemDTO, PushItemDTO>chunk(numOfProcessingChunksPerFile)
.reader(snidFileReader(OVERRIDDEN_BY_EXPRESSION))
.processor(asyncItemProcessor())
.writer(asyncItemWriter())
// .throttleLimit(20)
// .taskJobExecutor(infrastructureConfigurationConfig.taskJobExecutor())
// .faultTolerant()
// .skipLimit(10) //default is set to 0
// .skip(MySQLIntegrityConstraintViolationException.class)
.build();
}
@Inject
ItemWriter writer;
@Bean
public AsyncItemWriter asyncItemWriter() {
AsyncItemWriter asyncItemWriter=new AsyncItemWriter();
asyncItemWriter.setDelegate(writer);
return asyncItemWriter;
}
@Bean
@Scope(value = "step", proxyMode = ScopedProxyMode.INTERFACES)
public ItemStreamReader<PushItemDTO> snidFileReader(@Value("#{jobParameters[filePath]}") String filePath) {
FlatFileItemReader<PushItemDTO> itemReader = new FlatFileItemReader<PushItemDTO>();
itemReader.setLineMapper(snidLineMapper());
itemReader.setLinesToSkip(1);
itemReader.setResource(new FileSystemResource(filePath));
return itemReader;
}
@Bean
public AsyncItemProcessor asyncItemProcessor() {
AsyncItemProcessor<PushItemDTO, PushItemDTO> asyncItemProcessor = new AsyncItemProcessor();
asyncItemProcessor.setDelegate(processor(OVERRIDDEN_BY_EXPRESSION, OVERRIDDEN_BY_EXPRESSION, OVERRIDDEN_BY_EXPRESSION,
OVERRIDDEN_BY_EXPRESSION, OVERRIDDEN_BY_EXPRESSION, OVERRIDDEN_BY_EXPRESSION, OVERRIDDEN_BY_EXPRESSION));
asyncItemProcessor.setTaskExecutor(infrastructureConfigurationConfig.taskProcessingExecutor());
return asyncItemProcessor;
}
@Scope(value = "step", proxyMode = ScopedProxyMode.INTERFACES)
@Bean
public ItemProcessor<PushItemDTO, PushItemDTO> processor(@Value("#{jobParameters[pushMessage]}") String pushMessage,
@Value("#{jobParameters[jobId]}") String jobId,
@Value("#{jobParameters[taskId]}") String taskId,
@Value("#{jobParameters[refId]}") String refId,
@Value("#{jobParameters[url]}") String url,
@Value("#{jobParameters[targetType]}") String targetType,
@Value("#{jobParameters[gameType]}") String gameType) {
return new PushItemProcessor(pushMessage, jobId, taskId, refId, url, targetType, gameType);
}
@Bean
public LineMapper<PushItemDTO> snidLineMapper() {
DefaultLineMapper<PushItemDTO> lineMapper = new DefaultLineMapper<PushItemDTO>();
DelimitedLineTokenizer lineTokenizer = new DelimitedLineTokenizer();
lineTokenizer.setDelimiter(",");
lineTokenizer.setStrict(true);
String[] splittedHeader = snidHeader.split(",");
lineTokenizer.setNames(splittedHeader);
BeanWrapperFieldSetMapper<PushItemDTO> fieldSetMapper = new BeanWrapperFieldSetMapper<PushItemDTO>();
fieldSetMapper.setTargetType(PushItemDTO.class);
lineMapper.setLineTokenizer(lineTokenizer);
lineMapper.setFieldSetMapper(new PushItemFieldSetMapper());
return lineMapper;
}
}
// In InfrastructureConfigurationConfig (a separate configuration class):
@Bean
@Override
public SimpleAsyncTaskExecutor taskProcessingExecutor() {
SimpleAsyncTaskExecutor simpleAsyncTaskExecutor = new SimpleAsyncTaskExecutor();
simpleAsyncTaskExecutor.setConcurrencyLimit(300);
return simpleAsyncTaskExecutor;
}
How can I improve the processing performance and make it faster? Thanks.
ItemWriter code:
@Bean
public ItemWriter writer() {
return new KafkaWriter();
}
public class KafkaWriter implements ItemWriter<PushItemDTO> {
private static final Logger logger = LoggerFactory.getLogger(KafkaWriter.class);
@Autowired
KafkaProducer kafkaProducer;
@Override
public void write(List<? extends PushItemDTO> items) throws Exception {
for (PushItemDTO item : items) {
try {
logger.debug("Writing to kafka=" + item);
sendMessageToKafka(item);
} catch (Exception e) {
logger.error("Error writing item=" + item.toString(), e);
}
}
}
// sendMessageToKafka(item) is not shown in the question
}
Answer:
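Increasing the commit interval is where I would start. Keep in mind what the commit interval means. Since you set it to 1, you are doing the following for every single item:
1. Start a transaction
2. Read one item
3. Process the item
4. Write the item
5. Update the job repository
6. Commit the transaction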
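Your configuration doesn't show what the delegate ItemWriter is, so I can't comment on that part, but at a minimum you are executing several SQL statements per item just to update the job repository.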
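To put the commit interval in numbers: in a chunk-oriented step each chunk is committed in its own transaction, together with an update of the step's state in the job repository. The sketch below is a plain-Java illustration, not Spring Batch code (the class name CommitMath is made up). It counts those commits for the 400,000 lines in the question and converts the reported 30 minutes into a per-line figure.

```java
// Hypothetical helper: counts chunk commits for a chunk-oriented step.
final class CommitMath {

    // Each chunk = one transaction plus its job-repository update, so the number
    // of commits is the item count divided by the chunk size, rounded up.
    static long chunkCommits(long items, int commitInterval) {
        if (commitInterval <= 0) {
            throw new IllegalArgumentException("commitInterval must be positive");
        }
        return (items + commitInterval - 1) / commitInterval;
    }

    // Average wall-clock time per item, in milliseconds, for a measured run.
    static double millisPerItem(long items, double totalMinutes) {
        return totalMinutes * 60_000.0 / items;
    }

    public static void main(String[] args) {
        long lines = 400_000; // row count from the question
        for (int interval : new int[] {1, 100, 1_000}) {
            System.out.printf("commit interval %,d -> %,d commits%n",
                    interval, chunkCommits(lines, interval));
        }
        // 30 minutes for 400,000 lines, as reported in the question
        System.out.printf("observed: %.1f ms per line%n", millisPerItem(lines, 30));
    }
}
```

With the question's numbers, an interval of 1 means 400,000 commits at an overall average of about 4.5 ms per line, while an interval of 1,000 cuts that to 400 commits. A larger interval does not make the Couchbase lookup or the Kafka send itself faster; it only stops paying the transaction and job-repository cost once per line.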
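You are correct that FlatFileItemReader is not thread-safe. However, you are not using multiple threads to read, only to process, so there is no reason to set the commit interval to 1.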
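The following is a plain-Java model (no Spring Batch types) of what the AsyncItemProcessor/AsyncItemWriter pair does within one chunk. The reader is called on a single thread, each item of the chunk is handed to an executor as a Future, and the writer waits for the chunk's futures in order. `slowValidate` is a hypothetical stand-in for the Couchbase check with an assumed 10 ms latency. The sketch suggests why a commit interval of 1 gives AsyncItemProcessor nothing to overlap: only one item is ever in flight, which would be consistent with seeing no improvement.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Plain-Java model of the AsyncItemProcessor/AsyncItemWriter pattern (not Spring Batch code).
final class AsyncChunkSketch {

    // Hypothetical stand-in for the Couchbase check: assumed fixed 10 ms latency.
    static String slowValidate(String line) {
        try {
            Thread.sleep(10);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return line.toUpperCase();
    }

    // Reads on the calling thread only, processes each chunk's items concurrently,
    // then waits for the chunk's futures in order before "writing" them to sink.
    // Returns the largest number of items that were being processed at once.
    static <I, O> int run(Iterator<I> reader, int chunkSize, Function<I, O> processor,
                          List<O> sink, ExecutorService executor) throws Exception {
        if (chunkSize < 1) {
            throw new IllegalArgumentException("chunkSize must be at least 1");
        }
        AtomicInteger inFlight = new AtomicInteger();
        AtomicInteger maxInFlight = new AtomicInteger();
        while (reader.hasNext()) {
            List<Future<O>> chunk = new ArrayList<>(chunkSize);
            while (chunk.size() < chunkSize && reader.hasNext()) {
                I item = reader.next(); // single-threaded read, so a non-thread-safe reader is fine
                chunk.add(executor.submit(() -> {
                    maxInFlight.accumulateAndGet(inFlight.incrementAndGet(), Math::max);
                    try {
                        return processor.apply(item);
                    } finally {
                        inFlight.decrementAndGet();
                    }
                }));
            }
            for (Future<O> result : chunk) {
                sink.add(result.get()); // blocks until that item has been processed
            }
        }
        return maxInFlight.get();
    }

    public static void main(String[] args) throws Exception {
        List<String> lines = new ArrayList<>();
        for (int i = 0; i < 100; i++) {
            lines.add("snid-" + i);
        }
        ExecutorService pool = Executors.newFixedThreadPool(8);
        try {
            for (int chunkSize : new int[] {1, 50}) {
                List<String> written = new ArrayList<>();
                long start = System.nanoTime();
                int max = run(lines.iterator(), chunkSize, AsyncChunkSketch::slowValidate, written, pool);
                long elapsedMs = (System.nanoTime() - start) / 1_000_000;
                System.out.printf("chunk size %d: %d written, at most %d in flight, %d ms%n",
                        chunkSize, written.size(), max, elapsedMs);
            }
        } finally {
            pool.shutdown();
        }
    }
}
```

Because the writer blocks on each chunk before the next one is read, the number of items processed concurrently is bounded by both the chunk size and the executor's thread count, so raising the commit interval is what lets the async processor run items in parallel.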