Question

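Hi, I'm new to Spring Batch, and I want to create multiple files (csv), one for each processed chunk. The file name would be something like timestamp.csv. Any idea how I can do this? Basically, it's splitting one big file into multiple smaller files.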

Thanks!

Answer 1

A CSV file is basically a text file in which each record ends with a line break.

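As far as splitting a large CSV file into smaller files goes, you can simply read the large file line by line in Java. When the number of lines read reaches a threshold (the maximum line count per small file, e.g. 10, 100, 1000), create a new file with whatever naming convention you need and dump the data there.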

How to read a large text file line by line using Java?

BufferedReader is the main class for reading a text file line by line.
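A minimal sketch of the line-counting approach described above, using only the JDK: it reads the input with a BufferedReader and starts a new output file every `maxLines` lines. The class name `CsvSplitter` and the `<prefix><n>.csv` naming scheme are hypothetical choices for illustration, not part of any library.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: splits a text/CSV file into files of at most maxLines lines. */
final class CsvSplitter {

    private CsvSplitter() {
    }

    /** Returns the paths of the files created, in order (prefix1.csv, prefix2.csv, ...). */
    static List<Path> split(Path input, Path outputDir, String prefix, int maxLines) throws IOException {
        if (maxLines <= 0) {
            throw new IllegalArgumentException("maxLines must be positive");
        }
        List<Path> created = new ArrayList<>();
        try (BufferedReader reader = Files.newBufferedReader(input, StandardCharsets.UTF_8)) {
            BufferedWriter writer = null;
            int linesInCurrentFile = 0;
            try {
                String line;
                while ((line = reader.readLine()) != null) {
                    // First line, or threshold reached: close the current file and open the next one
                    if (writer == null || linesInCurrentFile == maxLines) {
                        if (writer != null) {
                            writer.close();
                        }
                        Path part = outputDir.resolve(prefix + (created.size() + 1) + ".csv");
                        writer = Files.newBufferedWriter(part, StandardCharsets.UTF_8);
                        created.add(part);
                        linesInCurrentFile = 0;
                    }
                    writer.write(line);
                    writer.newLine(); // platform line separator
                    linesInCurrentFile++;
                }
            } finally {
                if (writer != null) {
                    writer.close();
                }
            }
        }
        return created;
    }
}
```

For example, `CsvSplitter.split(Paths.get("big.csv"), Paths.get("out"), "part", 1000)` would write `out/part1.csv`, `out/part2.csv`, and so on. Note that splitting purely on lines does not repeat a header row in each file, and it would cut a record whose quoted field contains an embedded line break; a CSV-aware reader is needed for that case.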

This logic has nothing to do with Spring Batch; you can implement it in plain Java or with OS-level commands.

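So you have two distinct logical parts: reading the large file line by line, and creating the CSV files. You can develop these as separate components and then plug them into the Spring Batch framework as your business requirements dictate.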

There is a Java library that makes working with CSV files easy; you may want to use it, depending on the complexity involved:

<dependency>
    <groupId>com.opencsv</groupId>
    <artifactId>opencsv</artifactId>
    <version>4.6</version>
</dependency>

Answer 2

Use MultiResourceItemWriter in Spring Batch. For implementation details, check this

and see the API docs here.
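Spring Batch ships a `MultiResourceItemWriter` that wraps a delegate writer and rolls over to a new output resource once a given number of items has been written; the rollover is checked at chunk boundaries, so setting the per-resource limit equal to the chunk size should give one file per chunk. Below is a minimal configuration sketch, assuming Spring Batch 4.x or 5.x on the classpath; the class name `MultiFileWriterConfig`, the base path `foos`, and the `-<index>.csv` suffix are hypothetical. It is a bean-definition fragment meant to sit alongside a job/step configuration, not a complete application.

```java
import org.springframework.batch.item.file.FlatFileItemWriter;
import org.springframework.batch.item.file.MultiResourceItemWriter;
import org.springframework.batch.item.file.transform.PassThroughLineAggregator;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.core.io.FileSystemResource;

@Configuration
public class MultiFileWriterConfig {

    // Hypothetical chunk size; use the same value in the step's chunk(...) call
    // so that each chunk lands in its own file.
    static final int CHUNK_SIZE = 3;

    @Bean
    public MultiResourceItemWriter<String> multiFileWriter() {
        // Delegate that writes the lines; MultiResourceItemWriter sets its resource
        FlatFileItemWriter<String> delegate = new FlatFileItemWriter<>();
        delegate.setName("chunkFileItemWriter");
        delegate.setLineAggregator(new PassThroughLineAggregator<>());

        MultiResourceItemWriter<String> writer = new MultiResourceItemWriter<>();
        writer.setDelegate(delegate);
        // Base path; the suffix below is appended to it for each new file
        writer.setResource(new FileSystemResource("foos"));
        writer.setItemCountLimitPerResource(CHUNK_SIZE);
        // Suffix from the current resource index, e.g. foos-1.csv, foos-2.csv, ...
        writer.setResourceSuffixCreator(index -> "-" + index + ".csv");
        return writer;
    }
}
```

In the step, pass this bean to `.writer(...)` with `.<String, String>chunk(CHUNK_SIZE)`; because it implements `ItemStream`, the step builder registers it as a stream, so it is opened and closed by the step rather than by hand.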

Answer 3

I would use a command-line utility such as the split command (or an equivalent), or try plain Java (see Java - Read file and split into multiple files).

But if you really want to do this with Spring Batch, you could use something like this:

import java.time.LocalDateTime;

import org.springframework.batch.core.Job;
import org.springframework.batch.core.JobParameters;
import org.springframework.batch.core.Step;
import org.springframework.batch.core.configuration.annotation.EnableBatchProcessing;
import org.springframework.batch.core.configuration.annotation.JobBuilderFactory;
import org.springframework.batch.core.configuration.annotation.StepBuilderFactory;
import org.springframework.batch.core.launch.JobLauncher;
import org.springframework.batch.item.ExecutionContext;
import org.springframework.batch.item.ItemWriter;
import org.springframework.batch.item.file.FlatFileItemReader;
import org.springframework.batch.item.file.FlatFileItemWriter;
import org.springframework.batch.item.file.builder.FlatFileItemReaderBuilder;
import org.springframework.batch.item.file.mapping.PassThroughLineMapper;
import org.springframework.batch.item.file.transform.PassThroughLineAggregator;
import org.springframework.context.ApplicationContext;
import org.springframework.context.annotation.AnnotationConfigApplicationContext;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.core.io.FileSystemResource;

@Configuration
@EnableBatchProcessing
public class MyJob {

    private final JobBuilderFactory jobBuilderFactory;

    private final StepBuilderFactory stepBuilderFactory;

    public MyJob(JobBuilderFactory jobBuilderFactory, StepBuilderFactory stepBuilderFactory) {
        this.jobBuilderFactory = jobBuilderFactory;
        this.stepBuilderFactory = stepBuilderFactory;
    }

    @Bean
    public FlatFileItemReader<String> itemReader() {
        return new FlatFileItemReaderBuilder<String>()
                .name("flatFileReader")
                .resource(new FileSystemResource("foos.txt"))
                .lineMapper(new PassThroughLineMapper())
                .build();
    }

    @Bean
    public ItemWriter<String> itemWriter() {
        final FlatFileItemWriter<String> writer = new FlatFileItemWriter<>();
        writer.setLineAggregator(new PassThroughLineAggregator<>());
        writer.setName("chunkFileItemWriter");
        return items -> {
            writer.setResource(new FileSystemResource("foos" + getTimestamp() + ".txt"));
            writer.open(new ExecutionContext());
            writer.write(items);
            writer.close();
        };
    }

    private String getTimestamp() {
        // TODO tested on unix/linux systems, update as needed to not contain illegal characters for a file name on MS windows
        return LocalDateTime.now().toString();
    }

    @Bean
    public Step step() {
        return stepBuilderFactory.get("step")
                .<String, String>chunk(3)
                .reader(itemReader())
                .writer(itemWriter())
                .build();
    }

    @Bean
    public Job job() {
        return jobBuilderFactory.get("job")
                .start(step())
                .build();
    }

    public static void main(String[] args) throws Exception {
        ApplicationContext context = new AnnotationConfigApplicationContext(MyJob.class);
        JobLauncher jobLauncher = context.getBean(JobLauncher.class);
        Job job = context.getBean(Job.class);
        jobLauncher.run(job, new JobParameters());
    }

}

The file foos.txt contains:

foo1
foo2
foo3
foo4
foo5
foo6

The example writes each chunk to a separate timestamped file:

File 1, foos2019-11-28T09:23:47.769.txt:

foo1
foo2
foo3

File 2, foos2019-11-28T09:23:47.779.txt:

foo4
foo5
foo6
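The file names above come from `LocalDateTime.toString()` and contain colons, which Windows does not allow in file names (this is what the TODO in `getTimestamp()` refers to). A minimal sketch of a file-safe alternative, formatting the timestamp with a `DateTimeFormatter` pattern that uses `-` instead of `:`; the class name `ChunkFileNames` and the exact pattern are hypothetical choices.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

/** Hypothetical helper producing timestamped file names without ':' characters. */
final class ChunkFileNames {

    // e.g. 2019-11-28T09-23-47.769: digits, '-', 'T' and '.' only
    private static final DateTimeFormatter FILE_SAFE =
            DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH-mm-ss.SSS");

    private ChunkFileNames() {
    }

    static String timestampName(String prefix, LocalDateTime time, String extension) {
        return prefix + FILE_SAFE.format(time) + extension;
    }
}
```

Inside the writer lambda, `ChunkFileNames.timestampName("foos", LocalDateTime.now(), ".txt")` could then replace `"foos" + getTimestamp() + ".txt"`.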

BTW, I think using a number instead of a timestamp would be better.
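A minimal sketch of such number-based naming, assuming a hypothetical helper `ChunkFileCounter`: one instance is created outside the writer lambda and asked for the next name per chunk. Unlike millisecond timestamps, a counter cannot hand out the same name to two chunks that finish within the same millisecond. `AtomicInteger` keeps it safe if the step is ever made multi-threaded.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical helper handing out sequential file names: prefix1.ext, prefix2.ext, ... */
final class ChunkFileCounter {

    private final String prefix;
    private final String extension;
    private final AtomicInteger counter = new AtomicInteger();

    ChunkFileCounter(String prefix, String extension) {
        this.prefix = prefix;
        this.extension = extension;
    }

    /** Returns the next name; numbering starts at 1 for each new instance. */
    String nextName() {
        return prefix + counter.incrementAndGet() + extension;
    }
}
```

In the writer above, `writer.setResource(new FileSystemResource(names.nextName()))` would then yield `foos1.txt`, `foos2.txt`, and so on. The counter restarts at 1 on every run, so a rerun overwrites the previous run's files, which is in line with not caring about restartability.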

Note: I didn't care much about restartability for this use case.

How to create multiple files (csv) for each chunk?
