Question

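I am currently trying to convert CSV data to Parquet using Spring Batch. Because batch processing is iterative (the writer is called once per chunk), is there a way to append data to a Parquet file using the Java API?

I tried merging files, but I am not sure whether that is the right approach.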

@Override
public void write(List<? extends Employee> employees) throws Exception {

    if(!employees.isEmpty()) {
        // One new file per chunk; i is presumably a counter field on the writer class
        Path dataFile=new Path(destinationFilePath+"-"+String.valueOf(i++)+".parquet");
        // try-with-resources closes the writer even if a write fails
        try (ParquetWriter<Employee> writer = AvroParquetWriter.<Employee>builder(dataFile)
                .withSchema(ReflectData.AllowNull.get().getSchema(Employee.class))
                .withDataModel(ReflectData.get())
                .withConf(new Configuration())
                //.withCompressionCodec(CompressionCodecName.SNAPPY)
                .withWriteMode(ParquetFileWriter.Mode.CREATE)
                .build()) {
            for (Employee employee : employees) {
                writer.write(employee);
            }
        }
    }
}

This is the existing code. ParquetFileWriter.Mode.CREATE and ParquetFileWriter.Mode.OVERWRITE are the only available options, but Spark supports an append operation.
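For context, Spark's append mode adds new part files to the output directory rather than modifying an existing Parquet file, which is close to the one-file-per-chunk naming used above. Below is a minimal sketch of choosing the next unused part-file name, so that a restarted job does not collide with files from an earlier run. It assumes a local directory accessed through java.nio (not a Hadoop Path) and the `<prefix>-<index>.parquet` naming from the code above. The class name PartFileNamer and the method nextPartFile are hypothetical and not part of Parquet or Spring Batch.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of any library.
class PartFileNamer {

    // Returns dir/<prefix>-<n>.parquet, where n is one more than the highest
    // index already present in dir (or 0 if no matching file exists).
    static Path nextPartFile(Path dir, String prefix) throws IOException {
        Pattern pattern = Pattern.compile(Pattern.quote(prefix) + "-(\\d+)\\.parquet");
        int max = -1;
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir)) {
            for (Path file : stream) {
                Matcher m = pattern.matcher(file.getFileName().toString());
                if (m.matches()) {
                    max = Math.max(max, Integer.parseInt(m.group(1)));
                }
            }
        }
        return dir.resolve(prefix + "-" + (max + 1) + ".parquet");
    }
}
```

The returned name could replace the in-memory counter i when building dataFile. Readers that accept a directory of Parquet files with the same schema (Spark, for example) would then see all chunks as one dataset.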

Is there a way to use the Java API to append data to an existing Parquet file with the same schema?

0 Answers: