Flink StreamingFileSink - ParquetAvroWriters

Time: 2021-05-03 13:27:12

Tags: apache-flink parquet flink-streaming

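I am using Flink's StreamingFileSink to write incoming data to an S3 bucket. My code works fine with the forRowFormat option. Now I am trying to set up the forBulkFormat option to write the data to S3 in Parquet format. My sink function is as follows.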

private static SinkFunction<Pojo> getS3Sink() {
    // Bulk format: each part file is written as Parquet through an Avro writer for Pojo.
    final StreamingFileSink<Pojo> sink = StreamingFileSink
            .forBulkFormat(new Path(s3SinkPath),
                    ParquetAvroWriters.forSpecificRecord(Pojo.class))
            .withBucketAssigner(new CustomBucketAssigner())
            .build();
    return sink;
}

I am running the whole setup in IntelliJ. When I run this code, I get the following error:

java.lang.NoClassDefFoundError: org/apache/hadoop/mapreduce/lib/output/FileOutputFormat
at java.lang.ClassLoader.defineClass1(Native Method) ~[?:?]
at java.lang.ClassLoader.defineClass(ClassLoader.java:1016) ~[?:?]
at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:174) ~[?:?]
at jdk.internal.loader.BuiltinClassLoader.defineClass(BuiltinClassLoader.java:800) ~[?:?]
at jdk.internal.loader.BuiltinClassLoader.findClassOnClassPathOrNull(BuiltinClassLoader.java:698) ~[?:?]
at jdk.internal.loader.BuiltinClassLoader.loadClassOrNull(BuiltinClassLoader.java:621) ~[?:?]
at jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:579) ~[?:?]
at jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:178) ~[?:?]
at java.lang.ClassLoader.loadClass(ClassLoader.java:521) ~[?:?]
at org.apache.parquet.hadoop.ParquetWriter.<init>(ParquetWriter.java:285) ~[parquet-hadoop-1.12.0.jar:1.12.0]
at org.apache.parquet.hadoop.ParquetWriter$Builder.build(ParquetWriter.java:641) ~[parquet-hadoop-1.12.0.jar:1.12.0]
at org.apache.flink.formats.parquet.avro.ParquetAvroWriters.createAvroParquetWriter(ParquetAvroWriters.java:87) ~[flink-parquet_2.12-1.11.2.jar:1.11.2]
at org.apache.flink.formats.parquet.avro.ParquetAvroWriters.lambda$forSpecificRecord$824091b3$1(ParquetAvroWriters.java:49) ~[flink-parquet_2.12-1.11.2.jar:1.11.2]
at org.apache.flink.formats.parquet.ParquetWriterFactory.create(ParquetWriterFactory.java:57) ~[flink-parquet_2.12-1.11.2.jar:1.11.2]
at org.apache.flink.streaming.api.functions.sink.filesystem.BulkBucketWriter.openNew(BulkBucketWriter.java:69) ~[flink-streaming-java_2.12-1.11.2.jar:1.11.2]
at org.apache.flink.streaming.api.functions.sink.filesystem.OutputStreamBasedPartFileWriter$OutputStreamBasedBucketWriter.openNewInProgressFile(OutputStreamBasedPartFileWriter.java:83) ~[flink-streaming-java_2.12-1.11.2.jar:1.11.2]

The Flink documentation does not mention any additional configuration needed for the output format. Can you help?

Here are the Maven dependencies:

    <flink.version>1.11.2</flink.version>
    <scala.binary.version>2.12</scala.binary.version>
    <avro.version>1.10.2</avro.version>
    <flink.format.parquet.version>1.12.0</flink.format.parquet.version>

    <dependency>
        <groupId>org.apache.flink</groupId>
        <artifactId>flink-avro</artifactId>
        <version>${flink.version}</version>
    </dependency>
    <dependency>
        <groupId>org.apache.parquet</groupId>
        <artifactId>parquet-avro</artifactId>
        <version>${flink.format.parquet.version}</version>
    </dependency>
    <dependency>
        <groupId>org.apache.flink</groupId>
        <artifactId>flink-parquet_${scala.binary.version}</artifactId>
        <version>${flink.version}</version>
    </dependency>
    <dependency>
        <groupId>org.apache.flink</groupId>
        <artifactId>flink-s3-fs-hadoop</artifactId>
        <version>${flink.version}</version>
    </dependency>
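
A possible diagnostic, offered as a sketch rather than a confirmed fix: the class named in the error, org.apache.hadoop.mapreduce.lib.output.FileOutputFormat, normally ships in the hadoop-mapreduce-client-core artifact (pulled in transitively by hadoop-client), and none of the dependencies listed above is guaranteed to expose it on the application classpath. The small check below reports whether that class is visible to the classloader IntelliJ runs the job with. The class name HadoopClasspathCheck and the helper isOnClasspath are hypothetical names chosen for illustration.

```java
// Hypothetical helper, not part of Flink or Parquet: checks whether a class
// can be resolved on the current classpath without initializing it.
class HadoopClasspathCheck {

    static boolean isOnClasspath(String className) {
        try {
            // 'false' means: load the class but do not run its static initializers.
            Class.forName(className, false, HadoopClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // Either the class itself or one of its supertypes is missing.
            return false;
        }
    }

    public static void main(String[] args) {
        // The class reported in the NoClassDefFoundError above.
        String missing = "org.apache.hadoop.mapreduce.lib.output.FileOutputFormat";
        System.out.println(missing + " on classpath: " + isOnClasspath(missing));
    }
}
```

If this prints false when run from the same IntelliJ run configuration as the job, the Hadoop MapReduce classes are not on the classpath, and adding a Hadoop client dependency whose version matches the target environment would be the next thing to try. That is an assumption about the cause, not something confirmed in this thread.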

Thanks.

0 answers:

No answers