Question

I am trying to write a CSV file out as Parquet using Flink. I use the following code and get an error.

val parquetFormat = new HadoopOutputFormat[Void, String](new AvroParquetOutputFormat, job)
FileOutputFormat.setOutputPath(job, new Path(outputPath))

I get the following build error. Can anyone help?

type mismatch; found: parquet.avro.AvroParquetOutputFormat required: org.apache.hadoop.mapreduce.OutputFormat[Void,String] ingestion.scala /flink-scala/src/main/scala/com/sc/edl/flink line 75 Scala Problem

Answer 1

You want to create a HadoopOutputFormat[Void, String], which requires an OutputFormat[Void, String].

You provide an AvroParquetOutputFormat, which extends ParquetOutputFormat<IndexedRecord>. ParquetOutputFormat is defined as ParquetOutputFormat<T> extends FileOutputFormat<Void, T>.

So you provide an OutputFormat[Void, IndexedRecord] where HadoopOutputFormat[Void, String] expects an OutputFormat[Void, String].

You should change parquetFormat to

val parquetFormat = new HadoopOutputFormat[Void, IndexedRecord](new AvroParquetOutputFormat, job)
FileOutputFormat.setOutputPath(job, new Path(outputPath))

If the DataSet you want to write is not of type (Void, IndexedRecord), you should add a MapFunction that converts your data into (Void, IndexedRecord) pairs.
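The mismatch can be reproduced without Flink, Hadoop, or Parquet on the classpath. The sketch below uses hypothetical stand-in classes that mirror only the generic signatures discussed in this answer (they are not the real library classes) to show why the value type has to be IndexedRecord rather than String.

```scala
// Hypothetical stand-ins: they copy only the type parameters described above.
// None of these are the real Hadoop, Parquet, Avro, or Flink classes.
object TypeMismatchSketch {
  trait IndexedRecord
  abstract class OutputFormat[K, V]
  abstract class FileOutputFormat[K, V] extends OutputFormat[K, V]
  class ParquetOutputFormat[T] extends FileOutputFormat[Void, T]
  class AvroParquetOutputFormat extends ParquetOutputFormat[IndexedRecord]

  // Like the wrapper in the question: K and V must match the wrapped format exactly.
  class HadoopOutputFormat[K, V](val wrapped: OutputFormat[K, V])

  // Compiles: the value type agrees with what AvroParquetOutputFormat declares.
  val ok = new HadoopOutputFormat[Void, IndexedRecord](new AvroParquetOutputFormat)

  // Does not compile (type mismatch), because an OutputFormat[Void, IndexedRecord]
  // is not an OutputFormat[Void, String]:
  // val bad = new HadoopOutputFormat[Void, String](new AvroParquetOutputFormat)

  def main(args: Array[String]): Unit = {
    assert(ok.wrapped.isInstanceOf[AvroParquetOutputFormat])
    println("type arguments line up")
  }
}
```

Uncommenting the `bad` line should produce a type mismatch of the same shape as the build error in the question.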

Answer 2

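This problem still exists, because Flink Tuples currently do not support null keys. The following error occurs: Caused by: org.apache.flink.types.NullFieldException: Field 1 is null, but expected to hold a value.

A better option is to use KiteSDK, as described in this example: https://github.com/nezihyigitbasi/FlinkParquet However, if you need a dynamic schema, this approach will not work, because you have to adhere strictly to the schema. It is also better suited to reading than to writing.

Spark DataFrames work very well with Parquet, not only in terms of API but also in terms of performance. If you want to use Flink, though, you either need to wait for the Flink community to release an API or write your own Parquet code, which can be a big effort.

Only these connectors are implemented: https://github.com/apache/flink/tree/master/flink-connectors So my personal advice is: if you can use Spark, go for it. It has a more mature API for production use cases. If you get stuck on a basic need like this with Flink, you may well get stuck elsewhere too.

As of now, don't waste time on Flink for this. I lost a lot of critical time that I could have saved by using standard options such as Hive, Spark, or MR.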

Answer 3

To expand on the other answer, you can get a value of the required Void type by dropping down to Java:

// in src/main/java/com/yourOrg/FlinkUtils.java
public class FlinkUtils {
    /* Stupid hack because we can't instantiate Void in Scala */
    public static Void getVoid() {
        return null;
    }
}

// src/main/scala/com/yourOrg/FlinkJob.scala

val voidKeyedDataset = ds.map((FlinkUtils.getVoid, _))
voidKeyedDataset.output(...)
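As a possible alternative to the Java helper, the same null-as-Void value can be written in plain Scala with a type annotation. This is only a sketch: `VoidKeySketch` and `withVoidKey` are hypothetical names, it runs on ordinary Scala collections, and it has not been tested against Flink's type extraction or serializers, so the null-field limitation described in Answer 2 may still apply.

```scala
object VoidKeySketch {
  // java.lang.Void has no instances, so null is the only value of this type.
  val voidKey: Void = null

  // Pair a value with a Void-typed key. The explicit result type keeps Scala
  // from inferring Null for the first tuple element.
  def withVoidKey[T](value: T): (Void, T) = (voidKey, value)

  def main(args: Array[String]): Unit = {
    // Plain collections here; in a Flink job the same function would be
    // passed to ds.map (assumption, not verified).
    val pairs: Seq[(Void, String)] = Seq("a", "b").map(v => withVoidKey(v))
    pairs.foreach(println)
  }
}
```

If this turns out to work in your Flink version, `ds.map(x => VoidKeySketch.withVoidKey(x))` would stand in for the `FlinkUtils.getVoid` call above.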

Flink transformation to Parquet error
