I have added the Flink Hadoop compatibility dependency to a project that reads sequence files from an HDFS path:
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-hadoop-compatibility_2.11</artifactId>
    <version>1.5.6</version>
</dependency>
Here is the Java snippet:
DataSource<Tuple2<NullWritable, BytesWritable>> input = env.createInput(HadoopInputs.readHadoopFile(
        new org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat<NullWritable, BytesWritable>(),
        NullWritable.class, BytesWritable.class, path));
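For reference, a minimal self-contained version of the above; the class name, the main-method scaffolding, the imports and the hard-coded HDFS path are illustrative placeholders I added here, not my actual code:

import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.operators.DataSource;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.hadoopcompatibility.HadoopInputs;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat;

public class ReadSequenceFileJob {
    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
        // Placeholder path for illustration only.
        String path = "hdfs:///path/to/sequence/files";

        // Read the sequence file as (key, value) tuples via the Hadoop compatibility layer.
        DataSource<Tuple2<NullWritable, BytesWritable>> input = env.createInput(HadoopInputs.readHadoopFile(
                new SequenceFileInputFormat<NullWritable, BytesWritable>(),
                NullWritable.class, BytesWritable.class, path));

        // A sink just so the job actually executes.
        input.first(10).print();
    }
}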
When I run it from Eclipse it works fine, but when I submit it on the command line with "flink run ...", it complains:
The type returned by the input format could not be automatically determined. Please specify the TypeInformation of the produced type explicitly by using the 'createInput(InputFormat, TypeInformation)' method instead.
OK, so I updated the code to add the type information:
DataSource<Tuple2<NullWritable, BytesWritable>> input = env.createInput(HadoopInputs.readHadoopFile(
        new org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat<NullWritable, BytesWritable>(),
        NullWritable.class, BytesWritable.class, path),
        TypeInformation.of(new TypeHint<Tuple2<NullWritable, BytesWritable>>() {}));
Now it complains with:
Caused by: java.lang.RuntimeException: Could not load the TypeInformation for the class 'org.apache.hadoop.io.Writable'. You may be missing the 'flink-hadoop-compatibility' dependency.
Someone suggested copying flink-hadoop-compatibility_2.11-1.5.6.jar into FLINK_HOME/lib, but that did not help; I still get the same error.
Does anyone have a clue?
My Flink is a standalone installation, version 1.5.6.
UPDATE:
Sorry, I had copied flink-hadoop-compatibility_2.11-1.5.6.jar to the wrong location; after fixing that, it works.
Now my question is: is there another way to do this? Copying the jar file into FLINK_HOME/lib is definitely not a good option for me, especially when we are talking about a large Flink cluster.
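For example, one direction I have wondered about, as an untested sketch only: building the Tuple2 TypeInformation explicitly from WritableTypeInfo (a class shipped with flink-hadoop-compatibility) instead of going through TypeInformation.of() and the TypeExtractor. I do not know whether this would avoid the classloading problem; env and path are the same as in the snippets above:

import org.apache.flink.api.common.typeinfo.TypeInformation;
import org.apache.flink.api.java.operators.DataSource;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.api.java.typeutils.TupleTypeInfo;
import org.apache.flink.api.java.typeutils.WritableTypeInfo;
import org.apache.flink.hadoopcompatibility.HadoopInputs;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat;

// Spell out the tuple type info directly from WritableTypeInfo instead of a TypeHint.
TypeInformation<Tuple2<NullWritable, BytesWritable>> typeInfo =
        new TupleTypeInfo<Tuple2<NullWritable, BytesWritable>>(
                new WritableTypeInfo<>(NullWritable.class),
                new WritableTypeInfo<>(BytesWritable.class));

DataSource<Tuple2<NullWritable, BytesWritable>> input = env.createInput(
        HadoopInputs.readHadoopFile(
                new SequenceFileInputFormat<NullWritable, BytesWritable>(),
                NullWritable.class, BytesWritable.class, path),
        typeInfo);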