Flink on YARN - Could not initialize class org.apache.hadoop.fs.s3a.S3AFileSystem

Date: 2018-01-29 16:36:53

Tags: yarn apache-flink

I am trying to run a Flink job on YARN:

$ export HADOOP_CONF_DIR=/etc/hadoop/conf
$ ./${FLINK_HOME}/bin/yarn-session.sh ...
$ ./${FLINK_HOME}/bin/flink run \
    /home/clsadmin/messagehub-to-s3-1.0-SNAPSHOT.jar \
    --kafka-brokers ${KAFKA_BROKERS} \
    ...
    --output-folder s3://${S3_BUCKET}/${S3_FOLDER}

The startup output shows that the result of 'hadoop classpath' has been added to Flink's classpath:

Using the result of 'hadoop classpath' to augment the Hadoop classpath: /usr/hdp/2.6.2.0-205/hadoop/conf:/usr/hdp/2.6.2.0-205/hadoop/lib/*:/usr/hdp/2.6.2.0-205/hadoop/.//*:/usr/hdp/2.6.2.0-205/hadoop-hdfs/./:/usr/hdp/2.6.2.0-205/hadoop-hdfs/lib/*:/usr/hdp/2.6.2.0-205/hadoop-hdfs/.//*:/usr/hdp/2.6.2.0-205/hadoop-yarn/lib/*:/usr/hdp/2.6.2.0-205/hadoop-yarn/.//*:/usr/hdp/2.6.2.0-205/hadoop-mapreduce/lib/*:/usr/hdp/2.6.2.0-205/hadoop-mapreduce/.//*::mysql-connector-java.jar:/home/common/lib/dataconnectorStocator/*:/usr/hdp/2.6.2.0-205/tez/*:/usr/hdp/2.6.2.0-205/tez/lib/*:/usr/hdp/2.6.2.0-205/tez/conf
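
Note that, in addition to the 'hadoop classpath' entries above, Flink on YARN also ships everything in its lib/ directory to the containers, so those jars land on the TaskManager classpath as well (this becomes relevant in the update below):

$ ls ${FLINK_HOME}/lib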

However, the job fails to start:

01/29/2018 16:11:04 Job execution switched to status RUNNING.
01/29/2018 16:11:04 Source: Custom Source -> Map -> Sink: Unnamed(1/1) switched to SCHEDULED
01/29/2018 16:11:04 Source: Custom Source -> Map -> Sink: Unnamed(1/1) switched to DEPLOYING
01/29/2018 16:11:05 Source: Custom Source -> Map -> Sink: Unnamed(1/1) switched to RUNNING
01/29/2018 16:11:05 Source: Custom Source -> Map -> Sink: Unnamed(1/1) switched to FAILED
java.lang.NoClassDefFoundError: Could not initialize class org.apache.hadoop.fs.s3a.S3AFileSystem
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:348)
    at org.apache.hadoop.conf.Configuration.getClassByNameOrNull(Configuration.java:2134)
    at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:2099)
    at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2193)
    at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2654)
    at org.apache.flink.streaming.connectors.fs.bucketing.BucketingSink.createHadoopFileSystem(BucketingSink.java:1196)
    at org.apache.flink.streaming.connectors.fs.bucketing.BucketingSink.initFileSystem(BucketingSink.java:411)
    at org.apache.flink.streaming.connectors.fs.bucketing.BucketingSink.initializeState(BucketingSink.java:355)
    at org.apache.flink.streaming.util.functions.StreamingFunctionUtils.tryRestoreFunction(StreamingFunctionUtils.java:178)
    at org.apache.flink.streaming.util.functions.StreamingFunctionUtils.restoreFunctionState(StreamingFunctionUtils.java:160)
    at org.apache.flink.streaming.api.operators.AbstractUdfStreamOperator.initializeState(AbstractUdfStreamOperator.java:96)
    at org.apache.flink.streaming.api.operators.AbstractStreamOperator.initializeState(AbstractStreamOperator.java:259)
    at org.apache.flink.streaming.runtime.tasks.StreamTask.initializeOperators(StreamTask.java:694)
    at org.apache.flink.streaming.runtime.tasks.StreamTask.initializeState(StreamTask.java:682)
    at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:253)
    at org.apache.flink.runtime.taskmanager.Task.run(Task.java:718)
    at java.lang.Thread.run(Thread.java:745)
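
Note that "Could not initialize class" means the static initializer of S3AFileSystem had already failed once before this stack trace was produced, so the real root cause (an ExceptionInInitializerError) should appear earlier in the TaskManager logs. One way to dig it out, assuming YARN log aggregation is enabled and <application-id> is the id printed by yarn-session.sh:

$ yarn logs -applicationId <application-id> \
    | grep -B 2 -A 20 ExceptionInInitializerError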

However, if I search the classpath that Flink reported:

$ grep org.apache.hadoop.fs.s3a.S3AFileSystem /usr/hdp/2.6.2.0-205/hadoop/.//*

I can see that hadoop-aws.jar is on the classpath:

...
Binary file /usr/hdp/2.6.2.0-205/hadoop/.//hadoop-aws-2.7.3.2.6.2.0-205.jar matches
Binary file /usr/hdp/2.6.2.0-205/hadoop/.//hadoop-aws.jar matches
...

Investigating further, I can see that the class is present in the jar:

jar -tf /usr/hdp/2.6.2.0-205/hadoop/hadoop-aws.jar | grep org.apache.hadoop.fs.s3a.S3AFileSystem

returns

org/apache/hadoop/fs/s3a/S3AFileSystem$1.class
org/apache/hadoop/fs/s3a/S3AFileSystem$2.class
org/apache/hadoop/fs/s3a/S3AFileSystem$3.class
org/apache/hadoop/fs/s3a/S3AFileSystem$WriteOperationHelper.class
org/apache/hadoop/fs/s3a/S3AFileSystem$4.class
org/apache/hadoop/fs/s3a/S3AFileSystem.class

If I submit the example application instead:

./flink-1.4.0/bin/flink run \
   ./flink-1.4.0/examples/batch/WordCount.jar \
   --input "s3://${S3_BUCKET}/LICENSE-2.0.txt"  \
   --output "s3://${S3_BUCKET}/license-word-count.txt"

it runs without any problems.

My jar file does not contain any Hadoop classes:

$ jar -tf /home/clsadmin/messagehub-to-s3-1.0-SNAPSHOT.jar | grep hadoop
<<not found>>

Does anyone know why the Flink example runs fine, but Flink has trouble loading S3AFileSystem when running my own code?

Update:

I fixed the problem with:

rm -f flink-1.4.0/lib/flink-shaded-hadoop2-uber-1.4.0.jar
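
My working theory (an assumption on my part, not something I have verified): flink-shaded-hadoop2-uber-1.4.0.jar bundles its own copy of the Hadoop classes, so the TaskManagers were seeing two Hadoop versions at once, and the static initialization of S3AFileSystem failed against the mismatched dependencies. The overlap is easy to confirm:

$ jar -tf flink-1.4.0/lib/flink-shaded-hadoop2-uber-1.4.0.jar \
    | grep 'org/apache/hadoop/fs/' \
    | head

With the uber jar removed, the only Hadoop classes left are the HDP ones picked up via 'hadoop classpath'.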

Is this a safe solution?

0 Answers

No answers yet.