Caused by: java.lang.ClassCastException: org.apache.hadoop.fs.s3a.S3AFileSystem cannot be cast to org.apache.hadoop.fs.FileSystem

Asked: 2018-01-24 10:27:19

Tags: apache-flink

I can run my Flink job from IntelliJ without any problems, but when I try to run it on a standalone Flink installation...

wget ... flink-1.4.0-bin-hadoop27-scala_2.11.tgz
tar xf flink-1.4.0-bin-hadoop27-scala_2.11.tgz
./flink-1.4.0/bin/start-local.sh
./flink-1.4.0/bin/flink run ../mypath/target/messagehub-to-s3-1.0-SNAPSHOT.jar ...

I get this error:

Caused by: java.lang.ClassCastException: org.apache.hadoop.fs.s3a.S3AFileSystem cannot be cast to org.apache.hadoop.fs.FileSystem
    at org.apache.flink.runtime.fs.hdfs.HadoopFsFactory.create(HadoopFsFactory.java:112)

I use the maven shade plugin (based on the dataArtisans example) to build my jar file:

        <plugin>
            <groupId>org.apache.maven.plugins</groupId>
            <artifactId>maven-shade-plugin</artifactId>
            <version>2.4.1</version>
            <executions>
                <!-- Run shade goal on package phase -->
                <execution>
                    <phase>package</phase>
                    <goals>
                        <goal>shade</goal>
                    </goals>
                    <configuration>
                        <artifactSet>
                            <excludes>
                                <!-- This list contains all dependencies of flink-dist
                                Everything else will be packaged into the fat-jar
                                -->
                                <exclude>org.apache.flink:flink-annotations</exclude>
                                <exclude>org.apache.flink:flink-shaded-curator-recipes</exclude>
                                <exclude>org.apache.flink:flink-core</exclude>
                                <exclude>org.apache.flink:flink-java</exclude>
                                <exclude>org.apache.flink:flink-scala_2.11</exclude>
                                <exclude>org.apache.flink:flink-runtime_2.11</exclude>
                                <exclude>org.apache.flink:flink-optimizer_2.11</exclude>
                                <exclude>org.apache.flink:flink-clients_2.11</exclude>
                                <exclude>org.apache.flink:flink-avro_2.11</exclude>
                                <exclude>org.apache.flink:flink-examples-batch_2.11</exclude>
                                <exclude>org.apache.flink:flink-examples-streaming_2.11</exclude>
                                <exclude>org.apache.flink:flink-streaming-java_2.11</exclude>
                                <exclude>org.apache.flink:flink-streaming-scala_2.11</exclude>
                                <exclude>org.apache.flink:flink-scala-shell_2.11</exclude>
                                <exclude>org.apache.flink:flink-python</exclude>
                                <exclude>org.apache.flink:flink-metrics-core</exclude>
                                <exclude>org.apache.flink:flink-metrics-jmx</exclude>
                                <exclude>org.apache.flink:flink-statebackend-rocksdb_2.11</exclude>


                                <!-- Also exclude very big transitive dependencies of Flink
                                WARNING: You have to remove these excludes if your code relies on other
                                versions of these dependencies.
                                -->

                                <exclude>log4j:log4j</exclude>
                                <exclude>org.scala-lang:scala-library</exclude>
                                <exclude>org.scala-lang:scala-compiler</exclude>
                                <exclude>org.scala-lang:scala-reflect</exclude>
                                <exclude>com.data-artisans:flakka-actor_*</exclude>
                                <exclude>com.data-artisans:flakka-remote_*</exclude>
                                <exclude>com.data-artisans:flakka-slf4j_*</exclude>
                                <exclude>io.netty:netty-all</exclude>
                                <exclude>io.netty:netty</exclude>
                                <exclude>commons-fileupload:commons-fileupload</exclude>
                                <exclude>org.apache.avro:avro</exclude>
                                <exclude>commons-collections:commons-collections</exclude>
                                <exclude>org.codehaus.jackson:jackson-core-asl</exclude>
                                <exclude>org.codehaus.jackson:jackson-mapper-asl</exclude>
                                <exclude>com.thoughtworks.paranamer:paranamer</exclude>
                                <exclude>org.xerial.snappy:snappy-java</exclude>
                                <exclude>org.apache.commons:commons-compress</exclude>
                                <exclude>org.tukaani:xz</exclude>
                                <exclude>com.esotericsoftware.kryo:kryo</exclude>
                                <exclude>com.esotericsoftware.minlog:minlog</exclude>
                                <exclude>org.objenesis:objenesis</exclude>
                                <exclude>com.twitter:chill_*</exclude>
                                <exclude>com.twitter:chill-java</exclude>
                                <exclude>commons-lang:commons-lang</exclude>
                                <exclude>junit:junit</exclude>
                                <exclude>org.apache.commons:commons-lang3</exclude>
                                <exclude>org.slf4j:slf4j-api</exclude>
                                <exclude>org.slf4j:slf4j-log4j12</exclude>
                                <exclude>log4j:log4j</exclude>
                                <exclude>org.apache.commons:commons-math</exclude>
                                <exclude>org.apache.sling:org.apache.sling.commons.json</exclude>
                                <exclude>commons-logging:commons-logging</exclude>
                                <exclude>commons-codec:commons-codec</exclude>
                                <exclude>com.fasterxml.jackson.core:jackson-core</exclude>
                                <exclude>com.fasterxml.jackson.core:jackson-databind</exclude>
                                <exclude>com.fasterxml.jackson.core:jackson-annotations</exclude>
                                <exclude>stax:stax-api</exclude>
                                <exclude>com.typesafe:config</exclude>
                                <exclude>org.uncommons.maths:uncommons-maths</exclude>
                                <exclude>com.github.scopt:scopt_*</exclude>
                                <exclude>commons-io:commons-io</exclude>
                                <exclude>commons-cli:commons-cli</exclude>
                            </excludes>
                        </artifactSet>
                        <filters>
                            <filter>
                                <artifact>org.apache.flink:*</artifact>
                                <excludes>
                                    <!-- exclude shaded google but include shaded curator -->
                                    <exclude>org/apache/flink/shaded/com/**</exclude>
                                    <exclude>web-docs/**</exclude>
                                </excludes>
                            </filter>
                            <filter>
                                <!-- Do not copy the signatures in the META-INF folder.
                                Otherwise, this might cause SecurityExceptions when using the JAR. -->
                                <artifact>*:*</artifact>
                                <excludes>
                                    <exclude>META-INF/*.SF</exclude>
                                    <exclude>META-INF/*.DSA</exclude>
                                    <exclude>META-INF/*.RSA</exclude>
                                </excludes>
                            </filter>
                        </filters>
                        <transformers>
                            <transformer implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
                                <mainClass>com.ibm.cloud.flink.StreamingJob</mainClass>
                            </transformer>
                        </transformers>
                        <createDependencyReducedPom>false</createDependencyReducedPom>
                    </configuration>
                </execution>
            </executions>
        </plugin>

I am using the Flink S3 Hadoop file system library:

    <dependency>
        <groupId>org.apache.flink</groupId>
        <artifactId>flink-s3-fs-hadoop</artifactId>
        <version>${flink.version}</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-aws</artifactId>
        <version>2.7.2</version>
    </dependency>
    <dependency>
        <groupId>com.amazonaws</groupId>
        <artifactId>aws-java-sdk</artifactId>
        <version>1.7.4</version>
    </dependency>
    <dependency>
        <groupId>org.apache.httpcomponents</groupId>
        <artifactId>httpcore</artifactId>
        <version>4.2.5</version>
    </dependency>
    <dependency>
        <groupId>org.apache.httpcomponents</groupId>
        <artifactId>httpclient</artifactId>
        <version>4.2.5</version>
    </dependency>

My project is on GitHub: https://github.com/ibm-cloud-streaming-retail-demo/flink-on-iae-messagehub-to-s3

I think a different set of jars is being loaded when running standalone. Has anyone seen this problem?

1 answer:

Answer 0: (score: 0)

First, you can generate a Flink quickstart project with the following command:

mvn archetype:generate \
    -DarchetypeGroupId=org.apache.flink \
    -DarchetypeArtifactId=flink-quickstart-java \
    -DarchetypeVersion=1.4.0

This is cleaner and safer than copying the pom.xml file from an existing project.

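As far as I can see on GitHub, your job uses a BucketingSink. As of Flink 1.4.0, this sink depends directly on Hadoop's FileSystem class [1]. However, you include flink-s3-fs-hadoop as a dependency; this is a Flink file system with shaded Hadoop dependencies, and it does not work with the current BucketingSink implementation.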

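The exception you are seeing hints at a class loading problem [2]. My guess is that your user jar contains Hadoop classes that conflict with the ones shipped with Flink. You can try removing all Hadoop dependencies from your pom.xml, or you can try setting classloader.resolve-order: parent-first in flink-conf.yaml [3].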
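
To check whether two copies of the Hadoop classes are really in play, it can help to print which class loader and which jar each class comes from. The following is only a minimal diagnostic sketch, not part of Flink or Hadoop: the class name ClassOriginCheck and the methods describe and describeByName are hypothetical helpers made up for illustration, and it assumes the check is run in the same JVM and with the same context class loader as the failing job (for example, called from the job's main method).

```java
import java.security.CodeSource;

// Hypothetical diagnostic helper; the names here are illustrative assumptions.
public class ClassOriginCheck {

    /** Describes which class loader loaded a class and from which location. */
    static String describe(Class<?> cls) {
        ClassLoader loader = cls.getClassLoader();
        // Classes loaded by the bootstrap loader report a null class loader.
        String loaderName = (loader == null) ? "bootstrap" : loader.toString();
        CodeSource source = cls.getProtectionDomain().getCodeSource();
        String location = (source == null || source.getLocation() == null)
                ? "unknown" : source.getLocation().toString();
        return cls.getName() + " loaded by " + loaderName + " from " + location;
    }

    /** Looks a class up by name through the given loader, or reports that it is missing. */
    static String describeByName(String className, ClassLoader loader) {
        try {
            // 'false' avoids running static initializers just for the lookup.
            return describe(Class.forName(className, false, loader));
        } catch (ClassNotFoundException | LinkageError e) {
            return className + " not found (" + e + ")";
        }
    }

    public static void main(String[] args) {
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        // The two classes named in the ClassCastException above.
        System.out.println(describeByName("org.apache.hadoop.fs.FileSystem", loader));
        System.out.println(describeByName("org.apache.hadoop.fs.s3a.S3AFileSystem", loader));
    }
}
```

If the two lines report different class loaders or jar locations (for instance, one pointing at the user jar and the other at a jar in Flink's lib directory), that would be consistent with the conflict described above. If they match, the cause is probably elsewhere.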

[1] https://github.com/apache/flink/blob/release-1.4/flink-connectors/flink-connector-filesystem/src/main/java/org/apache/flink/streaming/connectors/fs/bucketing/BucketingSink.java#L1118

[2] https://ci.apache.org/projects/flink/flink-docs-release-1.4/monitoring/debugging_classloading.html#x-cannot-be-cast-to-x-exceptions

[3] https://ci.apache.org/projects/flink/flink-docs-release-1.4/monitoring/debugging_classloading.html#configuring-classloader-resolution-order