我可以毫无问题地从IntelliJ运行我的Flink作业,但是当我尝试在flink独立运行它时...
wget ... flink-1.4.0-bin-hadoop27-scala_2.11.tgz
tar xf flink-1.4.0-bin-hadoop27-scala_2.11.tgz
./flink-1.4.0/bin/start-local.sh
./flink-1.4.0/bin/flink run ../mypath/target/messagehub-to-s3-1.0-SNAPSHOT.jar ...
我收到错误消息:
Caused by: java.lang.ClassCastException: org.apache.hadoop.fs.s3a.S3AFileSystem cannot be cast to org.apache.hadoop.fs.FileSystem
at org.apache.flink.runtime.fs.hdfs.HadoopFsFactory.create(HadoopFsFactory.java:112)
我使用maven shade插件(基于dataArtisans example)来创建我的jar文件:
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-shade-plugin</artifactId>
<version>2.4.1</version>
<executions>
<!-- Run shade goal on package phase -->
<execution>
<phase>package</phase>
<goals>
<goal>shade</goal>
</goals>
<configuration>
<artifactSet>
<excludes>
<!-- This list contains all dependencies of flink-dist
Everything else will be packaged into the fat-jar
-->
<exclude>org.apache.flink:flink-annotations</exclude>
<exclude>org.apache.flink:flink-shaded-curator-recipes</exclude>
<exclude>org.apache.flink:flink-core</exclude>
<exclude>org.apache.flink:flink-java</exclude>
<exclude>org.apache.flink:flink-scala_2.11</exclude>
<exclude>org.apache.flink:flink-runtime_2.11</exclude>
<exclude>org.apache.flink:flink-optimizer_2.11</exclude>
<exclude>org.apache.flink:flink-clients_2.11</exclude>
<exclude>org.apache.flink:flink-avro_2.11</exclude>
<exclude>org.apache.flink:flink-examples-batch_2.11</exclude>
<exclude>org.apache.flink:flink-examples-streaming_2.11</exclude>
<exclude>org.apache.flink:flink-streaming-java_2.11</exclude>
<exclude>org.apache.flink:flink-streaming-scala_2.11</exclude>
<exclude>org.apache.flink:flink-scala-shell_2.11</exclude>
<exclude>org.apache.flink:flink-python</exclude>
<exclude>org.apache.flink:flink-metrics-core</exclude>
<exclude>org.apache.flink:flink-metrics-jmx</exclude>
<exclude>org.apache.flink:flink-statebackend-rocksdb_2.11</exclude>
<!-- Also exclude very big transitive dependencies of Flink
WARNING: You have to remove these excludes if your code relies on other
versions of these dependencies.
-->
<exclude>log4j:log4j</exclude>
<exclude>org.scala-lang:scala-library</exclude>
<exclude>org.scala-lang:scala-compiler</exclude>
<exclude>org.scala-lang:scala-reflect</exclude>
<exclude>com.data-artisans:flakka-actor_*</exclude>
<exclude>com.data-artisans:flakka-remote_*</exclude>
<exclude>com.data-artisans:flakka-slf4j_*</exclude>
<exclude>io.netty:netty-all</exclude>
<exclude>io.netty:netty</exclude>
<exclude>commons-fileupload:commons-fileupload</exclude>
<exclude>org.apache.avro:avro</exclude>
<exclude>commons-collections:commons-collections</exclude>
<exclude>org.codehaus.jackson:jackson-core-asl</exclude>
<exclude>org.codehaus.jackson:jackson-mapper-asl</exclude>
<exclude>com.thoughtworks.paranamer:paranamer</exclude>
<exclude>org.xerial.snappy:snappy-java</exclude>
<exclude>org.apache.commons:commons-compress</exclude>
<exclude>org.tukaani:xz</exclude>
<exclude>com.esotericsoftware.kryo:kryo</exclude>
<exclude>com.esotericsoftware.minlog:minlog</exclude>
<exclude>org.objenesis:objenesis</exclude>
<exclude>com.twitter:chill_*</exclude>
<exclude>com.twitter:chill-java</exclude>
<exclude>commons-lang:commons-lang</exclude>
<exclude>junit:junit</exclude>
<exclude>org.apache.commons:commons-lang3</exclude>
<exclude>org.slf4j:slf4j-api</exclude>
<exclude>org.slf4j:slf4j-log4j12</exclude>
<exclude>log4j:log4j</exclude>
<exclude>org.apache.commons:commons-math</exclude>
<exclude>org.apache.sling:org.apache.sling.commons.json</exclude>
<exclude>commons-logging:commons-logging</exclude>
<exclude>commons-codec:commons-codec</exclude>
<exclude>com.fasterxml.jackson.core:jackson-core</exclude>
<exclude>com.fasterxml.jackson.core:jackson-databind</exclude>
<exclude>com.fasterxml.jackson.core:jackson-annotations</exclude>
<exclude>stax:stax-api</exclude>
<exclude>com.typesafe:config</exclude>
<exclude>org.uncommons.maths:uncommons-maths</exclude>
<exclude>com.github.scopt:scopt_*</exclude>
<exclude>commons-io:commons-io</exclude>
<exclude>commons-cli:commons-cli</exclude>
</excludes>
</artifactSet>
<filters>
<filter>
<artifact>org.apache.flink:*</artifact>
<excludes>
<!-- exclude shaded google but include shaded curator -->
<exclude>org/apache/flink/shaded/com/**</exclude>
<exclude>web-docs/**</exclude>
</excludes>
</filter>
<filter>
<!-- Do not copy the signatures in the META-INF folder.
Otherwise, this might cause SecurityExceptions when using the JAR. -->
<artifact>*:*</artifact>
<excludes>
<exclude>META-INF/*.SF</exclude>
<exclude>META-INF/*.DSA</exclude>
<exclude>META-INF/*.RSA</exclude>
</excludes>
</filter>
</filters>
<transformers>
<transformer implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
<mainClass>com.ibm.cloud.flink.StreamingJob</mainClass>
</transformer>
</transformers>
<createDependencyReducedPom>false</createDependencyReducedPom>
</configuration>
</execution>
</executions>
</plugin>
我正在使用S3 hadoop Flink fs库:
<dependency>
<groupId>org.apache.flink</groupId>
<artifactId>flink-s3-fs-hadoop</artifactId>
<version>${flink.version}</version>
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-aws</artifactId>
<version>2.7.2</version>
</dependency>
<dependency>
<groupId>com.amazonaws</groupId>
<artifactId>aws-java-sdk</artifactId>
<version>1.7.4</version>
</dependency>
<dependency>
<groupId>org.apache.httpcomponents</groupId>
<artifactId>httpcore</artifactId>
<version>4.2.5</version>
</dependency>
<dependency>
<groupId>org.apache.httpcomponents</groupId>
<artifactId>httpclient</artifactId>
<version>4.2.5</version>
</dependency>
我的项目在github:https://github.com/ibm-cloud-streaming-retail-demo/flink-on-iae-messagehub-to-s3
我认为一组不同的罐子正在加载独立运行。有没有人见过这个问题?
答案 0 :(得分:0)
首先,您可以使用以下命令生成Flink Quickstart项目:
mvn archetype:generate
-DarchetypeGroupId=org.apache.flink
-DarchetypeArtifactId=flink-quickstart-java
-DarchetypeVersion=1.4.0
这比从现有项目复制pom.xml
文件更清晰,更安全。
就我在Github上看到的而言,你的工作使用了BucketingSink。截至Flink
1.4.0,这个接收器直接依赖于Hadoop的FileSystem类[1]。
但是,您将flink-s3-fs-hadoop
作为依赖项包含在内;这是一个Flink
具有阴影Hadoop依赖关系的文件系统,它不能使用当前的
BucketingSink实施。
您看到的异常暗示了类加载问题[2]。我猜测
您的用户jar中有Hadoop类,与之相冲突
来自Flink。你可以尝试删除你的所有Hadoop依赖项
pom.xml,或flink-conf.yaml
中您可以尝试设置classloader.resolve-order: parent-first
[3]。