Flink Kafka source timestamp extractor

Time: 2018-03-05 14:01:29

Tags: scala classpath apache-flink flink-streaming

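I am trying to deploy a Flink job to a cluster based on the flink:1.4.1-hadoop27-scala_2.11-alpine image. The job uses a Kafka connector source (flink-connector-kafka-0.11), and I am trying to assign timestamps and watermarks to it. My code is very similar to the Scala example in the Flink Kafka connector documentation, except that I use FlinkKafkaConsumer011: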

val myConsumer = new FlinkKafkaConsumer08[String]("topic", new SimpleStringSchema(), properties)
myConsumer.assignTimestampsAndWatermarks(new CustomWatermarkEmitter())

This works fine when I run it locally from my IDE. In the cluster environment, however, I get the following error:

java.lang.ClassNotFoundException: com.my.organization.CustomWatermarkEmitter
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:335)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:348)
at org.apache.flink.util.InstantiationUtil$ClassLoaderObjectInputStream.resolveClass(InstantiationUtil.java:73)
at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1863)
at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1746)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2037)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1568)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:428)
at org.apache.flink.util.InstantiationUtil.deserializeObject(InstantiationUtil.java:393)
at org.apache.flink.util.InstantiationUtil.deserializeObject(InstantiationUtil.java:380)
at org.apache.flink.util.InstantiationUtil.deserializeObject(InstantiationUtil.java:368)
at org.apache.flink.util.SerializedValue.deserializeValue(SerializedValue.java:58)
at org.apache.flink.streaming.connectors.kafka.internals.AbstractFetcher.createPartitionStateHolders(AbstractFetcher.java:521)
at org.apache.flink.streaming.connectors.kafka.internals.AbstractFetcher.<init>(AbstractFetcher.java:167)
at org.apache.flink.streaming.connectors.kafka.internal.Kafka09Fetcher.<init>(Kafka09Fetcher.java:89)
at org.apache.flink.streaming.connectors.kafka.internal.Kafka010Fetcher.<init>(Kafka010Fetcher.java:62)
at org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer010.createFetcher(FlinkKafkaConsumer010.java:203)
at org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumerBase.run(FlinkKafkaConsumerBase.java:564)
at org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:86)
at org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:55)
at org.apache.flink.streaming.runtime.tasks.SourceStreamTask.run(SourceStreamTask.java:94)
at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:264)
at org.apache.flink.runtime.taskmanager.Task.run(Task.java:718)
at java.lang.Thread.run(Thread.java:748)

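I am building my job as a fat jar, and I have verified that it contains this class. Does this example from the documentation only work if the CustomWatermarkEmitter class is in the /opt/flink/lib/ folder?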
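One way to double-check that a fat jar really contains a class is to look for the matching `.class` entry. The following is a minimal sketch using only the JDK. The class name `FatJarClassCheck` and the helper `writeDemoJar` are hypothetical and exist only so the example can run without a real assembly jar.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;
import java.util.jar.JarOutputStream;

public class FatJarClassCheck {

    /** Returns true if the jar has a .class entry for the fully qualified class name. */
    public static boolean containsClass(Path jar, String className) {
        String entryName = className.replace('.', '/') + ".class";
        try (JarFile jarFile = new JarFile(jar.toFile())) {
            return jarFile.getEntry(entryName) != null;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Hypothetical demo helper: writes a temporary jar with one empty placeholder entry. */
    public static Path writeDemoJar(String className) {
        try {
            Path jar = Files.createTempFile("demo-assembly", ".jar");
            jar.toFile().deleteOnExit();
            try (OutputStream out = Files.newOutputStream(jar);
                 JarOutputStream jarOut = new JarOutputStream(out)) {
                jarOut.putNextEntry(new JarEntry(className.replace('.', '/') + ".class"));
                jarOut.closeEntry();
            }
            return jar;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // In practice the path would point at the real assembly jar instead of a demo jar.
        Path jar = writeDemoJar("com.my.organization.CustomWatermarkEmitter");
        System.out.println(containsClass(jar, "com.my.organization.CustomWatermarkEmitter"));
        System.out.println(containsClass(jar, "com.my.organization.Missing"));
    }
}
```

A positive result only shows that the class is packaged. It says nothing about which class loader the cluster uses when it deserializes the assigner, which is where the exception above is thrown.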

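Putting the class in /opt/flink/lib is how I ended up working around the problem. However, having to build this class separately and place it there complicates my build process considerably, so I would like to know whether this is how it is supposed to be solved, or whether there is another way around it.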

For example, this section of the Flink documentation hints that some sources need to be given a UserCodeClassLoader manually. Does that include the provided Kafka source?

As far as I can see in org.apache.flink.streaming.connectors.kafka.internals.AbstractFetcher, it does seem to use a "userCodeClassLoader" internally:

        case PERIODIC_WATERMARKS: {
            for (Map.Entry<KafkaTopicPartition, Long> partitionEntry : partitionsToInitialOffsets.entrySet()) {
                KPH kafkaHandle = createKafkaPartitionHandle(partitionEntry.getKey());

                AssignerWithPeriodicWatermarks<T> assignerInstance =
                        watermarksPeriodic.deserializeValue(userCodeClassLoader);

                KafkaTopicPartitionStateWithPeriodicWatermarks<T, KPH> partitionState =
                        new KafkaTopicPartitionStateWithPeriodicWatermarks<>(
                                partitionEntry.getKey(),
                                kafkaHandle,
                                assignerInstance);

                partitionState.setOffset(partitionEntry.getValue());

                partitionStates.add(partitionState);
            }
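Both the stack trace and the excerpt above go through Java deserialization with an explicitly chosen class loader: the trace shows a `resolveClass` override that ends in `Class.forName`. The following is a minimal, Flink-free sketch of that mechanism, not Flink's actual implementation. The names `ClassLoaderDeserializationSketch`, `DemoAssigner`, `LoaderAwareInputStream` and `isLoadable` are hypothetical. It illustrates why the same serialized object can be restored with one class loader and fail with a ClassNotFoundException under another that cannot see the job's classes.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.ObjectStreamClass;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.net.URL;
import java.net.URLClassLoader;

public class ClassLoaderDeserializationSketch {

    /** Hypothetical stand-in for a user-defined assigner such as CustomWatermarkEmitter. */
    public static class DemoAssigner implements Serializable {
        private static final long serialVersionUID = 1L;
        public final long maxDelayMillis;

        public DemoAssigner(long maxDelayMillis) {
            this.maxDelayMillis = maxDelayMillis;
        }
    }

    /** ObjectInputStream that resolves classes through an explicitly supplied class loader. */
    static final class LoaderAwareInputStream extends ObjectInputStream {
        private final ClassLoader loader;

        LoaderAwareInputStream(InputStream in, ClassLoader loader) throws IOException {
            super(in);
            this.loader = loader;
        }

        @Override
        protected Class<?> resolveClass(ObjectStreamClass desc) throws ClassNotFoundException {
            // The lookup succeeds only if this particular loader can see the class.
            return Class.forName(desc.getName(), false, loader);
        }
    }

    public static byte[] serialize(Serializable value) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    public static Object deserialize(byte[] data, ClassLoader loader)
            throws IOException, ClassNotFoundException {
        try (ObjectInputStream in =
                     new LoaderAwareInputStream(new ByteArrayInputStream(data), loader)) {
            return in.readObject();
        }
    }

    /** Returns false when the loader cannot find a class needed to restore the object. */
    public static boolean isLoadable(byte[] data, ClassLoader loader) {
        try {
            deserialize(data, loader);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] data = serialize(new DemoAssigner(5000L));

        // A loader that can see the job's classes (plays the role of the user-code loader).
        ClassLoader userCodeLoader = DemoAssigner.class.getClassLoader();
        System.out.println("user-code loader: " + isLoadable(data, userCodeLoader));

        // A loader with no parent and no URLs sees only JDK bootstrap classes.
        try (URLClassLoader jdkOnly = new URLClassLoader(new URL[0], null)) {
            System.out.println("JDK-only loader: " + isLoadable(data, jdkOnly));
        }
    }
}
```

In this sketch the first check succeeds and the second fails, even though the serialized bytes are identical. Whether the class is packaged in the jar is therefore not the deciding factor; what matters is which loader the deserializing code is handed.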

Edit

I have created a simple project that reproduces this issue: https://github.com/lragnarsson/flink-kafka-classpath-problem

To reproduce it, you need docker and docker-compose.

Just do the following:

  1. git clone https://github.com/lragnarsson/flink-kafka-classpath-problem.git
  2. cd flink-kafka-classpath-problem/docker
  3. docker-compose build
  4. docker-compose up
  5. Open localhost:8081 in a browser
  6. Submit the jar file found at target/scala-2.11/flink-kafka-classpath-problem-assembly-0.1-SNAPSHOT.jar
  7. This should result in the exception java.lang.ClassNotFoundException: se.ragnarsson.lage.MyTimestampExtractor

1 Answer:

Answer 0 (score: 1)

I think you have stumbled upon a bug introduced in Flink 1.4.1: https://issues.apache.org/jira/browse/FLINK-8741

It will be fixed in 1.4.2. In the meantime, you can try testing against 1.4.2-rc2: https://github.com/apache/flink/tree/release-1.4.2-rc2