Kafka -> Storm -> Flink: unexpected block data

Time: 2015-10-25 12:51:26

Tags: java apache-storm apache-flink

I am moving a topology from Storm to Flink. The topology has been reduced to KafkaSpout -> Bolt. The bolt only counts packets; it does not try to decode them.

The compiled .jar is submitted to Flink via flink -c <entry point> <path to .jar> and produces the following error:

java.lang.Exception: Call to registerInputOutput() of invokable failed
        at org.apache.flink.runtime.taskmanager.Task.run(Task.java:529)
        at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.flink.streaming.runtime.tasks.StreamTaskException: Cannot instantiate user function.
        at org.apache.flink.streaming.api.graph.StreamConfig.getStreamOperator(StreamConfig.java:190)
        at org.apache.flink.streaming.runtime.tasks.StreamTask.registerInputOutput(StreamTask.java:174)
        at org.apache.flink.runtime.taskmanager.Task.run(Task.java:526)
        ... 1 more
Caused by: java.io.StreamCorruptedException: unexpected block data
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1365)
        at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1993)
        at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1918)
        at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
        at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1993)
        at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1918)
        at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
        at java.io.ObjectInputStream.readObject(ObjectInputStream.java:371)
        at org.apache.flink.util.InstantiationUtil.deserializeObject(InstantiationUtil.java:294)
        at org.apache.flink.util.InstantiationUtil.readObjectFromConfig(InstantiationUtil.java:255)
        at org.apache.flink.streaming.api.graph.StreamConfig.getStreamOperator(StreamConfig.java:175)
        ... 3 more

My questions:

  1. Did I miss a configuration step for the KafkaSpout? It works when used in vanilla Storm.
  2. Do I need to use a specific version of the Storm libraries? My build includes 0.9.4.
  3. Anything else I might have missed?
  4. Should I use the Storm KafkaSpout, or would I be better off writing my own using the Flink KafkaSource?

EDIT:

Here is the relevant code:

Topology:

    // Storm KafkaSpout setup: ZooKeeper hosts, Kafka topic, ZK root path, consumer id
    BrokerHosts brokerHosts = new ZkHosts(configuration.getString("kafka/zookeeper"));
    SpoutConfig kafkaConfig = new SpoutConfig(brokerHosts, configuration.getString("kafka/topic"), "/storm_env_values", "storm_env_DEBUG");

    // FlinkTopologyBuilder replaces Storm's TopologyBuilder in the compatibility layer
    FlinkTopologyBuilder builder = new FlinkTopologyBuilder();
    builder.setSpout("environment", new KafkaSpout(kafkaConfig), 1);
    builder.setBolt("decode_bytes", new EnvironmentBolt(), 1).shuffleGrouping("environment");

Initialization:

    FlinkLocalCluster cluster = new FlinkLocalCluster(); // replaces: LocalCluster cluster = new LocalCluster();
    cluster.submitTopology("env_topology", conf, buildTopology());
    

The bolt is based on BaseRichBolt. Its execute() fn just logs the presence of any packet for debugging; there is no other code in it.
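For reference, a minimal sketch of what that bolt might look like (everything beyond the EnvironmentBolt name used in the topology above is a placeholder):

    import java.util.Map;

    import backtype.storm.task.OutputCollector;
    import backtype.storm.task.TopologyContext;
    import backtype.storm.topology.OutputFieldsDeclarer;
    import backtype.storm.topology.base.BaseRichBolt;
    import backtype.storm.tuple.Tuple;

    public class EnvironmentBolt extends BaseRichBolt {
        private OutputCollector collector;

        @Override
        public void prepare(Map conf, TopologyContext context, OutputCollector collector) {
            this.collector = collector;
        }

        @Override
        public void execute(Tuple input) {
            // Only log that a packet arrived; no decoding is attempted.
            System.out.println("received tuple: " + input);
            collector.ack(input);
        }

        @Override
        public void declareOutputFields(OutputFieldsDeclarer declarer) {
            // Counting/logging only; no output fields are declared.
        }
    }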

1 Answer:

Answer 0 (score: 1)

I just had a look at this. There is one issue right now, but I got it working locally. You can apply this hotfix to your code and build the compatibility layer yourself.

  1. KafkaSpout registers metrics. However, metrics are currently not supported by the compatibility layer. You need to remove the exception in FlinkTopologyContext.registerMetric(...) and just return null. (There is already an open PR that works on the integration of metrics, so I don't want to push this hotfix into the master branch.) A sketch of this change follows the configuration example below.
  2. Furthermore, you need to add some configuration parameters to your topology manually:

I just made up some values here:

Config c = new Config();

// ZooKeeper connection info used by the KafkaSpout (placeholder values)
List<String> zkServers = new ArrayList<String>();
zkServers.add("localhost");
c.put(Config.STORM_ZOOKEEPER_SERVERS, zkServers);
c.put(Config.STORM_ZOOKEEPER_PORT, 2181);

// Timeouts and retry policy (again, made-up values)
c.put(Config.STORM_ZOOKEEPER_SESSION_TIMEOUT, 30);
c.put(Config.STORM_ZOOKEEPER_CONNECTION_TIMEOUT, 30);
c.put(Config.STORM_ZOOKEEPER_RETRY_TIMES, 3);
c.put(Config.STORM_ZOOKEEPER_RETRY_INTERVAL, 5);
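For step 1, here is a sketch of the hotfix in FlinkTopologyContext (assumptions: the method currently throws an UnsupportedOperationException, and its signature mirrors Storm's TopologyContext):

// Inside FlinkTopologyContext of the flink-storm compatibility layer.
// Assumption: the current implementation throws an UnsupportedOperationException here.
@Override
public <T extends IMetric> T registerMetric(String name, T metric, int timeBucketSizeInSecs) {
    // was: throw new UnsupportedOperationException(...)
    return null; // hotfix: ignore metric registration so the KafkaSpout can start
}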
  3. You need to add some additional dependencies to your project:

In addition to flink-storm, you need:

<dependency>
    <groupId>org.apache.storm</groupId>
    <artifactId>storm-kafka</artifactId>
    <version>0.9.4</version>
</dependency>
<dependency>
    <groupId>org.apache.kafka</groupId>
    <artifactId>kafka_2.10</artifactId>
    <version>0.8.1.1</version>
</dependency>

This works for me, using Kafka_2.10-0.8.1.1 and FlinkLocalCluster, executed within Eclipse.

It also works in a local Flink cluster started via bin/start-local-streaming.sh. For this, when submitting with the bin/flink run command, you need to use FlinkSubmitter instead of FlinkLocalCluster (see the sketch after the list below). Furthermore, you need the following dependencies in your jar:

<include>org.apache.storm:storm-kafka</include>
<include>org.apache.kafka:kafka_2.10</include>
<include>org.apache.curator:curator-client</include>
<include>org.apache.curator:curator-framework</include>
<include>com.google.guava:guava</include>
<include>com.yammer.metrics:metrics-core</include>
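For completeness, a sketch of how the submission code differs between the two modes (assuming FlinkSubmitter.submitTopology mirrors Storm's StormSubmitter API, as the compatibility layer intends):

// Local execution, e.g., from within Eclipse:
FlinkLocalCluster cluster = new FlinkLocalCluster();
cluster.submitTopology("env_topology", conf, buildTopology());

// Cluster execution via bin/flink run: submit through FlinkSubmitter instead.
FlinkSubmitter.submitTopology("env_topology", conf, buildTopology());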