Failed to write offset data to zookeeper in kafka-storm

Date: 2014-06-25 11:58:23

Tags: bigdata apache-zookeeper apache-storm apache-kafka

I am setting up a Storm cluster to compute real-time trends and other statistics, but I am having some problems introducing a "recovery" feature into this project, i.e. resuming from the offset that was last read by the kafka-spout (the source code for the kafka-spout comes from https://github.com/apache/incubator-storm/tree/master/external/storm-kafka). I start my kafka-spout this way:

BrokerHosts zkHost = new ZkHosts("localhost:2181");
// SpoutConfig(hosts, topic, zkRoot, id) -- offsets are committed under <zkRoot>/<id>
SpoutConfig kafkaConfig = new SpoutConfig(zkHost, "test", "", "test");
kafkaConfig.forceFromStart = false; // resume from the stored offset instead of rewinding
KafkaSpout kafkaSpout = new KafkaSpout(kafkaConfig);
TopologyBuilder builder = new TopologyBuilder();
builder.setSpout("test" + "spout", kafkaSpout, ESConfig.spoutParallelism);

The default settings should do exactly this, but that does not seem to happen in my case: every time I start the project, the PartitionManager tries to find the file with the offsets and then finds nothing:

2014-06-25 11:57:08 INFO  PartitionManager:73 - Read partition information from: /storm/partition_1  --> null
2014-06-25 11:57:08 INFO  PartitionManager:86 - No partition information found, using configuration to determine offset

Then it starts reading from the latest possible offset. That is fine as long as my project never fails, but it is not exactly what I want.
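
For reference, which offset the spout falls back to is controlled by two fields on the config. A hedged sketch of the knobs (field names from the storm-kafka KafkaConfig of that era, constants from the old kafka.api.OffsetRequest API):

// A sketch of the relevant settings, per the storm-kafka of that era:
kafkaConfig.forceFromStart = false; // false: resume from the stored offset when one exists
// With no stored offset and forceFromStart = false the spout starts at the
// latest offset; forceFromStart = true makes it use startOffsetTime instead:
// kafkaConfig.forceFromStart = true;
// kafkaConfig.startOffsetTime = kafka.api.OffsetRequest.EarliestTime();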

I also had a look at the PartitionManager and ZkState classes that write the offsets. PartitionManager commits the last completed offset to ZooKeeper like this:

PartitionManager

public void commit() {
    long lastCompletedOffset = lastCompletedOffset();
    if (_committedTo != lastCompletedOffset) {
        LOG.debug("Writing last completed offset (" + lastCompletedOffset + ") to ZK for " + _partition + " for topology: " + _topologyInstanceId);
        Map<Object, Object> data = (Map<Object, Object>) ImmutableMap.builder()
                .put("topology", ImmutableMap.of("id", _topologyInstanceId,
                        "name", _stormConf.get(Config.TOPOLOGY_NAME)))
                .put("offset", lastCompletedOffset)
                .put("partition", _partition.partition)
                .put("broker", ImmutableMap.of("host", _partition.host.host,
                        "port", _partition.host.port))
                .put("topic", _spoutConfig.topic).build();
        _state.writeJSON(committedPath(), data);
        _committedTo = lastCompletedOffset;
        LOG.debug("Wrote last completed offset (" + lastCompletedOffset + ") to ZK for " + _partition + " for topology: " + _topologyInstanceId);
    } else {
        LOG.debug("No new offset for " + _partition + " for topology: " + _topologyInstanceId);
    }
}

and ZkState then persists the data through writeBytes:

ZkState

public void writeBytes(String path, byte[] bytes) {
    try {
        if (_curator.checkExists().forPath(path) == null) {
            // First write: create the znode (and any missing parents).
            _curator.create()
                    .creatingParentsIfNeeded()
                    .withMode(CreateMode.PERSISTENT)
                    .forPath(path, bytes);
        } else {
            // Subsequent writes: update the existing znode.
            _curator.setData().forPath(path, bytes);
        }
    } catch (Exception e) {
        throw new RuntimeException(e);
    }
}

From this I can see that for the first message the writeBytes method goes into the if block and creates the path, and for the following messages it goes into the else block and updates the data, which seems fine. But when I start the project again, the same messages mentioned above show up and no partition information is found.
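
One way to narrow this down is to read the znode back after a run and check whether the commit ever reached the ZooKeeper you expect. A self-contained sketch using Curator; the connect string and the /storm/partition_1 path are taken from the log above, so adjust both to your setup:

import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.CuratorFrameworkFactory;
import org.apache.curator.retry.RetryOneTime;

public class OffsetCheck {
    public static void main(String[] args) throws Exception {
        // Connect to the ZooKeeper the spout is supposed to write to.
        CuratorFramework curator = CuratorFrameworkFactory.newClient(
                "localhost:2181", new RetryOneTime(1000));
        curator.start();
        // Path from the PartitionManager log line above; this throws a
        // NoNodeException if the offset was never written there.
        byte[] data = curator.getData().forPath("/storm/partition_1");
        System.out.println(new String(data, "UTF-8"));
        curator.close();
    }
}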

2 Answers:

Answer 0 (score: 10):

I had the same problem. It turned out I was running in local mode, which uses an in-memory ZooKeeper rather than the ZooKeeper that Kafka is using.
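
For context, this is the failure mode in code form (a sketch with the backtype.storm API of that era; the topology name and the empty builder wiring are made up):

import backtype.storm.Config;
import backtype.storm.LocalCluster;
import backtype.storm.topology.TopologyBuilder;

public class LocalModeExample {
    public static void main(String[] args) {
        TopologyBuilder builder = new TopologyBuilder(); // spout/bolt wiring omitted
        // LocalCluster runs an in-process ZooKeeper. Unless SpoutConfig.zkServers
        // and zkPort point at the real ensemble, the spout's ZkState commits its
        // offsets to this throwaway instance, and they are gone when the JVM exits.
        LocalCluster cluster = new LocalCluster();
        cluster.submitTopology("trends", new Config(), builder.createTopology());
    }
}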

To make sure the KafkaSpout does not use Storm's ZooKeeper for the ZkState that stores the offsets, you need to set SpoutConfig.zkServers, SpoutConfig.zkPort, and SpoutConfig.zkRoot in addition to ZkHosts. For example:

import org.apache.zookeeper.client.ConnectStringParser;
import storm.kafka.SpoutConfig;
import storm.kafka.ZkHosts;
import storm.kafka.KeyValueSchemeAsMultiScheme;
import java.net.InetSocketAddress;
import java.util.ArrayList;
import java.util.List;

...

    // Parse the ZooKeeper connect string (e.g. "zk1:2181,zk2:2181") into
    // host names plus a port for the SpoutConfig below.
    final ConnectStringParser connectStringParser = new ConnectStringParser(zkConnectStr);
    final List<InetSocketAddress> serverInetAddresses = connectStringParser.getServerAddresses();
    final List<String> serverAddresses = new ArrayList<>(serverInetAddresses.size());
    final Integer zkPort = serverInetAddresses.get(0).getPort();
    for (InetSocketAddress serverInetAddress : serverInetAddresses) {
        serverAddresses.add(serverInetAddress.getHostName());
    }

    // Prefix the broker path with Kafka's chroot znode so the spout finds the brokers.
    final ZkHosts zkHosts = new ZkHosts(zkConnectStr);
    zkHosts.brokerZkPath = kafkaZnode + zkHosts.brokerZkPath;

    final SpoutConfig spoutConfig = new SpoutConfig(zkHosts, inputTopic, kafkaZnode, kafkaConsumerGroup);
    spoutConfig.scheme = new KeyValueSchemeAsMultiScheme(inputKafkaKeyValueScheme);

    // The crucial part: make the ZkState commit offsets to Kafka's ZooKeeper
    // instead of Storm's.
    spoutConfig.zkServers = serverAddresses;
    spoutConfig.zkPort = zkPort;
    spoutConfig.zkRoot = kafkaZnode;
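
If you only have a single ZooKeeper node, the same effect can be had without the ConnectStringParser. A minimal sketch under that assumption, reusing the topic and id from the question, with a hypothetical /kafka chroot standing in for kafkaZnode:

final ZkHosts zkHosts = new ZkHosts("localhost:2181");
final SpoutConfig spoutConfig = new SpoutConfig(zkHosts, "test", "/kafka", "test");
// Commit offsets to the same ZooKeeper Kafka uses, not Storm's in-memory one.
spoutConfig.zkServers = java.util.Arrays.asList("localhost");
spoutConfig.zkPort = 2181;
spoutConfig.zkRoot = "/kafka";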

Answer 1 (score: 0):

I think you are running into this issue:

https://community.hortonworks.com/questions/66524/closedchannelexception-kafka-spout-cannot-read-kaf.html

The comments from the colleagues in the link above solved my problem. I added some newer libraries.