Storm Kafka spout consumes messages slowly

Time: 2013-10-31 12:00:58

Tags: apache-storm apache-kafka

I just tried the kafka-storm spout mentioned at https://github.com/nathanmarz/storm-contrib/tree/master/storm-kafka, using the configuration shown below.

    BrokerHosts brokerHosts = KafkaConfig.StaticHosts.fromHostString(
            ImmutableList.of("localhost"), 1);
    SpoutConfig spoutConfig = new SpoutConfig(brokerHosts, // list of Kafka
            "test", // topic to read from
            "/kafkastorm", // the root path in Zookeeper for the spout to
            "discovery"); // an id for this consumer for storing the
                            // consumer offsets in Zookeeper
    spoutConfig.scheme = new StringScheme();
    spoutConfig.stateUpdateIntervalMs = 1000;


    KafkaSpout kafkaSpout = new KafkaSpout(spoutConfig);

    TridentTopology topology = new TridentTopology();
    InetSocketAddress inetSocketAddress = new InetSocketAddress(
            "localhost", 6379);
    TridentState wordsCount = topology
            .newStream(SPOUT_FIRST, kafkaSpout)
            .parallelismHint(1)
            .each(new Fields("str"), new TestSplit(), new Fields("words"))
            .groupBy(new Fields("words"))
            .persistentAggregate(
                    RedisState.transactional(inetSocketAddress),
                    new Count(), new Fields("counts")).parallelismHint(100);

    Config conf = new Config();
    conf.setMaxTaskParallelism(200);
    // conf.setDebug( true );
    // conf.setMaxSpoutPending(20);

    // This topology can only be run as local because it is a toy example
    LocalDRPC drpc = new LocalDRPC();
    LocalCluster cluster = new LocalCluster();
    cluster.submitTopology("symbolCounter", conf, topology.build());

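However, the spout above fetches messages from the Kafka topic at only about 7,000 per second, whereas I expected it to load about 50,000 messages per second. I have tried various options for increasing the fetch buffer size in spoutConfig, but with no visible effect.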

Has anyone run into a similar problem, where Storm cannot consume a Kafka topic as fast as the producer generates messages?

1 Answer:

Answer 0: (score: 3)

I updated the "topology.spout.max.batch.size" value in the config to about 64 * 1024, and Storm processing then became fast.
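
For illustration, here is a minimal sketch of that change. It only shows where the key and value go; it is not a tuned or verified setup. Storm's Config class extends HashMap, so a plain map stands in for it here to keep the example runnable without Storm on the classpath. The class name SpoutBatchSizeSketch and the helper withMaxBatchSize are hypothetical names made up for this example.

```java
import java.util.HashMap;
import java.util.Map;

public class SpoutBatchSizeSketch {
    // Key quoted in the answer above.
    public static final String MAX_BATCH_SIZE_KEY = "topology.spout.max.batch.size";

    // Hypothetical helper: sets the batch-size key on the given map and returns the same map.
    public static Map<String, Object> withMaxBatchSize(Map<String, Object> conf, int batchSize) {
        conf.put(MAX_BATCH_SIZE_KEY, batchSize);
        return conf;
    }

    public static void main(String[] args) {
        // Stand-in for `Config conf = new Config();` from the question (assumption:
        // the real Config accepts the same put call because it is a HashMap subclass).
        Map<String, Object> conf = new HashMap<>();
        withMaxBatchSize(conf, 64 * 1024);
        System.out.println(conf);
    }
}
```

In the question's code this would presumably amount to one extra line, conf.put("topology.spout.max.batch.size", 64 * 1024);, placed before cluster.submitTopology(...). The value 64 * 1024 is the one reported in the answer; a suitable value for another topology may differ.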