How can I continuously send data to Kafka?

Date: 2016-03-09 17:45:40

Tags: python python-2.7 apache-kafka tshark pyshark

I am trying to continuously send data (packets sniffed with tshark) to a Kafka broker/consumer.

Here are the steps I followed:

1. Started ZooKeeper:

kafka/bin/zookeeper-server-start.sh ../kafka/config/zookeeper.properties

2. Started the Kafka server:

kafka/bin/kafka-server-start.sh ../kafka/config/server.properties

3. Started a Kafka consumer:

kafka/bin/kafka-console-consumer.sh --zookeeper localhost:2181 --topic \
                                           'my-topic' --from-beginning

4. Wrote the following Python script to send the sniffed data to the consumer:

from kafka import KafkaProducer
import subprocess
producer = KafkaProducer(bootstrap_servers='localhost:9092')
producer.send('my-topic', subprocess.check_output(['tshark','-i','wlan0']))

But this just hangs in the producer terminal, printing:

Capturing on 'wlan0'
605
^C

Nothing is delivered to the consumer.
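
A likely cause: subprocess.check_output only returns after tshark exits, so at most one message is ever produced. A minimal sketch that reads tshark's output line by line instead (assuming the same broker and topic, and the kafka-python client the script above already imports) might look like this:

from kafka import KafkaProducer
import subprocess

producer = KafkaProducer(bootstrap_servers='localhost:9092')

# Popen with a pipe exposes tshark's output while it is still running,
# unlike check_output, which waits for the process to terminate
proc = subprocess.Popen(['tshark', '-i', 'wlan0'], stdout=subprocess.PIPE)
for line in iter(proc.stdout.readline, b''):
    # each summary line tshark prints becomes its own Kafka message
    producer.send('my-topic', line.strip())

Each iteration publishes one capture line as soon as tshark emits it, so the console consumer should start receiving messages while the capture is still running.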

I know that I can run tshark from Python using pyshark:

import pyshark
capture = pyshark.LiveCapture(interface='eth0')
capture.sniff(timeout=5)
capture1 = capture[0]
print capture1

But I don't know how to continuously send the captured packets from the producer to the consumer. Any suggestions?
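
With pyshark, LiveCapture.sniff_continuously() serves the same purpose: it is a generator that yields packets as they arrive instead of collecting them into a list. A sketch along those lines (again assuming the kafka-python producer, and wlan0 as in the tshark command above):

import pyshark
from kafka import KafkaProducer

producer = KafkaProducer(bootstrap_servers='localhost:9092')
capture = pyshark.LiveCapture(interface='wlan0')

# sniff_continuously() yields packets indefinitely, so each one can be
# published as its own message; str(packet) is already bytes on Python 2.7
# (on Python 3 it would have to be encoded before sending)
for packet in capture.sniff_continuously():
    producer.send('my-topic', str(packet))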

Thanks!

1 Answer:

Answer 0 (score: 0)

Check out the following link.

http://zdatainc.com/2014/07/real-time-streaming-apache-storm-apache-kafka/

Implementing the Kafka Producer

Here, the main parts of the Kafka producer code used to test our cluster are defined. In the main class, we set up the data pipes and the threads:

LOGGER.debug("Setting up streams");
PipedInputStream send = new PipedInputStream(BUFFER_LEN);
PipedOutputStream input = new PipedOutputStream(send);

LOGGER.debug("Setting up connections");
LOGGER.debug("Setting up file reader");
BufferedFileReader reader = new BufferedFileReader(filename, input);
LOGGER.debug("Setting up kafka producer");
KafkaProducer kafkaProducer = new KafkaProducer(topic, send);

LOGGER.debug("Spinning up threads");
Thread source = new Thread(reader);
Thread kafka = new Thread(kafkaProducer);

source.start();
kafka.start();

LOGGER.debug("Joining");
kafka.join();

The BufferedFileReader, running in its own thread, reads the data off disk:

// copy the file byte-by-byte into the pipe that feeds the producer thread
rd = new BufferedReader(new FileReader(this.fileToRead));
wd = new BufferedWriter(new OutputStreamWriter(this.outputStream, ENC));
int b = -1;
while ((b = rd.read()) != -1)
{
    wd.write(b);
}

Finally, the KafkaProducer sends asynchronous messages to the Kafka cluster:

// read each line from the pipe and publish it to the Kafka topic
rd = new BufferedReader(new InputStreamReader(this.inputStream, ENC));
String line = null;
producer = new Producer<Integer, String>(conf);
while ((line = rd.readLine()) != null)
{
    producer.send(new KeyedMessage<Integer, String>(this.topic, line));
}

Doing these operations on separate threads gives us the benefit that disk reads do not block the Kafka producer, and vice versa, enabling maximum performance, tunable by the size of the buffer.

Implementing the Storm Topology

Topology Definition

Moving on to Storm, here we define the topology and how each bolt talks to the others:

TopologyBuilder topology = new TopologyBuilder();

topology.setSpout("kafka_spout", new KafkaSpout(kafkaConf), 4);

topology.setBolt("twitter_filter", new TwitterFilterBolt(), 4)
        .shuffleGrouping("kafka_spout");

topology.setBolt("text_filter", new TextFilterBolt(), 4)
        .shuffleGrouping("twitter_filter");

topology.setBolt("stemming", new StemmingBolt(), 4)
        .shuffleGrouping("text_filter");

topology.setBolt("positive", new PositiveSentimentBolt(), 4)
        .shuffleGrouping("stemming");
topology.setBolt("negative", new NegativeSentimentBolt(), 4)
        .shuffleGrouping("stemming");

topology.setBolt("join", new JoinSentimentsBolt(), 4)
        .fieldsGrouping("positive", new Fields("tweet_id"))
        .fieldsGrouping("negative", new Fields("tweet_id"));

topology.setBolt("score", new SentimentScoringBolt(), 4)
        .shuffleGrouping("join");

topology.setBolt("hdfs", new HDFSBolt(), 4)
        .shuffleGrouping("score");
topology.setBolt("nodejs", new NodeNotifierBolt(), 4)
        .shuffleGrouping("score");

It is worth noting that the data is shuffled to each bolt, except at the join, since it is important that the same tweets go to the same instance of the joining bolt (hence the fieldsGrouping on "tweet_id").