Can someone provide sample code for pushing records from Spark Streaming to Kafka?
Answer 0 (score: 0)
With Spark Streaming, you can consume data from a Kafka topic.
If you want to publish records to a Kafka topic, you can use a Kafka Producer [https://cwiki.apache.org/confluence/display/KAFKA/0.8.0+Producer+Example].
Alternatively, you can use Kafka Connect with one of its many source connectors to publish data to Kafka topics [http://www.confluent.io/product/connectors/].
For more information on Spark Streaming and Kafka integration, see the following link:
http://spark.apache.org/docs/latest/streaming-kafka-integration.html
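For reference, a minimal sketch of the plain-producer approach mentioned above, using the newer org.apache.kafka.clients.producer API rather than the 0.8.0 example in the link; the broker address, topic name, key, and value below are placeholders, not values from the question:

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class SimpleProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        // Placeholder broker address; replace with your own cluster
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        Producer<String, String> producer = new KafkaProducer<>(props);
        // "my-topic", "key", and the message text are placeholders
        producer.send(new ProducerRecord<>("my-topic", "key", "hello from a plain Kafka producer"));
        producer.close();
    }
}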
Answer 1 (score: 0)
I have done it using Java. You can use the following function over a JavaDStream<String> as the argument for .foreachRDD(). It is not the best way, as it creates a KafkaProducer for each partition of each RDD; you could instead use a "pool" of KafkaProducers, like the socket example in the Spark documentation.
Here is my code:
// Requires org.apache.kafka.clients.producer.* and org.apache.spark.api.java.function.VoidFunction
public static class KafkaPublisher implements VoidFunction<JavaRDD<String>> {
    private static final long serialVersionUID = 1L;

    public void call(JavaRDD<String> rdd) throws Exception {
        // Producer configuration; effectively final so it can be captured
        // by the anonymous class below
        final Properties props = new Properties();
        props.put("bootstrap.servers", "192.168.0.155:9092");
        props.put("acks", "1");
        props.put("retries", 0);
        props.put("batch.size", 16384);
        props.put("linger.ms", 1000);
        props.put("buffer.memory", 33554432);
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        rdd.foreachPartition(new VoidFunction<Iterator<String>>() {
            private static final long serialVersionUID = 1L;

            public void call(Iterator<String> partitionOfRecords) throws Exception {
                // One producer per partition; a producer pool would avoid
                // recreating it on every batch
                Producer<String, String> producer = new KafkaProducer<>(props);
                while (partitionOfRecords.hasNext()) {
                    producer.send(new ProducerRecord<String, String>("topic", partitionOfRecords.next()));
                }
                producer.close();
            }
        });
    }
}
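To wire this into a streaming job, pass the function to foreachRDD on the DStream. A minimal usage sketch, assuming a JavaDStream<String> named lines built elsewhere in the job:

// 'lines' is assumed to be a JavaDStream<String> created earlier
lines.foreachRDD(new KafkaPublisher());

A simple stand-in for the producer pool mentioned above is to hold one producer per executor JVM in a lazily-initialized static field, so it is reused across batches instead of being recreated per partition; this is a sketch, not the Spark documentation's exact pattern:

public class ProducerHolder {
    private static Producer<String, String> producer;

    // Returns a single shared producer per JVM, creating it on first use
    public static synchronized Producer<String, String> get(Properties props) {
        if (producer == null) {
            producer = new KafkaProducer<>(props);
        }
        return producer;
    }
}

With a shared producer, call producer.flush() at the end of each partition instead of producer.close(), so buffered records are sent without tearing the producer down.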