How to fix this Kafka-Spark streaming error when using JavaInputDStream for a direct stream?

Asked: 2019-05-21 12:22:10

Tags: java apache-spark apache-kafka spark-streaming-kafka

I set up simple Kafka-Spark streaming with the Direct Stream approach from https://spark.apache.org/docs/latest/streaming-kafka-0-10-integration.html. I wrote it as a single Java file (no Maven) and handle all the dependencies myself. I compile it like this:

javac -cp "/opt/spark-2.4.3-bin-hadoop2.7/jars/*:/opt/kafka_2.11-2.2.0/libs/*" SparkStreamConsumer.java
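Once it compiles, I plan to run it with the same classpath (just my guess at the matching run command, assuming the .class file sits in the current directory):

java -cp ".:/opt/spark-2.4.3-bin-hadoop2.7/jars/*:/opt/kafka_2.11-2.2.0/libs/*" SparkStreamConsumer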

Instead, I get this error:

SparkStreamConsumer.java:33: error: incompatible types: no instance(s) of type variable(s) K,V exist so that InputDStream<ConsumerRecord<K,V>> conforms to JavaInputDStream<ConsumerRecord<String,String>>
                KafkaUtils.createDirectStream(
                                             ^
  where K,V are type-variables:
    K extends Object declared in method <K,V>createDirectStream(StreamingContext,LocationStrategy,ConsumerStrategy<K,V>)
    V extends Object declared in method <K,V>createDirectStream(StreamingContext,LocationStrategy,ConsumerStrategy<K,V>)
1 error
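From staring at the docs, KafkaUtils.createDirectStream seems to have two overloads, and the compiler is picking the Scala-facing one because my ssc is a StreamingContext. Roughly (my paraphrase of the signatures, not copied verbatim from the source):

// Scala-facing overload: takes a StreamingContext, returns an InputDStream
static <K, V> InputDStream<ConsumerRecord<K, V>> createDirectStream(
        StreamingContext ssc,
        LocationStrategy locationStrategy,
        ConsumerStrategy<K, V> consumerStrategy);

// Java-facing overload: takes a JavaStreamingContext, returns a JavaInputDStream
static <K, V> JavaInputDStream<ConsumerRecord<K, V>> createDirectStream(
        JavaStreamingContext jssc,
        LocationStrategy locationStrategy,
        ConsumerStrategy<K, V> consumerStrategy);

So an InputDStream<ConsumerRecord<K, V>> is being assigned to a JavaInputDStream<ConsumerRecord<String, String>>, which is exactly what the error complains about.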

Please help!

Also, on top of the jars already present in /opt/spark-2.4.3-bin-hadoop2.7/jars/, I added spark-streaming-kafka-0-10_2.11-2.4.3.jar, downloaded from https://search.maven.org/remotecontent?filepath=org/apache/spark/spark-streaming-kafka-0-10_2.11/2.4.3/spark-streaming-kafka-0-10_2.11-2.4.3.jar

What I have done so far:

-> Set up ZooKeeper (3.4.14)

-> Set up Kafka (kafka_2.11-2.2.0)

-> Set up Spark (2.4.3)

-> Created a Kafka topic "mytopic"

-> Tested it with the console producer and consumer; it works (roughly the commands shown below).
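These are more or less the commands I used for that test, reconstructed from memory; paths are relative to the Kafka install directory:

bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic mytopic
bin/kafka-console-producer.sh --broker-list localhost:9092 --topic mytopic
bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic mytopic --from-beginning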

Now I want Spark to do the heavy lifting, but this error won't let me!

Code:

//SparkStreamConsumer.java

import org.apache.spark.streaming.StreamingContext;
import org.apache.spark.streaming.Durations;
import org.apache.spark.SparkConf;
import org.apache.spark.TaskContext;
import org.apache.spark.api.java.*;
import org.apache.spark.api.java.function.*;
import org.apache.spark.streaming.api.java.*;
import org.apache.spark.streaming.kafka010.*;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.serialization.StringDeserializer;

import java.util.*;
import scala.Tuple2;

public class SparkStreamConsumer{
    public static void main(String[] args){

        SparkConf conf = new SparkConf().setMaster("local[*]").setAppName("KafkaReceiverInJava");
        StreamingContext ssc = new StreamingContext(conf, Durations.seconds(1));

        Map<String, Object> kafkaParams = new HashMap<>();
        kafkaParams.put("bootstrap.servers", "localhost:2181");
        kafkaParams.put("key.deserializer", StringDeserializer.class);
        kafkaParams.put("value.deserializer", StringDeserializer.class);
        kafkaParams.put("group.id", "spark-streaming-consumer-group");
        kafkaParams.put("auto.offset.reset", "latest");
        kafkaParams.put("enable.auto.commit", true);

        Collection<String> topics = Arrays.asList("mytopic");


        //The problematic line:
        JavaInputDStream<ConsumerRecord<String, String>> kafkaStream =
                KafkaUtils.createDirectStream(
                        ssc,
                        LocationStrategies.PreferConsistent(),
                        ConsumerStrategies.<String, String>Subscribe(topics, kafkaParams)
                );

        kafkaStream.mapToPair(record -> new Tuple2<>(record.key(), record.value()));

        kafkaStream.print();
        ssc.start();
        ssc.awaitTermination();
    }
}
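For reference, here is what I understand the Java-API version of the same program should look like: a sketch I have not verified, assuming JavaStreamingContext (already covered by my org.apache.spark.streaming.api.java.* import) replaces StreamingContext, and with bootstrap.servers pointed at the broker's 9092 instead of ZooKeeper's 2181. The class name FixedSparkStreamConsumer is just for illustration, and I dropped the mapToPair line since its result was never used.

//FixedSparkStreamConsumer.java (sketch, untested)

import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaInputDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;
import org.apache.spark.streaming.kafka010.ConsumerStrategies;
import org.apache.spark.streaming.kafka010.KafkaUtils;
import org.apache.spark.streaming.kafka010.LocationStrategies;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.common.serialization.StringDeserializer;

import java.util.*;

public class FixedSparkStreamConsumer {
    // throws Exception, as in the official Java examples, since
    // awaitTermination in the Java API can be interrupted
    public static void main(String[] args) throws Exception {
        SparkConf conf = new SparkConf().setMaster("local[*]").setAppName("KafkaReceiverInJava");
        // JavaStreamingContext instead of StreamingContext: this is what makes
        // KafkaUtils.createDirectStream resolve to the Java-facing overload.
        JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(1));

        Map<String, Object> kafkaParams = new HashMap<>();
        // The broker, not ZooKeeper: Kafka's default listener port is 9092.
        kafkaParams.put("bootstrap.servers", "localhost:9092");
        kafkaParams.put("key.deserializer", StringDeserializer.class);
        kafkaParams.put("value.deserializer", StringDeserializer.class);
        kafkaParams.put("group.id", "spark-streaming-consumer-group");
        kafkaParams.put("auto.offset.reset", "latest");
        kafkaParams.put("enable.auto.commit", true);

        Collection<String> topics = Arrays.asList("mytopic");

        // The JavaStreamingContext argument selects the overload that returns
        // JavaInputDStream, so this assignment should now type-check.
        JavaInputDStream<ConsumerRecord<String, String>> kafkaStream =
                KafkaUtils.createDirectStream(
                        jssc,
                        LocationStrategies.PreferConsistent(),
                        ConsumerStrategies.<String, String>Subscribe(topics, kafkaParams));

        kafkaStream.print();
        jssc.start();
        jssc.awaitTermination();
    }
}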

Thanks!

0 Answers:

No answers yet.