I'm doing simple Kafka-Spark streaming with the Direct Stream approach from https://spark.apache.org/docs/latest/streaming-kafka-0-10-integration.html, compiled as a single Java file (no Maven), with all dependencies handled separately. When I compile like this:
javac -cp "/opt/spark-2.4.3-bin-hadoop2.7/jars/*:/opt/kafka_2.11-2.2.0/libs/*" SparkStreamConsumer.java
I get this error:
SparkStreamConsumer.java:33: error: incompatible types: no instance(s) of type variable(s) K,V exist so that InputDStream<ConsumerRecord<K,V>> conforms to JavaInputDStream<ConsumerRecord<String,String>>
        KafkaUtils.createDirectStream(
                  ^
  where K,V are type-variables:
    K extends Object declared in method <K,V>createDirectStream(StreamingContext,LocationStrategy,ConsumerStrategy<K,V>)
    V extends Object declared in method <K,V>createDirectStream(StreamingContext,LocationStrategy,ConsumerStrategy<K,V>)
1 error
Please help!
Also, in addition to the jars already present in /opt/spark-2.4.3-bin-hadoop2.7/jars/, I added spark-streaming-kafka-0-10_2.11-2.4.3.jar, downloaded from https://search.maven.org/remotecontent?filepath=org/apache/spark/spark-streaming-kafka-0-10_2.11/2.4.3/spark-streaming-kafka-0-10_2.11-2.4.3.jar
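(In case it matters: the class should be visible by listing the jar's contents, e.g.
jar tf spark-streaming-kafka-0-10_2.11-2.4.3.jar | grep kafka010/KafkaUtils
should print org/apache/spark/streaming/kafka010/KafkaUtils.class if the jar is intact.)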
What I have done so far:
-> Set up Zookeeper (3.4.14)
-> Set up Kafka (kafka_2.11-2.2.0)
-> Set up Spark (2.4.3)
-> Created a Kafka topic "mytopic"
-> Tested it with the console producer and consumer, which works fine (roughly the commands shown below).
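For reference, these are approximately the quickstart commands I used for the topic and the console test (the exact flags may have differed; with Kafka 2.2 topics are still created via Zookeeper):
/opt/kafka_2.11-2.2.0/bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic mytopic
/opt/kafka_2.11-2.2.0/bin/kafka-console-producer.sh --broker-list localhost:9092 --topic mytopic
/opt/kafka_2.11-2.2.0/bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic mytopic --from-beginning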
Now I want Spark to do the consuming, but this error won't let me!
Code:
// SparkStreamConsumer.java
import org.apache.spark.streaming.StreamingContext;
import org.apache.spark.streaming.Durations;
import org.apache.spark.SparkConf;
import org.apache.spark.TaskContext;
import org.apache.spark.api.java.*;
import org.apache.spark.api.java.function.*;
import org.apache.spark.streaming.api.java.*;
import org.apache.spark.streaming.kafka010.*;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.serialization.StringDeserializer;
import java.util.*;
import scala.Tuple2;

public class SparkStreamConsumer {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setMaster("local[*]").setAppName("KafkaReceiverInJava");
        StreamingContext ssc = new StreamingContext(conf, Durations.seconds(1));

        Map<String, Object> kafkaParams = new HashMap<>();
        kafkaParams.put("bootstrap.servers", "localhost:2181");
        kafkaParams.put("key.deserializer", StringDeserializer.class);
        kafkaParams.put("value.deserializer", StringDeserializer.class);
        kafkaParams.put("group.id", "spark-streaming-consumer-group");
        kafkaParams.put("auto.offset.reset", "latest");
        kafkaParams.put("enable.auto.commit", true);

        Collection<String> topics = Arrays.asList("mytopic");

        // The problematic line:
        JavaInputDStream<ConsumerRecord<String, String>> kafkaStream =
            KafkaUtils.createDirectStream(
                ssc,
                LocationStrategies.PreferConsistent(),
                ConsumerStrategies.<String, String>Subscribe(topics, kafkaParams)
            );

        kafkaStream.mapToPair(record -> new Tuple2<>(record.key(), record.value()));
        kafkaStream.print();

        ssc.start();
        ssc.awaitTermination();
    }
}
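One thing I noticed while searching: the Java examples in the docs use JavaStreamingContext (from org.apache.spark.streaming.api.java), not the Scala StreamingContext, and createDirectStream only returns a JavaInputDStream when it is handed a JavaStreamingContext. A minimal, untested sketch of what I think the fix would be (everything else unchanged from my code above):

// Untested sketch: swap the Scala StreamingContext for the Java wrapper,
// so createDirectStream resolves to the overload returning JavaInputDStream.
JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(1));
JavaInputDStream<ConsumerRecord<String, String>> kafkaStream =
    KafkaUtils.createDirectStream(
        jssc,
        LocationStrategies.PreferConsistent(),
        ConsumerStrategies.<String, String>Subscribe(topics, kafkaParams)
    );
// ...and then jssc.start() / jssc.awaitTermination() at the end.

If that compiles, I would run it with the same classpath plus the current directory:
java -cp "/opt/spark-2.4.3-bin-hadoop2.7/jars/*:/opt/kafka_2.11-2.2.0/libs/*:." SparkStreamConsumer
Is that the right direction, or am I missing something else?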
Thanks!