We recently moved to HDP 2.5, which includes Kafka 0.10.0 and Spark 1.6.2. I modified my pom and some of the API calls to use the new Kafka. The code runs, but I don't see any messages. I've added a code snippet below, and also posted my pom. I'm not sure what is going wrong here; can someone please help?
SparkConf conf = new SparkConf().setMaster("local[2]").setAppName("SparkApp");
JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(2));

// Map each topic to the number of receiver threads consuming it
Map<String, Integer> topicMap = new HashMap<String, Integer>();
topicMap.put(this.topic, this.numThreads);

Map<String, String> kafkaParams = new HashMap<>();
kafkaParams.put("metadata.broker.list", kfkBroker);
kafkaParams.put("zookeeper.connect", zkBroker);
kafkaParams.put("group.id", "default");
kafkaParams.put("fetch.message.max.bytes", "60000000");

// Receiver-based stream using the old (Kafka 0.8) high-level consumer API
JavaPairReceiverInputDStream<String, String> kafkaInStream = KafkaUtils.createStream(
        jssc,
        String.class,
        String.class,
        kafka.serializer.StringDecoder.class,
        kafka.serializer.StringDecoder.class,
        kafkaParams,
        topicMap,
        StorageLevel.MEMORY_AND_DISK());

kafkaInStream.foreachRDD(new VoidFunction<JavaPairRDD<String, String>>()
{
    private static final long serialVersionUID = 1L;

    @Override
    public void call(JavaPairRDD<String, String> v1) throws Exception
    {
        System.out.println("inside call.. JavaPairRDD size " + v1.count());
        for (Tuple2<String, String> test : v1.collect())
        {
            // eventMessage is a field of the enclosing class
            eventMessage.setMessage(test._2);
        }
    }
});
// Not shown here: jssc.start() and jssc.awaitTermination(), which the full
// application presumably calls (no batches would run at all without them)
The output is always "inside call.. JavaPairRDD size 0", meaning Spark is not reading any data. I tried pushing some data into the topic with the console producer, but that didn't help.
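For reference, pushing to and checking the topic from the command line looks roughly like this; the broker host, ZooKeeper host, and topic name below are placeholders (the Kafka listener port on HDP is typically 6667):

bin/kafka-console-producer.sh --broker-list <broker-host>:6667 --topic <topic>
bin/kafka-console-consumer.sh --zookeeper <zk-host>:2181 --topic <topic> --from-beginning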
Here is my pom.xml (only the dependencies are shown):
<dependencies>
    <dependency>
        <groupId>log4j</groupId>
        <artifactId>log4j</artifactId>
        <version>1.2.17</version>
    </dependency>
    <!-- https://mvnrepository.com/artifact/org.apache.kafka/kafka-clients -->
    <dependency>
        <groupId>org.apache.kafka</groupId>
        <artifactId>kafka-clients</artifactId>
        <version>0.10.1.1</version>
    </dependency>
    <!-- https://mvnrepository.com/artifact/org.apache.kafka/kafka_2.10 -->
    <dependency>
        <groupId>org.apache.kafka</groupId>
        <artifactId>kafka_2.10</artifactId>
        <version>0.10.1.1</version>
    </dependency>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-core_2.10</artifactId>
        <version>1.6.2</version>
        <scope>provided</scope>
    </dependency>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-streaming_2.10</artifactId>
        <version>1.6.2</version>
        <scope>provided</scope>
    </dependency>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-streaming-kafka_2.10</artifactId>
        <version>1.6.2</version>
        <scope>provided</scope>
    </dependency>
    <dependency>
        <groupId>org.json</groupId>
        <artifactId>json</artifactId>
        <version>20160810</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-common</artifactId>
        <version>2.7.3</version>
        <scope>provided</scope>
    </dependency>
    <dependency>
        <groupId>com.101tec</groupId>
        <artifactId>zkclient</artifactId>
        <version>0.8</version>
    </dependency>
</dependencies>
Answer (score: 1)
spark-streaming-kafka_2.10 1.6.2 only works with Kafka 0.8+ clients. You can still use a Kafka 0.8+ client to connect to a 0.10+ cluster, but you lose some performance.

I suggest you just submit your application with --packages, to avoid setting up Kafka in your dependencies. E.g.,

bin/spark-submit --packages org.apache.spark:spark-streaming-kafka_2.10:1.6.2 ...
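In full, the submit command would look something like the sketch below; the main class and jar names here are hypothetical stand-ins for your application:

bin/spark-submit \
    --packages org.apache.spark:spark-streaming-kafka_2.10:1.6.2 \
    --class com.example.SparkApp \
    spark-app.jar

With --packages, spark-submit resolves the matching Kafka client jars from Maven at launch time, so the kafka_2.10 and kafka-clients 0.10.1.1 entries can come out of the pom. spark-streaming-kafka_2.10 1.6.2 itself depends on the 0.8-era Kafka client, which is where the mismatch with the 0.10.1.1 artifacts comes from.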