I want to use Scala 2.10.6 and Spark 1.6.2 to consume messages from a Kafka topic. For Kafka, I am using this dependency:
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-streaming_2.10</artifactId>
    <version>1.6.2</version>
</dependency>
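Note that KafkaUtils itself lives in the separate Kafka connector artifact rather than in spark-streaming; assuming Maven and the same versions as above, the matching dependency would look something like this:

<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-streaming-kafka_2.10</artifactId>
    <version>1.6.2</version>
</dependency>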
The following code compiles fine, but I want to set auto.offset.reset, and that is where the problem starts:
val topicMap = topic.split(",").map((_, kafkaNumThreads.toInt)).toMap
val data = KafkaUtils.createStream(ssc, zkQuorum, group, topicMap,
StorageLevel.MEMORY_AND_DISK_SER_2).map(_._2)
When I add kafkaParams, it no longer compiles:
val kafkaParams = Map[String, String](
"zookeeper.connect" -> zkQuorum, "group.id" -> group,
"zookeeper.connection.timeout.ms" -> "10000",
"auto.offset.reset" -> "smallest")
val data = KafkaUtils.createStream(ssc, kafkaParams, topicMap,
StorageLevel.MEMORY_AND_DISK_SER_2).map(_._2)
Error message:
94: error: missing parameter type for expanded function ((x$3) => x$3._2)
[ERROR] StorageLevel.MEMORY_AND_DISK_SER_2).map(_._2)
I have tried many different combinations of createStream arguments, but everything failed. Can anyone help?
Answer 0 (score: 1)
You need to add type parameters to KafkaUtils.createStream so that the stream's underlying types can be resolved. Without them the compiler cannot infer the element type of the resulting stream, which is why the expanded function _._2 fails with a missing parameter type. For example, if your keys and values are of type String:
import kafka.serializer.StringDecoder
import org.apache.spark.streaming.dstream.DStream

val data: DStream[String] =
  KafkaUtils
    .createStream[String, String, StringDecoder, StringDecoder](
      ssc,
      kafkaParams,
      topicMap,
      StorageLevel.MEMORY_AND_DISK_SER_2
    ).map(_._2)
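For completeness, here is a minimal self-contained sketch of the whole consumer, assuming String keys and values and placeholder values for zkQuorum, group, and topic (substitute your own):

import kafka.serializer.StringDecoder
import org.apache.spark.SparkConf
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.dstream.DStream
import org.apache.spark.streaming.kafka.KafkaUtils

object KafkaConsumerSketch {
  def main(args: Array[String]): Unit = {
    // Placeholder connection settings - replace with your own values.
    val zkQuorum = "localhost:2181"
    val group = "my-consumer-group"
    val topic = "my-topic"
    val kafkaNumThreads = "1"

    val conf = new SparkConf().setAppName("KafkaConsumerSketch")
    val ssc = new StreamingContext(conf, Seconds(10))

    // One entry per topic, mapping to the number of receiver threads.
    val topicMap = topic.split(",").map((_, kafkaNumThreads.toInt)).toMap

    // auto.offset.reset = "smallest" makes a new consumer group start
    // from the earliest available offset (old Kafka consumer config).
    val kafkaParams = Map[String, String](
      "zookeeper.connect" -> zkQuorum,
      "group.id" -> group,
      "zookeeper.connection.timeout.ms" -> "10000",
      "auto.offset.reset" -> "smallest")

    // The four type parameters fix the key/value types and their decoders,
    // which is exactly what the compiler could not infer in the failing call.
    val data: DStream[String] =
      KafkaUtils
        .createStream[String, String, StringDecoder, StringDecoder](
          ssc,
          kafkaParams,
          topicMap,
          StorageLevel.MEMORY_AND_DISK_SER_2)
        .map(_._2)

    data.print()

    ssc.start()
    ssc.awaitTermination()
  }
}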