I have been writing code in IntelliJ to run "DirectKafkaWordCount.scala". The source comes from apache/spark on GitHub (spark/examples/src/main/scala/org/apache/spark/examples/streaming/DirectKafkaWordCount.scala) and looks like this:
import org.apache.spark.SparkConf
import org.apache.spark.streaming._
import org.apache.spark.streaming.kafka010._

object KafkaWordCountTest {
  def main(args: Array[String]) {
    StreamingExamples.setStreamingLogLevels()

    val brokers = "localhost:2181"
    val topics = "test1"

    // Create context with 2 second batch interval
    val sparkConf = new SparkConf().setAppName("KafkaWordCountTest")
    val ssc = new StreamingContext(sparkConf, Seconds(2))

    // Create direct kafka stream with brokers and topics
    val topicsSet = topics.split(",").toSet
    val kafkaParams = Map[String, String]("metadata.broker.list" -> brokers)
    val messages = KafkaUtils.createDirectStream[String, String](
      ssc,
      LocationStrategies.PreferConsistent,
      ConsumerStrategies.Subscribe[String, String](topicsSet, kafkaParams))

    // Get the lines, split them into words, count the words and print
    val lines = messages.map(_.value)
    val words = lines.flatMap(_.split(" "))
    val wordCounts = words.map(x => (x, 1L)).reduceByKey(_ + _)
    wordCounts.print()

    // Start the computation
    ssc.start()
    ssc.awaitTermination()
  }
}
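In case it is relevant: my kafkaParams above still use the old 0.8-style key ("metadata.broker.list"), while the spark-streaming-kafka-0-10 integration guide builds them differently. A minimal sketch of those parameters as I understand them from the guide (the broker address localhost:9092 and the group id are assumptions for my local setup):

import org.apache.kafka.common.serialization.StringDeserializer

// Consumer configuration for the Kafka 0.10 direct stream
val kafkaParams = Map[String, Object](
  "bootstrap.servers" -> "localhost:9092",             // assumed local broker address
  "key.deserializer" -> classOf[StringDeserializer],
  "value.deserializer" -> classOf[StringDeserializer],
  "group.id" -> "kafka-word-count-test",               // hypothetical consumer group id
  "auto.offset.reset" -> "latest",
  "enable.auto.commit" -> (false: java.lang.Boolean))

(This would replace the kafkaParams line inside main. It is not what my compile error is about, but I mention it in case it affects the answer.)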
In addition, I added the following dependencies to my pom.xml:
<dependencies>
    <dependency>
        <groupId>org.scala-lang</groupId>
        <artifactId>scala-library</artifactId>
        <version>2.11.8</version>
    </dependency>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-core_2.11</artifactId>
        <version>1.5.2</version>
    </dependency>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-streaming_2.11</artifactId>
        <version>1.5.2</version>
    </dependency>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-streaming-twitter_2.11</artifactId>
        <version>1.6.2</version>
    </dependency>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-streaming-kafka-0-10_2.11</artifactId>
        <version>2.0.2</version>
    </dependency>
</dependencies>
However, I get the following error message:

Error:(43, 41) bad symbolic reference to org.apache.spark.internal encountered in class file 'KafkaUtils.class'. Cannot access term internal in package org.apache.spark. The current classpath may be missing a definition for org.apache.spark.internal, or KafkaUtils.class may have been compiled against a version that's incompatible with the one found on the current classpath.
    val messages = KafkaUtils.createDirectStream[String, String](
I unpacked both "spark-1.5.2-bin-hadoop2.4" and "spark-2.2.0-bin-hadoop2.7" into my Desktop folder, and each of them runs fine in the terminal with the spark-shell command. I originally installed Spark 2.2.0, and later also installed Spark 1.5.2 in order to run the Kafka Direct example. Could this be the problem?
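If the mix of Spark versions (1.5.2, 1.6.2, and 2.0.2 in the pom, 2.2.0 installed) really is the cause, I assume all the Spark artifacts would have to point at one and the same version. A sketch of what I think the aligned dependencies would look like (assuming Spark 2.2.0 throughout; I left out spark-streaming-twitter here because I am not sure it is still published under org.apache.spark for 2.x):

<dependencies>
    <dependency>
        <groupId>org.scala-lang</groupId>
        <artifactId>scala-library</artifactId>
        <version>2.11.8</version>
    </dependency>
    <!-- all Spark artifacts pinned to the same (assumed) version, 2.2.0 -->
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-core_2.11</artifactId>
        <version>2.2.0</version>
    </dependency>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-streaming_2.11</artifactId>
        <version>2.2.0</version>
    </dependency>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-streaming-kafka-0-10_2.11</artifactId>
        <version>2.2.0</version>
    </dependency>
</dependencies>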
How can I resolve this?
Thank you!