Spark and Kafka Direct Approach

Time: 2016-05-04 11:52:56

Tags: java apache-spark apache-kafka spark-streaming

I'm new to Apache Spark, and I'm trying to run the Spark Streaming + Kafka Integration Direct Approach example (JavaDirectKafkaWordCount.java).

I've downloaded all the libraries, but when I try to run it I get this error:

Exception in thread "main" java.lang.NoSuchMethodError: scala.Predef$.ArrowAssoc(Ljava/lang/Object;)Ljava/lang/Object;
at kafka.api.RequestKeys$.<init>(RequestKeys.scala:48)
at kafka.api.RequestKeys$.<clinit>(RequestKeys.scala)
at kafka.api.TopicMetadataRequest.<init>(TopicMetadataRequest.scala:55)
at org.apache.spark.streaming.kafka.KafkaCluster.getPartitionMetadata(KafkaCluster.scala:122)
at org.apache.spark.streaming.kafka.KafkaCluster.getPartitions(KafkaCluster.scala:112)
at org.apache.spark.streaming.kafka.KafkaUtils$.getFromOffsets(KafkaUtils.scala:211)
at org.apache.spark.streaming.kafka.KafkaUtils$.createDirectStream(KafkaUtils.scala:484)
at org.apache.spark.streaming.kafka.KafkaUtils$.createDirectStream(KafkaUtils.scala:607)
at org.apache.spark.streaming.kafka.KafkaUtils.createDirectStream(KafkaUtils.scala)
at it.unimi.di.luca.SimpleApp.main(SimpleApp.java:53)

Any suggestions?
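The error above names a method on `scala.Predef$` with the descriptor `ArrowAssoc(Ljava/lang/Object;)Ljava/lang/Object;`. The Kafka class was compiled against that method, but the Scala library on the runtime classpath does not provide it. `NoSuchMethodError`s on `scala.*` methods usually point to artifacts built for different Scala binary versions. Below is a minimal diagnostic sketch, assuming you can run it with the same classpath as the failing job. The object name `ScalaRuntimeCheck` is hypothetical. The sketch prints the runtime Scala version and uses Java reflection to check whether that exact method exists.

```scala
import scala.util.Properties

// Hypothetical diagnostic: does the scala-library on this classpath expose
// Predef$.ArrowAssoc(Object): Object, the method named in the stack trace?
object ScalaRuntimeCheck {
  private val predefClass: Class[_] = Class.forName("scala.Predef$")

  // Return type names of every public Predef$.ArrowAssoc method found at runtime
  def arrowAssocReturnTypes(): Seq[String] =
    predefClass.getMethods.toSeq
      .filter(_.getName == "ArrowAssoc")
      .map(_.getReturnType.getName)

  // True only if the descriptor (Ljava/lang/Object;)Ljava/lang/Object; is present
  def hasObjectReturningArrowAssoc(): Boolean =
    predefClass.getMethods.exists { m =>
      m.getName == "ArrowAssoc" &&
        m.getParameterCount == 1 &&
        m.getParameterTypes()(0) == classOf[Object] &&
        m.getReturnType == classOf[Object]
    }

  def main(args: Array[String]): Unit = {
    println(s"Scala runtime: ${Properties.versionNumberString}")
    println(s"ArrowAssoc return types: ${arrowAssocReturnTypes().mkString(", ")}")
    println(s"Matches the descriptor in the stack trace: ${hasObjectReturningArrowAssoc()}")
  }
}
```

If this prints `false` on the job's classpath, compare the `_2.10` / `_2.11` suffixes of the Spark and Kafka artifacts you are pulling in.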

2 Answers:

Answer 0 (score: 0)

I use the following code with Scala 2.10, Kafka 0.10, Spark 1.6.2 and Cassandra 3.5.

I'm using the receiver-less (direct) approach to consume from Kafka. Hope this helps.

import org.apache.spark.SparkConf
import org.apache.spark.SparkContext
import org.apache.spark.sql.SQLContext
import org.apache.spark.streaming.Seconds
import org.apache.spark.streaming.StreamingContext
import org.apache.spark.streaming.kafka.KafkaUtils
import com.datastax.spark.connector._ // provides saveToCassandra and SomeColumns

import kafka.serializer.StringDecoder

object StreamProcessor extends Serializable {
  def main(args: Array[String]): Unit = {
    val sparkConf = new SparkConf().setMaster("local[2]").setAppName("StreamProcessor")
      .set("spark.cassandra.connection.host", "127.0.0.1")

    val sc = new SparkContext(sparkConf)

    // 2-second micro-batches
    val ssc = new StreamingContext(sc, Seconds(2))

    val sqlContext = new SQLContext(sc)

    // Direct (receiver-less) approach: connect to the brokers, not to ZooKeeper
    val kafkaParams = Map("metadata.broker.list" -> "localhost:9092")

    // Topic names are passed as command-line arguments
    val topics = args.toSet

    val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
      ssc, kafkaParams, topics)

    stream
      .map { case (_, msg) =>
        // Parse each JSON message value into an (id, data) pair
        val result = msgParseMaster(msg)
        (result.id, result.data)
      }
      .foreachRDD { rdd =>
        // Skip empty batches; otherwise write to testKS.testTable
        if (!rdd.isEmpty) rdd.saveToCassandra("testKS", "testTable", SomeColumns("id", "data"))
      }

    ssc.start()
    ssc.awaitTermination()
  }

  import org.json4s._
  import org.json4s.native.JsonMethods._

  // Field names must match the JSON keys and the Cassandra columns ("id", "data")
  case class WordCount(id: Long, data: String)
  implicit val formats: Formats = DefaultFormats

  def msgParseMaster(msg: String): WordCount =
    parse(msg).extract[WordCount]
}
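The receiver-less (direct) approach this answer uses has no long-running receiver. Instead, each batch is defined by a range of offsets per Kafka partition, and Spark's Kafka integration exposes these ranges as `OffsetRange` (topic, partition, fromOffset, untilOffset). The sketch below models only that planning step, in plain Scala, so it runs without Spark or Kafka. `PlannedRange`, `DirectBatchPlanner` and the optional per-partition cap are hypothetical illustrations, not the library's API. The cap is loosely analogous to `spark.streaming.kafka.maxRatePerPartition`.

```scala
// Hypothetical model of one direct-approach batch for a single partition:
// read the half-open offset range [fromOffset, untilOffset).
final case class PlannedRange(topic: String, partition: Int, fromOffset: Long, untilOffset: Long) {
  def count: Long = untilOffset - fromOffset
}

object DirectBatchPlanner {
  // consumed:        next offset to read per (topic, partition)
  // latest:          latest offset the broker reports per (topic, partition)
  // maxPerPartition: optional cap on messages per partition in one batch
  def plan(
      consumed: Map[(String, Int), Long],
      latest: Map[(String, Int), Long],
      maxPerPartition: Option[Long]
  ): Seq[PlannedRange] =
    consumed.toSeq.sortBy(_._1).map { case ((topic, partition), from) =>
      // Unknown partitions get an empty range
      val end = latest.getOrElse((topic, partition), from)
      val capped = maxPerPartition.fold(end)(cap => math.min(end, from + cap))
      // Never move backwards, even if the reported latest offset is behind
      PlannedRange(topic, partition, from, math.max(from, capped))
    }

  def main(args: Array[String]): Unit = {
    val consumed = Map(("test", 0) -> 100L, ("test", 1) -> 40L)
    val latest = Map(("test", 0) -> 250L, ("test", 1) -> 45L)
    // Prints one PlannedRange per partition, in (topic, partition) order
    plan(consumed, latest, Some(100L)).foreach(println)
  }
}
```

Because every batch is fully described by these ranges, the direct approach can recompute a failed batch by re-reading the same offsets. This is the main design difference from the receiver-based approach.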

Answer 1 (score: -1)

I think it could be one of several things.

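  • You may not have declared the dependencies in your project correctly. Make sure you have both Kafka and Spark Streaming. If you use a build tool such as Maven, you can find the lines you need to add to your build file at http://mvnrepository.com/
  • If the topic you are trying to read from does not exist yet, you will also get an error. You can create it from the command line with something like:

    bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic test

  • Make sure both the Kafka server and ZooKeeper are running.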
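For the first point, check the `_2.10` / `_2.11` suffix on every Scala-dependent artifact. Spark, the Spark Streaming Kafka integration and Kafka itself should all be built for the same Scala version. Below is a small sketch of a hypothetical helper, `ScalaSuffixCheck`, that extracts those suffixes from a list of artifact IDs and flags a mix. The dependency list in `main` is made up for illustration.

```scala
// Hypothetical helper: read the Scala binary-version suffix (e.g. "2.10" in
// "spark-streaming-kafka_2.10") from artifact IDs and detect mixed versions.
object ScalaSuffixCheck {
  // Matches IDs ending in "_<major>.<minor>", such as "_2.10" or "_2.11"
  private val Suffix = """^.+_(\d+\.\d+)$""".r

  // Scala binary version encoded in an artifact ID, if any
  def scalaSuffix(artifactId: String): Option[String] = artifactId match {
    case Suffix(v) => Some(v)
    case _         => None
  }

  // Distinct Scala versions across artifacts; Java-only IDs are ignored
  def distinctSuffixes(artifactIds: Seq[String]): Set[String] =
    artifactIds.flatMap(id => scalaSuffix(id).toList).toSet

  // More than one distinct suffix means the build mixes Scala versions
  def isConsistent(artifactIds: Seq[String]): Boolean =
    distinctSuffixes(artifactIds).size <= 1

  def main(args: Array[String]): Unit = {
    val deps = Seq("spark-core_2.10", "spark-streaming_2.10",
      "spark-streaming-kafka_2.10", "kafka_2.11", "kafka-clients")
    println(s"Scala suffixes: ${distinctSuffixes(deps).toSeq.sorted.mkString(", ")}")
    println(s"Consistent: ${isConsistent(deps)}")
  }
}
```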

If none of this helps, you should probably post your main method.