Spark Streaming - Broadcast Variables - Case Classes

Date: 2016-08-05 13:49:19

Tags: scala hbase spark-streaming

My requirement is to enrich the streaming data with profile information kept in an HBase table. I was planning to use a broadcast variable for this. The full code is attached below.

The HBase reader prints as follows.

On the Driver node, the HBaseReaderBuilder is:

(org.apache.spark.SparkContext@3c58b102,hbase_customer_profile,Some(data),WrappedArray(gender, age),None,None,List()))

On the Worker node:

HBaseReaderBuilder(null,hbase_customer_profile,Some(data),WrappedArray(gender, age),None,None,List()))

As you can see, it has lost the SparkContext. When I issue the statement val myRdd = bcdocRdd.map(r => Profile(r._1, r._2, r._3)), I get a NullPointerException:

java.lang.NullPointerException
        at it.nerdammer.spark.hbase.HBaseReaderBuilderConversions$class.toSimpleHBaseRDD(HBaseReaderBuilder.scala:83)
        at it.nerdammer.spark.hbase.package$.toSimpleHBaseRDD(package.scala:5)
        at it.nerdammer.spark.hbase.HBaseReaderBuilderConversions$class.toHBaseRDD(HBaseReaderBuilder.scala:67)
        at it.nerdammer.spark.hbase.package$.toHBaseRDD(package.scala:5)
        at testPartition$$anonfun$main$1$$anonfun$apply$1$$anonfun$apply$2.apply(testPartition.scala:34)
        at testPartition$$anonfun$main$1$$anonfun$apply$1$$anonfun$apply$2.apply(testPartition.scala:33)
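The second log line already shows the cause: HBaseReaderBuilder carries the SparkContext as one of its fields, and SparkContext is not serializable, so when the broadcast value is deserialized on a worker that field comes back as null. Calling toHBaseRDD on the worker then dereferences the null context and throws. The usual workaround is to materialise the profile data on the driver and broadcast a plain collection instead. Below is a minimal sketch of that pattern, assuming the profile table fits in driver memory and the same it.nerdammer.spark.hbase._ imports as the full code; the Profile case class and the profilesByKey name are hypothetical, keyed by the HBase row key:

    case class Profile(gender: Option[String], age: Option[String])

    // Read the HBase table on the driver and collect it into a Map; a Map of
    // plain values serializes cleanly, unlike the HBaseReaderBuilder.
    val profileRdd = mainSparkContext
      .hbaseTable[(String, Option[String], Option[String])]("hbase_customer_profile")
      .select("gender", "age")
      .inColumnFamily("data")

    val profilesByKey: Map[String, Profile] =
      profileRdd.map { case (rowKey, gender, age) => rowKey -> Profile(gender, age) }
                .collect()
                .toMap

    val broadcastedProfiles = mainSparkContext.broadcast(profilesByKey)

    eventsStream.foreachRDD(rdd => {
      rdd.foreachPartition(records => {
        val profiles = broadcastedProfiles.value // a plain Map, safe on workers
        records.foreach(record => {
          // enrich the event here, e.g. with profiles.get(<row key from the event>)
        })
      })
    })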



import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils
import it.nerdammer.spark.hbase._

object testPartition {

  def main(args: Array[String]): Unit = {

    val sparkMaster = "spark://x.x.x.x:7077"
    val ipaddress   = "x.x.x.x:2181" // Zookeeper
    val hadoopHome  = "/home/hadoop/software/hadoop-2.6.0"
    val topicname   = "new_events_test_topic"

    val mainConf         = new SparkConf().setMaster(sparkMaster).setAppName("testingPartition")
    val mainSparkContext = new SparkContext(mainConf)
    val ssc              = new StreamingContext(mainSparkContext, Seconds(30))

    val eventsStream = KafkaUtils.createStream(ssc, "x.x.x.x:2181", "receive_rest_events", Map(topicname -> 2))

    // Built on the driver; the builder keeps a reference to the SparkContext
    val docRdd = mainSparkContext
      .hbaseTable[(String, Option[String], Option[String])]("hbase_customer_profile")
      .select("gender", "age")
      .inColumnFamily("data")
    println("docRDD from Driver ", docRdd)

    val broadcastedprof = mainSparkContext.broadcast(docRdd)

    eventsStream.foreachRDD(dstream => {
      dstream.foreachPartition(records => {
        println("Broadcasted docRDD - in Worker ", broadcastedprof.value)
        val bcdocRdd = broadcastedprof.value
        records.foreach(record => {
          // Uncommenting the next two lines triggers the NullPointerException
          //val myRdd = bcdocRdd.map(r => Profile(r._1, r._2, r._3))
          //myRdd.foreach(println)
          val rows = record._2.split("\r\n")
        })
      })
    })

    ssc.start()
    ssc.awaitTermination()
  }
}
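Note that serialization is not the only obstacle here: Spark does not support creating or transforming RDDs inside executor-side code such as foreachPartition, so even a serializable handle would not make the commented-out bcdocRdd.map(...) work. The broadcast has to carry plain data, as in the Map sketch above.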

0 answers