Question

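We have streaming data, and I have some master information in an HBase table. For each row, I need to look up the HBase master table and fetch some profile information. My code looks like this: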

val con             = new setContext(hadoopHome,sparkMaster)
val l_sparkcontext  = con.getSparkContext
val l_hivecontext   = con.getHiveContext

val topicname       = "events"
val ssc             = new StreamingContext(l_sparkcontext, Seconds(30))
val eventsStream = KafkaUtils.createStream(ssc,"xxx.xxx.142.xxx:2181","receive_rest_events",Map(topicname.toString -> 10))
println("Kafka Stream for receiving Events.." )

val profile_data = l_hivecontext.sql("select gender, income, age, riid from hbase_customer_profile")
profile_data.foreach(println)
val tabBC = l_sparkcontext.broadcast(profile_data)

eventsStream.foreachRDD(rdd => {
  rdd.foreach(record => {
    val subs_profile_rows = tabBC.value
    val Rows = record._2.split(rowDelim)
    Rows.foreach(row => {
      val values = row.split(colDelim)
      val riid = values(1).toInt
      val cond = "riid = " + riid
      println("Condition : " + cond)
      val enriched_events = subs_profile_rows.filter(cond)
    }) // End of Rows
  }) // End of RDD
}) // End of Events Stream

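Unfortunately, I always hit an NPE at the filter. I have followed several questions and answers here about broadcasting values to the worker nodes, but nothing has helped. Can someone help?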

Regards,

Bala

Answer 1

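Your use of contexts looks a bit suspicious... To me, it looks like you are creating two separate contexts (one for Spark, one for Spark Streaming) and then trying to share a broadcast variable between them, which will not work.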
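Separately from the context question, a DataFrame is a driver-side handle, so calling filter on it inside rdd.foreach (which runs on the executors) is a common source of an NPE in this kind of code. Below is a minimal sketch of one way around that, assuming the profile table is small enough to collect to the driver: build a plain Map keyed by riid, broadcast the Map, and do the lookup against it. The names Profile, EnrichEvents, enrich, and run are hypothetical, and the column types are assumptions (everything is read as a string, and riid is assumed to be non-null and numeric).

```scala
import org.apache.spark.SparkContext
import org.apache.spark.sql.hive.HiveContext
import org.apache.spark.streaming.dstream.DStream

// Hypothetical holder for one profile row; all fields kept as strings (assumption).
case class Profile(gender: String, income: String, age: String)

object EnrichEvents {

  // Pure helper: split one Kafka message into rows and pair each riid
  // with its profile, or None when the riid is not in the map.
  def enrich(message: String, rowDelim: String, colDelim: String,
             profiles: Map[Int, Profile]): Seq[(Int, Option[Profile])] =
    message.split(rowDelim).toSeq.map { row =>
      val riid = row.split(colDelim)(1).toInt
      (riid, profiles.get(riid))
    }

  def run(sc: SparkContext, hive: HiveContext,
          events: DStream[(String, String)],
          rowDelim: String, colDelim: String): Unit = {
    // Materialize the table on the driver as plain Scala data.
    // Assumes riid is never null and the table fits in driver memory.
    val profiles: Map[Int, Profile] = hive
      .sql("select gender, income, age, riid from hbase_customer_profile")
      .collect()
      .map { r =>
        String.valueOf(r.get(3)).toInt ->
          Profile(String.valueOf(r.get(0)), String.valueOf(r.get(1)), String.valueOf(r.get(2)))
      }
      .toMap

    // Broadcast the Map, not the DataFrame.
    val profilesBC = sc.broadcast(profiles)

    events.foreachRDD { rdd =>
      rdd.foreach { case (_, message) =>
        // Runs on the executors; only the broadcast Map is touched here.
        enrich(message, rowDelim, colDelim, profilesBC.value).foreach(println)
      }
    }
  }
}
```

With the question's variables, this would be called as EnrichEvents.run(l_sparkcontext, l_hivecontext, eventsStream, rowDelim, colDelim) before ssc.start(). Note that the Map is a snapshot taken once at startup, so later changes to the HBase table are not picked up, and this approach does not fit if the table is too large to hold in memory.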

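We have some similar code. In case you are interested, here are videos showing how we do this in Splice Machine (open source). I will try to find the code or have someone else post it for you.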

http://community.splicemachine.com/splice-machine-tutorial-video-configuring-kafka-feed-splice-machine-part/

http://community.splicemachine.com/splice-machine-tutorial-video-configuring-kafka-feed-splice-machine-ii/

Good luck.

Spark Streaming filter condition in foreach - NullPointerException
