How to convert a DirectStream from Kafka into a DataFrame in Spark 1.3.0

Asked: 2015-08-14 05:13:31

Tags: apache-spark hive streaming apache-kafka

After creating a direct stream as follows:

import kafka.serializer.StringDecoder
import org.apache.spark.streaming.kafka.KafkaUtils

val events = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
  ssc, kafkaParams, topicsSet)

I want to convert the stream above into a DataFrame so that I can run Hive queries against it. Can anyone explain how to achieve this? I am using Spark version 1.3.0.

1 Answer:

Answer 0 (score: 1)

As described in the Spark Streaming programming guide, try this:

import org.apache.spark.SparkContext
import org.apache.spark.sql.SQLContext

// Lazily instantiated singleton SQLContext, shared across micro-batches
object SQLContextSingleton {
  @transient private var instance: SQLContext = null

  // Instantiate SQLContext on demand
  def getInstance(sparkContext: SparkContext): SQLContext = synchronized {
    if (instance == null) {
      instance = new SQLContext(sparkContext)
    }
    instance
  }
}
// Case class defining the DataFrame schema; named Event to avoid
// shadowing org.apache.spark.sql.Row
case class Event(key: String, value: String)

events.foreachRDD { rdd =>
  val sqlContext = SQLContextSingleton.getInstance(rdd.sparkContext)
  import sqlContext.implicits._
  val dataFrame = rdd.map { case (key, value) => Event(key, value) }.toDF()
  dataFrame.show()
}
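
Since the goal is to run SQL queries against each batch rather than just print it, here is a minimal sketch of the same loop using the Spark 1.3 registerTempTable API; the table name "events" and the query itself are illustrative placeholders:

events.foreachRDD { rdd =>
  val sqlContext = SQLContextSingleton.getInstance(rdd.sparkContext)
  import sqlContext.implicits._
  val dataFrame = rdd.map { case (key, value) => Event(key, value) }.toDF()

  // Register the current batch as a temporary table so it is queryable via SQL
  dataFrame.registerTempTable("events")

  // Illustrative query; replace with your own SQL
  sqlContext.sql("SELECT key, COUNT(*) AS cnt FROM events GROUP BY key").show()
}

If you need actual HiveQL support or access to existing Hive tables, instantiate an org.apache.spark.sql.hive.HiveContext instead of a plain SQLContext; since HiveContext extends SQLContext, the singleton pattern above works unchanged.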