After creating a direct stream as follows:
val events = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
  ssc, kafkaParams, topicsSet)
I would like to convert the stream above into a DataFrame so that I can run Hive queries against it. Can anyone explain how to achieve this? I am using Spark version 1.3.0.
Answer 0 (score: 1)
As described in the Spark Streaming programming guide, try this:
import org.apache.spark.SparkContext
import org.apache.spark.sql.SQLContext

// Lazily instantiated singleton so every batch reuses the same SQLContext
object SQLContextSingleton {
  @transient private var instance: SQLContext = null

  def getInstance(sparkContext: SparkContext): SQLContext = synchronized {
    if (instance == null) {
      instance = new SQLContext(sparkContext)
    }
    instance
  }
}

// Named Event instead of Row to avoid shadowing org.apache.spark.sql.Row
case class Event(key: String, value: String)

events.foreachRDD { rdd =>
  val sqlContext = SQLContextSingleton.getInstance(rdd.sparkContext)
  import sqlContext.implicits._
  // Each Kafka record arrives as a (key, value) pair
  val dataFrame = rdd.map { case (key, value) => Event(key, value) }.toDF()
  dataFrame.show()
}
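Since the goal is to run SQL queries rather than just `show()` the data, each batch's DataFrame can be registered as a temporary table and queried with SQL inside `foreachRDD`. Below is a minimal sketch using Spark 1.3's `registerTempTable`; the table name `events` and the aggregation query are illustrative, not from the original answer:

```scala
events.foreachRDD { rdd =>
  val sqlContext = SQLContextSingleton.getInstance(rdd.sparkContext)
  import sqlContext.implicits._

  // Build a DataFrame directly from the (key, value) tuples
  val dataFrame = rdd.toDF("key", "value")

  // Register this batch as a temp table, then query it with SQL
  dataFrame.registerTempTable("events")
  val counts = sqlContext.sql(
    "SELECT key, COUNT(*) AS cnt FROM events GROUP BY key")
  counts.show()
}
```

Note that a plain `SQLContext` only supports Spark SQL over registered tables; to query actual Hive tables (or use HiveQL features), Spark 1.3 requires constructing an `org.apache.spark.sql.hive.HiveContext` instead.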