I'm trying to load a JSON file into Spark SQL as a SchemaRDD, transform it, and then query the result. Specifically, I need to extract the "response" element from the JSON below (it is an array) and build a SchemaRDD from it:
{
  "cursor":{
    "prev":null,
    "hasNext":false,
    "next":"1213061528000000:1:0",
    "hasPrev":false,
    "total":null,
    "id":"1213061528000000:1:0",
    "more":false
  },
  "code":0,
  "response":[
    {
      "name":"disqus_api",
      "url":"",
      "isFollowing":false,
      "isFollowedBy":false,
      "profileUrl":"http://disqus.com/disqus_api/",
      "avatar":{
        "permalink":"http://disqus.com/api/users/avatars/disqus_api.jpg",
        "cache":"http://mediacdn.disqus.com/1091/images/noavatar92.png"
      },
      "id":"1",
      "isAnonymous":false,
      "email":"example@disqus.com"
    },
    "media":[],
    "isApproved":true,
    "dislikes":0,
    "raw_message":"\"Happy little bush.\"",
    "message":"\"Happy little bush.\"",
    "isHighlighted":false,
    "ipAddress":"127.0.0.1",
    "likes":0
  },
  ...
Following the Spark SQL documentation, this is the code that should do it:
// path to a file
val jsonFile = "data/comments.json"
//Starting Spark context and SparkSQL context
val sc = new SparkContext("local", "Simple App", "YOUR_SPARK_HOME",
List("target/MyTestApp-1.0-SNAPSHOT.jar"))
val sqlContext = new org.apache.spark.sql.SQLContext(sc)
//Import to try an implicit conversion
import sqlContext._
// Create a SchemaRDD from the file(s) pointed to by path
val json : SchemaRDD = sqlContext.jsonFile(jsonFile)
// Trying to make a SchemaRDD just with the comments listed in the response element
val comments = sqlContext.createSchemaRDD(json.flatMap(row => row(2).asInstanceOf[Seq[org.apache.spark.sql.StructType]]))
It does not compile:
SimpleApp.scala:29: error: inferred type arguments [scala.collection.mutable.Seq[org.apache.spark.sql.StructType]] do not conform to method createSchemaRDD's type parameter bounds [A <: Product]
val comments = sqlContext.createSchemaRDD(json.map (extractComments))
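As far as I understand, the bound means createSchemaRDD only accepts RDDs of Products, i.e. case classes or tuples. A minimal sketch of the kind of input it does accept, using a hypothetical Comment case class (not my real schema, just an illustration):

// Hypothetical case class: createSchemaRDD needs A <: Product, e.g. a case class
case class Comment(id: String, message: String, likes: Int)

// Compiles, because Comment is a Product
val commentRows: org.apache.spark.rdd.RDD[Comment] =
  sc.parallelize(Seq(Comment("1", "Happy little bush.", 0)))
val commentsFromCaseClass = sqlContext.createSchemaRDD(commentRows)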
Relying on the implicit conversion instead,

val comments = json.flatMap(row => row(2).asInstanceOf[Seq[org.apache.spark.sql.StructType]])

doesn't work either: "comments" comes back as a plain RDD, not a SchemaRDD.
Changing the cast to other types (e.g. Spark SQL's ArrayType, or Seq[Any]) doesn't help either.
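For reference, a sketch of how the inferred schema can be inspected (printSchema is available on SchemaRDD); the field order of each Row follows this inferred schema, not the key order in the JSON text, so it shows what type and position the "response" column actually has:

// Print the schema Spark SQL inferred from the JSON file
json.printSchema()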
What am I doing wrong here?