Implicit or explicit conversion from RDD to SchemaRDD in SparkSQL fails at compile time

Time: 2014-12-10 14:57:29

Tags: json scala apache-spark

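I am trying to load a JSON file into SparkSQL as a SchemaRDD, transform it, and then query the result. Specifically, I need to extract the "response" element, which is an array, from the following JSON and build a SchemaRDD from it: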

{
"cursor":{
"prev":null,
"hasNext":false,
"next":"1213061528000000:1:0",
"hasPrev":false,
"total":null,
"id":"1213061528000000:1:0",
"more":false
},
 "code":0,
 "response":[
    {

      "name":"disqus_api",
      "url":"",
      "isFollowing":false,
      "isFollowedBy":false,
      "profileUrl":"http://disqus.com/disqus_api/",
      "avatar":{
        "permalink":"http://disqus.com/api/users/avatars/disqus_api.jpg",
        "cache":"http://mediacdn.disqus.com/1091/images/noavatar92.png"
      },
      "id":"1",
      "isAnonymous":false,
      "email":"example@disqus.com"
    },
    "media":[],
    "isApproved":true,
    "dislikes":0,
    "raw_message":"\"Happy little bush.\"",
    "message":"\"Happy little bush.\"",
    "isHighlighted":false,
    "ipAddress":"127.0.0.1",
    "likes":0
  },
  .
  .
  .

Following the SparkSQL documentation, this is the code that should do it:

// path to a file
val jsonFile = "data/comments.json" 

//Starting Spark context and SparkSQL context
val sc = new SparkContext("local", "Simple App", "YOUR_SPARK_HOME",
  List("target/MyTestApp-1.0-SNAPSHOT.jar"))
val sqlContext = new org.apache.spark.sql.SQLContext(sc)

//Import to try an implicit conversion
import sqlContext._

// Create a SchemaRDD from the file(s) pointed to by path
val json : SchemaRDD  = sqlContext.jsonFile(jsonFile)

// Trying to make a SchemaRDD just with the comments listed in the response element
val comments = sqlContext.createSchemaRDD(json.flatMap (row => row(2).asInstanceOf[Seq[org.apache.spark.sql.StructType]))

It does not compile:

SimpleApp.scala:29: error: inferred type arguments   [scala.collection.mutable.Seq[org.apache.spark.sql.StructType]] do not conform to method createSchemaRDD's type parameter bounds [A <: Product]
val comments = sqlContext.createSchemaRDD(json.map (extractComments))
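
The bound named in the message, A <: Product, means the element type has to be a subtype of scala.Product. Case classes and tuples are; a bare Seq type is not. Below is a minimal sketch of that constraint in plain Scala, without Spark. The names requiresProduct and Comment are hypothetical, made up for illustration, and are not part of the SparkSQL API:

```scala
object ProductBoundDemo {
  // Hypothetical stand-in carrying the same bound as createSchemaRDD: A <: Product.
  // It returns the number of fields of each element.
  def requiresProduct[A <: Product](items: Seq[A]): Seq[Int] =
    items.map(_.productArity)

  // Hypothetical record type; case classes extend Product automatically.
  case class Comment(id: String, message: String, likes: Int)

  def main(args: Array[String]): Unit = {
    // Compiles: Comment is a Product with three fields.
    println(requiresProduct(Seq(Comment("1", "Happy little bush.", 0))))

    // Compiles: tuples are Products too.
    println(requiresProduct(Seq(("1", 0))))

    // Would not compile, for the same kind of reason as the error above:
    // the inferred element type Seq[Int] does not conform to A <: Product.
    // requiresProduct(Seq(Seq(1, 2, 3)))
  }
}
```

Uncommenting the last call should reproduce a "do not conform to ... type parameter bounds" error of the same shape as the one reported by the compiler above.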

Trying an implicit conversion instead:

val comments = json.flatMap (row => row(2).asInstanceOf[Seq[org.apache.spark.sql.StructType])

does not work either: "comments" is an RDD, not a SchemaRDD.

Changing the cast to other types (for example SparkSQL's ArrayType, or Seq[Any]) does not work either.
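
One plausible reason the other casts fail in the same way: a cast to Seq[Any] makes flatMap produce elements whose static type is Any, and Any does not satisfy A <: Product either. The sketch below uses plain Scala collections as a stand-in for the RDD (no Spark is involved) and shows the elements being mapped into a case class after the cast. CommentRecord, rows, and the field positions are all assumptions made for illustration, not taken from the real data or the SparkSQL API:

```scala
object CastDemo {
  // Hypothetical record type; the choice of fields is illustrative only.
  case class CommentRecord(message: String, likes: Int)

  // Stand-in for parsed rows: each row is a Seq[Any] whose third slot
  // holds the "response" array (an assumption about the layout).
  val rows: Seq[Seq[Any]] = Seq(
    Seq(0, "cursor-placeholder", Seq(Seq("\"Happy little bush.\"", 0)))
  )

  // The cast only tells the compiler "this is a Seq[Any]";
  // the flattened elements are still statically typed as Any.
  val untyped: Seq[Any] = rows.flatMap(row => row(2).asInstanceOf[Seq[Any]])

  // Mapping each element into a case class yields a Seq of a Product subtype.
  val typed: Seq[CommentRecord] = untyped.map { element =>
    val fields = element.asInstanceOf[Seq[Any]]
    CommentRecord(fields(0).asInstanceOf[String], fields(1).asInstanceOf[Int])
  }

  def main(args: Array[String]): Unit = println(typed)
}
```

Whether the same mapping step resolves the SparkSQL case depends on how the rows are actually represented at runtime, which this sketch does not verify.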

What is wrong here?

0 answers:

No answers yet