Getting nested JSON objects in Spark

Time: 2016-01-27 04:49:33

Tags: apache-spark

Below is the printSchema of my JSON. I want to print all records where id > 1268431 (id here is a string).

|-- results: array (nullable = true)
|    |-- element: struct (containsNull = true)
|    |    |-- description: string (nullable = true)
|    |    |-- id: string (nullable = true)
|-- total: long (nullable = true)
|-- result: string (nullable = true)

I tried the following command:

 val teenagers = sqlContext.sql("select results.id from people where results.id > 1268431 ").collect().foreach(println);

Please help.

1 Answer:

Answer 0 (score: 0)

Assuming the following JSON structure:

{"results": 
   [{"name":"x","id":"1"}, 
    {"name":"y","id":"2"}], 
 "total":"20", "result":30
}

Here is how to query and apply a filter on the nested JSON structure using Spark SQL:

val df = sqlContext.jsonFile("<file PATH>")
df.registerTempTable("people")
df.printSchema()

// SQL query that filters on values inside the nested schema using a LATERAL VIEW and the explode function.
sqlContext.sql("select expResults.id, expResults.name from people LATERAL VIEW explode(results) people AS expResults where int(expResults.id) > 1").collect().foreach(println)

The output of the above code will be the following value printed on the console:

[2,y]
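
For reference, the same filter can also be written with the DataFrame API instead of a SQL string. The following is a minimal sketch, assuming Spark 1.3 or later (where org.apache.spark.sql.functions.explode is available) and reusing the df loaded above; the column names follow the sample JSON:

import org.apache.spark.sql.functions.{col, explode}

// Explode each element of the nested results array into its own row,
// then cast the string id to an int before filtering.
val exploded = df.select(explode(col("results")).as("r"))
val filtered = exploded
  .select(col("r.id"), col("r.name"))
  .filter(col("id").cast("int") > 1)

filtered.collect().foreach(println)

For the original schema in the question, the same pattern would use col("r.description") instead of col("r.name") and a threshold of 1268431.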