My input looks like this:
val inputJson = """[{"color": "red","value": "#f00"},{"color": "blue","value": "#00f"}]"""
I need to convert this JSON value into arrays. My output should look like this:
val colorval = Array("red", "blue")
val value = Array("#f00", "#00f")
Please help.
Answer 0 (score: 1)
If you have a large dataset, the following solution should help.
// Input data; in practice this would be a much larger dataset
val inputJson = """[{"color": "red","value": "#f00"},{"color": "blue","value": "#00f"}]"""
// Read the JSON string into a DataFrame (one row per element of the JSON array)
val df = sqlContext.read.json(sc.parallelize(inputJson :: Nil))
// Aggregate each column into a single array with the built-in collect_list function
import org.apache.spark.sql.functions.collect_list
val result = df.select(collect_list("color").as("colorVal"), collect_list("value").as("value"))
result.show(false)
result.printSchema()
You should get the following output and schema:
+-----------+------------+
|colorVal   |value       |
+-----------+------------+
|[red, blue]|[#f00, #00f]|
+-----------+------------+
root
 |-- colorVal: array (nullable = true)
 |    |-- element: string (containsNull = true)
 |-- value: array (nullable = true)
 |    |-- element: string (containsNull = true)
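The result above is still a one-row DataFrame, while the question asks for plain Scala arrays. Here is a minimal sketch of pulling them back to the driver. It assumes Spark 2.2 or later with a SparkSession rather than the sqlContext used in the answer, and the object CollectToArrays and its method toArrays are hypothetical names chosen for illustration. It only makes sense when the collected arrays fit in driver memory.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.collect_list

// Hypothetical helper, not part of the original answer.
object CollectToArrays {
  // Returns (colors, values) as driver-side arrays.
  def toArrays(spark: SparkSession, json: String): (Array[String], Array[String]) = {
    import spark.implicits._
    // A top-level JSON array is read as one row per element.
    val df = spark.read.json(Seq(json).toDS())
    // Aggregate both columns into arrays and fetch the single resulting row.
    val row = df.select(collect_list("color"), collect_list("value")).first()
    (row.getSeq[String](0).toArray, row.getSeq[String](1).toArray)
  }

  def main(args: Array[String]): Unit = {
    // Assumption: a local session, for demonstration only.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("collect-to-arrays")
      .getOrCreate()
    val inputJson = """[{"color": "red","value": "#f00"},{"color": "blue","value": "#00f"}]"""
    val (colorval, value) = toArrays(spark, inputJson)
    println(colorval.mkString(", "))
    println(value.mkString(", "))
    spark.stop()
  }
}
```

Spark does not guarantee the element order produced by collect_list after a shuffle, so sort the arrays or carry an explicit index column if order matters.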
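Answer 1 (score: 0)
Create a DataFrame from the JSON and explode it. Then apply collect_list() or collect_set(), depending on whether you need to keep duplicates: collect_list() keeps them, collect_set() drops them.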
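One possible reading of this answer, sketched under assumptions: the JSON string is parsed with from_json using an array-of-struct schema (available since Spark 2.2), exploded into one row per element, and then aggregated. The object ExplodeAndCollect, the method aggregate, and the dropDuplicates flag are hypothetical names, and the extra "red" entry in the sample input is added only to make the difference between the two functions visible.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, collect_list, collect_set, explode, from_json}
import org.apache.spark.sql.types.{ArrayType, StringType, StructField, StructType}

// Hypothetical helper illustrating the explode-then-collect approach.
object ExplodeAndCollect {
  // Schema of the input: an array of {color, value} objects.
  private val itemsSchema = ArrayType(StructType(Seq(
    StructField("color", StringType),
    StructField("value", StringType))))

  def aggregate(spark: SparkSession, json: String, dropDuplicates: Boolean): DataFrame = {
    import spark.implicits._
    // Parse the string into an array column, then explode to one row per element.
    val exploded = Seq(json).toDF("json")
      .select(explode(from_json(col("json"), itemsSchema)).as("item"))
      .select(col("item.color").as("color"), col("item.value").as("value"))
    if (dropDuplicates)
      exploded.select(collect_set("color").as("colorVal"), collect_set("value").as("value"))
    else
      exploded.select(collect_list("color").as("colorVal"), collect_list("value").as("value"))
  }

  def main(args: Array[String]): Unit = {
    // Assumption: a local session, for demonstration only.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("explode-and-collect")
      .getOrCreate()
    // Sample input with a repeated entry (illustrative, not from the question).
    val json =
      """[{"color": "red","value": "#f00"},{"color": "blue","value": "#00f"},{"color": "red","value": "#f00"}]"""
    aggregate(spark, json, dropDuplicates = false).show(false) // keeps the repeated "red"
    aggregate(spark, json, dropDuplicates = true).show(false)  // each distinct value once
    spark.stop()
  }
}
```

When the JSON is read directly with read.json, a top-level array already yields one row per element, so the explode step is mainly needed when the array sits inside a column, as it does here.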