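I have some JSON data in which an array can contain zero or more elements. The data is shown below. When I explode the array, rows whose array has zero elements are dropped. In this case, the row with name Andy is dropped.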
>>> from pyspark.sql import functions as func
>>> d1 = [{"name":"Michael", "schools":[{"sname":"stanford", "year":2010}, {"sname":"berkeley", "year":2012}]},{"name":"Andy","schools":[]}]
>>> df1 = sqlContext.createDataFrame(d1)
>>> df2 = df1.withColumn('school_details', func.explode(df1.schools))
>>> df3 = df2.select(df2.name, df2.school_details.sname,df2.school_details.year)
>>> df3.show()
+-------+---------------------+--------------------+
|   name|school_details[sname]|school_details[year]|
+-------+---------------------+--------------------+
|Michael|             stanford|                2010|
|Michael|             berkeley|                2012|
+-------+---------------------+--------------------+
How can I get all of the records, as shown below?
Expected result:
+-------+---------------------+--------------------+
|   name|school_details[sname]|school_details[year]|
+-------+---------------------+--------------------+
|Michael|             stanford|                2010|
|Michael|             berkeley|                2012|
|   Andy|                 null|                null|
+-------+---------------------+--------------------+
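
To make the difference between the actual and the expected result concrete, here is a minimal plain-Python sketch of the two behaviours. It does not use Spark. The helper names `explode_rows` and `explode_rows_outer` are hypothetical and only for illustration. The first mimics what `explode` does (an empty array produces no rows), the second mimics the "outer" behaviour I am after (an empty array produces one row of nulls). As far as I know, Spark 2.2 and later expose the outer variant as `pyspark.sql.functions.explode_outer`, but I have not run that against the session above.

```python
def explode_rows(rows, array_key):
    """Inner explode: one output row per array element; empty arrays yield nothing."""
    out = []
    for row in rows:
        for item in row[array_key]:
            out.append((row["name"], item["sname"], item["year"]))
    return out


def explode_rows_outer(rows, array_key):
    """Outer explode: an empty or missing array yields a single row padded with None."""
    out = []
    for row in rows:
        items = row.get(array_key) or []
        if not items:
            # Keep the parent row, filling the exploded fields with None.
            out.append((row["name"], None, None))
            continue
        for item in items:
            out.append((row["name"], item["sname"], item["year"]))
    return out


# Same data as in the question.
d1 = [
    {"name": "Michael", "schools": [{"sname": "stanford", "year": 2010},
                                    {"sname": "berkeley", "year": 2012}]},
    {"name": "Andy", "schools": []},
]

inner_rows = explode_rows(d1, "schools")        # Andy is missing here
outer_rows = explode_rows_outer(d1, "schools")  # Andy is kept with None values

for r in outer_rows:
    print(r)
```

The outer version only differs in the empty-array branch, which is the one row I need to keep.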