Exploding a JSON array of arrays in Spark

Asked: 2018-11-14 11:57:24

Tags: json apache-spark apache-spark-sql

I have a JSON document whose entries look like this:

{  
   "data":[  
      [  
         1,
         "CD07C40E-4943-44B0-BF5E-370DA2133E25",
         1,
         1320919663,
         "386118",
         1320919663,
         "386118",
         "{\n  \"invalidCells\" : {\n    \"1669635\" : \" \"\n  }\n}",
         null,
         " --T::00",
         null,
         null,
         null,
         [  
            null,
            null,
            null,
            null,
            null
         ],
         null
      ],
      [  
         2,
         "152ECD05-2301-43C7-88C5-085199623DA7",
         2,
         1320919663,
         "386118",
         1320919663,
         "386118",
         "{\n}",
         "6900 37th Av S",
         "Medic Response",
         1320881580,
         "47.540683",
         "-122.286131",
         [  
            null,
            "47.540683",
            "-122.286131",
            null,
            false
         ],
         "F110104166"
      ],
      [  
         3,
         "311ED596-51B7-4E70-A293-12378C61F0A6",
         3,
         1320919663,
         "386118",
         1320919663,
         "386118",
         "{\n}",
         "N 50th St / Stone Way N",
         "Aid Response",
         1320881520,
         "47.665034",
         "-122.340207",
         [  
            null,
            "47.665034",
            "-122.340207",
            null,
            false
         ],
         "F110104164"
      ]
   ]
}

I already have columns for the earlier parts of the JSON document, which I won't list here. What I want now is simply to take the values out of the "data" array above and use them to populate a table.
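
For context, a minimal sketch of how the document might be loaded into the seattlefire DataFrame used below (the file path and session setup are placeholders):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("seattle-fire").getOrCreate()

// multiLine is needed because each JSON record spans several lines
val seattlefire = spark.read
  .option("multiLine", "true")
  .json("path/to/seattlefire.json")

seattlefire.printSchema()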

I tried the following, but it does not seem to work:

import org.apache.spark.sql.functions._
seattlefire.withColumn("data", explode(seattlefire("data")).as("data_flattened")).show(false)
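
To make the intent concrete, here is a sketch of roughly what I am after, assuming Spark infers data as an array of string arrays (heterogeneous elements are usually coerced to strings; the real schema should be checked with printSchema). The positions and column names below are only guesses based on the sample above:

import org.apache.spark.sql.functions._

// Explode the outer "data" array so each inner record becomes its own row.
// select() is used instead of withColumn() so the alias actually takes effect.
val exploded = seattlefire.select(explode(col("data")).as("record"))

// Pull individual positions out of each inner array and give them names.
val table = exploded.select(
  col("record").getItem(1).as("incident_id"),
  col("record").getItem(8).as("address"),
  col("record").getItem(9).as("type"),
  col("record").getItem(11).as("latitude"),
  col("record").getItem(12).as("longitude"),
  col("record").getItem(14).as("incident_number")
)

table.show(false)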

0 Answers