Question

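I am reading a JSON document into a DataFrame, but its structure is complex. I can get at the values with the explode function. The schema looks like this: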

root
 |-- Name: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- Adap: string (nullable = true)
 |    |    |-- Vid: string (nullable = true)
 |-- Information: struct (nullable = true)
 |    |-- Caption: string (nullable = true)
 |    |-- No: string (nullable = true)
 |-- License: struct (nullable = true)
 |    |-- Out: struct (nullable = true)
 |    |    |-- ID: string (nullable = true)
 |    |-- In: struct (nullable = true)
 |    |    |-- INS: string (nullable = true)

The JSON is large, so I don't want to write everything out by hand. This works, but I would have to repeat it for every value:

from pyspark.sql.functions import col, explode
mdmDF.withColumn("Name", explode("Name")).select(col("Name")["Adap"].alias("Name.Adap"))

Second, I tried the following, but it only gives me a separate DataFrame for each column:

Name = mdmDF.selectExpr("explode(Name) AS Name").selectExpr("Name.*")

+------------------+----------+
|    Name          |      Adap|      
+------------------+----------+
|NVIDIA            |         0|
+------------------+----------+

What I want:

+------------------+----------+----------+----------+----------+----------+
|    adap          |      vid |  Caption |     no   |     Out  |    In    |
+------------------+----------+----------+----------+----------+----------+
|NVIDIA            |         0|      test|      1   |     etx  |    val   |
+------------------+----------+----------+----------+----------+----------+

Reading a complex JSON schema with PySpark
