I am extracting data from a JSON file, and the resulting DataFrame has the following structure:
DataFrame[CodLic: string, Fecha: struct<$date:struct<$numberLong:string>>, IDBus: struct<$numberInt:string>, NumResults: struct<$numberInt:string>, ResponseTime: struct<$numberDecimal:string>, _id: struct<$oid:string>]
To load the file, I use the following code:
df = spark.read.format('json').load(pathText)
This returns the following dataset:
df.show(10)
+-----------+-----------------+-----------+-------------+---------------+--------------------+
|     CodLic|            Fecha|      IDBus|   NumResults|   ResponseTime|                 _id|
+-----------+-----------------+-----------+-------------+---------------+--------------------+
|        04P|[[1536761469602]]|[680244294]|          [0]|         [1404]|[5b991e7de5e8d9c1...|
|        04P|[[1536761469602]]|[680244303]|          [0]|         [1420]|[5b991e7de5e8d9c1...|
|        04P|[[1536761469602]]|[680244314]|          [0]|         [1404]|[5b991e7de5e8d9c1...|
|        04P|[[1536761469602]]|[680244316]|          [0]|         [1388]|[5b991e7de5e8d9c1...|
|        04P|[[1536761469602]]|[680244293]|          [0]|         [1373]|[5b991e7de5e8d9c1...|
|        04P|[[1536761469618]]|[680244307]|          [0]|         [1388]|[5b991e7de5e8d9c1...|
|        04P|[[1536761469618]]|[680244272]|          [0]|         [1404]|[5b991e7de5e8d9c1...|
|        04P|[[1536761469618]]|[680244312]|          [0]|         [1388]|[5b991e7de5e8d9c1...|
|        04P|[[1536761469618]]|[680244311]|          [0]|         [1404]|[5b991e7de5e8d9c1...|
|        04P|[[1536761469618]]|[680244317]|          [0]|         [1388]|[5b991e7de5e8d9c1...|
+-----------+-----------------+-----------+-------------+---------------+--------------------+
only showing top 10 rows
How can I convert it into the following dataset?
+-----------+-----------------+-----------+-------------+---------------+--------------------+
|     CodLic|            Fecha|      IDBus|   NumResults|   ResponseTime|                 _id|
+-----------+-----------------+-----------+-------------+---------------+--------------------+
|        04P|[[1536761469602]]|  680244294|            0|           1404|[5b991e7de5e8d9c1...|
|        04P|[[1536761469602]]|  680244303|            0|           1420|[5b991e7de5e8d9c1...|
|        04P|[[1536761469602]]|  680244314|            0|           1404|[5b991e7de5e8d9c1...|
|        04P|[[1536761469602]]|  680244316|            0|           1388|[5b991e7de5e8d9c1...|
|        04P|[[1536761469602]]|  680244293|            0|           1373|[5b991e7de5e8d9c1...|
+-----------+-----------------+-----------+-------------+---------------+--------------------+
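
For reference, one possible way to get there is to replace each single-field struct column with its inner field. The sketch below is only an illustration under assumptions: the helper name `unwrap_structs` and the `UNWRAP` mapping are hypothetical (they are not part of PySpark), and the inner field names are taken from the schema printed above. It relies on `DataFrame.withColumn` and `Column.getField`, and it leaves `Fecha` and `_id` untouched, as in the desired output.

```python
# Hypothetical mapping: struct column -> inner field to pull out.
# Field names are copied from the schema shown in the question.
UNWRAP = {
    "IDBus": "$numberInt",
    "NumResults": "$numberInt",
    "ResponseTime": "$numberDecimal",
}


def unwrap_structs(df, mapping):
    """Replace each struct column named in `mapping` with its inner field.

    `df` is expected to be a PySpark DataFrame; columns not listed in
    `mapping` (here CodLic, Fecha and _id) are left as they are.
    """
    for name, field in mapping.items():
        # df[name] is the struct column; getField selects one of its fields.
        # withColumn with an existing name overwrites that column in place.
        df = df.withColumn(name, df[name].getField(field))
    return df
```

With the `df` loaded above, `unwrap_structs(df, UNWRAP).show(5)` should print the three columns without the surrounding brackets. The extracted values are still strings according to the schema, so appending something like `.cast("long")` after `getField(field)` may be worth considering if numeric types are needed.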