DataFrame transformation in PySpark

Date: 2018-09-17 14:22:33

Tags: pyspark

I am reading data from a JSON file, and the resulting DataFrame has the following structure:

DataFrame[CodLic: string, Fecha: struct<$date:struct<$numberLong:string>>, IDBus: struct<$numberInt:string>, NumResults: struct<$numberInt:string>, ResponseTime: struct<$numberDecimal:string>, _id: struct<$oid:string>]

To load the file, I use the following code:

df = spark.read.format('json').load(pathText)

This returns the following dataset:

df.show(10)

+-----------+-----------------+-----------+-------------+---------------+--------------------+
|     CodLic|            Fecha|      IDBus|   NumResults|   ResponseTime|                 _id|
+-----------+-----------------+-----------+-------------+---------------+--------------------+
|        04P|[[1536761469602]]|[680244294]|          [0]|         [1404]|[5b991e7de5e8d9c1...|
|        04P|[[1536761469602]]|[680244303]|          [0]|         [1420]|[5b991e7de5e8d9c1...|
|        04P|[[1536761469602]]|[680244314]|          [0]|         [1404]|[5b991e7de5e8d9c1...|
|        04P|[[1536761469602]]|[680244316]|          [0]|         [1388]|[5b991e7de5e8d9c1...|
|        04P|[[1536761469602]]|[680244293]|          [0]|         [1373]|[5b991e7de5e8d9c1...|
|        04P|[[1536761469618]]|[680244307]|          [0]|         [1388]|[5b991e7de5e8d9c1...|
|        04P|[[1536761469618]]|[680244272]|          [0]|         [1404]|[5b991e7de5e8d9c1...|
|        04P|[[1536761469618]]|[680244312]|          [0]|         [1388]|[5b991e7de5e8d9c1...|
|        04P|[[1536761469618]]|[680244311]|          [0]|         [1404]|[5b991e7de5e8d9c1...|
|        04P|[[1536761469618]]|[680244317]|          [0]|         [1388]|[5b991e7de5e8d9c1...|
+-----------+-----------------+-----------+-------------+---------------+--------------------+
only showing top 10 rows

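How can I transform it into the following dataset, where IDBus, NumResults and ResponseTime hold plain values instead of single-field structs?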

+-----------+-----------------+-----------+-------------+---------------+--------------------+
|     CodLic|            Fecha|      IDBus|   NumResults|   ResponseTime|                 _id|
+-----------+-----------------+-----------+-------------+---------------+--------------------+
|        04P|[[1536761469602]]|  680244294|            0|           1404|[5b991e7de5e8d9c1...|
|        04P|[[1536761469602]]|  680244303|            0|           1420|[5b991e7de5e8d9c1...|
|        04P|[[1536761469602]]|  680244314|            0|           1404|[5b991e7de5e8d9c1...|
|        04P|[[1536761469602]]|  680244316|            0|           1388|[5b991e7de5e8d9c1...|
|        04P|[[1536761469602]]|  680244293|            0|           1373|[5b991e7de5e8d9c1...|
+-----------+-----------------+-----------+-------------+---------------+--------------------+
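One possible approach, offered as a sketch rather than a confirmed answer: the schema above shows that IDBus, NumResults and ResponseTime are each a struct with a single string field ($numberInt or $numberDecimal), so selecting that inner field and casting it should yield a plain column. The helper names (WRAPPED, unwrap_exprs, unwrap) and the target types (int, double) are assumptions made for illustration; only `DataFrame.columns` and `DataFrame.selectExpr` come from PySpark itself.

```python
# Assumed mapping: column name -> (inner field taken from the schema above,
# target Spark SQL type). The target types are a guess, not from the question.
WRAPPED = {
    "IDBus": ("$numberInt", "int"),
    "NumResults": ("$numberInt", "int"),
    "ResponseTime": ("$numberDecimal", "double"),
}


def unwrap_exprs(columns, wrapped=WRAPPED):
    """Build one Spark SQL expression per column, preserving column order."""
    exprs = []
    for name in columns:
        if name in wrapped:
            field, sql_type = wrapped[name]
            # Backticks quote the identifiers, since the inner field names
            # start with '$'.
            exprs.append(f"CAST(`{name}`.`{field}` AS {sql_type}) AS `{name}`")
        else:
            # Columns that are not listed (CodLic, Fecha, _id) pass through.
            exprs.append(f"`{name}`")
    return exprs


def unwrap(df, wrapped=WRAPPED):
    """Return a new DataFrame with the wrapped columns replaced by plain values."""
    return df.selectExpr(*unwrap_exprs(df.columns, wrapped))
```

With the `df` loaded above, `unwrap(df).show(5)` would be the call to try. If the values should stay as strings, the CAST can be dropped from the expression so that only the inner field is selected.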

0 answers:

No answers