PySpark: How do I split a list of maps into columns and rows?

Asked: 2019-09-13 16:50:22

Tags: pyspark

I have a list of maps containing the following:

fields = [{"trials": 1.0, "name": "Alice", "score": 8.0}, {"trials": 2.0, "name": "Bob", "score": 10.0}]

The list of maps is returned as a JSON blob from an API call. When I convert it to a DataFrame in PySpark, I get the following:

+-------------------------------------------+---------+
|fields                                     |key      |
+-------------------------------------------+---------+
|[1.0, Alice, 8.0]                          |key1     |
|[2.0, Bob, 10.0]                           |key2     |
|[1.0, Charlie, 8.0]                        |key3     |
|[2.0, Sue, 10.0]                           |key4     |
|[1.0, Clark, 8.0]                          |key5     |
|[3.0, Sarah, 10.0]                         |key6     |
+-------------------------------------------+---------+

I want to transform it into this form:

+------+-------+-----+----+
|trials|name   |score|key |
+------+-------+-----+----+
|1.0   |Alice  |8.0  |key1|
|2.0   |Bob    |10.0 |key2|
|1.0   |Charlie|8.0  |key3|
|2.0   |Sue    |10.0 |key4|
|1.0   |Clark  |8.0  |key5|
|3.0   |Sarah  |10.0 |key6|
+------+-------+-----+----+

What is the best way to solve this? Here is what I have so far:

import json

from pyspark import SparkConf
from pyspark import SparkContext
from pyspark.sql import SQLContext

conf = SparkConf()
sc = SparkContext(conf=conf)
sqlContext = SQLContext(sc)

# `fields` is the list of maps shown above; serialize each map back to a
# JSON string so sqlContext.read.json can infer the schema from it
rdd = sc.parallelize([json.dumps(f) for f in fields])
df = sqlContext.read.json(rdd)
df.show()
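When a DataFrame column holds a struct, PySpark can expand its subfields into top-level columns with a selection like `df.select("fields.*", "key")`. The underlying reshaping can also be sketched in plain Python without a Spark cluster; the blob and key names below are illustrative, not from the original post:

```python
import json

# A JSON blob of maps, roughly as it might come back from the API call
blob = ('[{"trials": 1.0, "name": "Alice", "score": 8.0},'
        ' {"trials": 2.0, "name": "Bob", "score": 10.0}]')

fields = json.loads(blob)   # list of dicts (maps)
keys = ["key1", "key2"]     # illustrative row keys

# Turn each map into a flat row: one column per map entry, plus the key
rows = [{**f, "key": k} for f, k in zip(fields, keys)]
columns = ["trials", "name", "score", "key"]

for row in rows:
    print([row[c] for c in columns])
```

This mirrors what the struct-expanding `select` does: each entry of the map becomes its own column, and the original `key` column is carried along unchanged.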

0 Answers:

No answers yet.