Convert a list of Rows to a PySpark DataFrame

Asked: 2019-08-19 15:29:36

Tags: python pyspark rows

I have the following list of Row objects that I want to convert to a PySpark DataFrame:

data = [Row(id=u'1', probability=0.0, thresh=10, prob_opt=0.45),
        Row(id=u'2', probability=0.4444444444444444, thresh=60, prob_opt=0.45),
        Row(id=u'3', probability=0.0, thresh=10, prob_opt=0.45),
        Row(id=u'80000000808', probability=0.0, thresh=100, prob_opt=0.45)]

I need to convert it to a PySpark DataFrame.

I tried calling data.toDF(), but that does not work.

3 answers:

Answer 0 (score: 0)

Found the answer!

rdd = sc.parallelize(data)

# Column names are assigned by position, so they must match the field order of the Rows
df = spark.createDataFrame(rdd, ['id', 'probability', 'thresh', 'prob_opt'])

Answer 1 (score: 0)

You can try the following code:

from pyspark.sql import Row

rdd = sc.parallelize(data)

df=rdd.toDF()

Answer 2 (score: 0)

This seems to work:

spark.createDataFrame(data)

Tested:

from pyspark.sql import SparkSession, Row

spark = SparkSession.builder.getOrCreate()

data = [Row(id=u'1', probability=0.0, thresh=10, prob_opt=0.45),
        Row(id=u'2', probability=0.4444444444444444, thresh=60, prob_opt=0.45),
        Row(id=u'3', probability=0.0, thresh=10, prob_opt=0.45),
        Row(id=u'80000000808', probability=0.0, thresh=100, prob_opt=0.45)]

df = spark.createDataFrame(data)
df.show()
#  +-----------+------------------+------+--------+
#  |         id|       probability|thresh|prob_opt|
#  +-----------+------------------+------+--------+
#  |          1|               0.0|    10|    0.45|
#  |          2|0.4444444444444444|    60|    0.45|
#  |          3|               0.0|    10|    0.45|
#  |80000000808|               0.0|   100|    0.45|
#  +-----------+------------------+------+--------+