I have the following list of Rows that I want to convert to a PySpark DataFrame:
from pyspark.sql import Row

data = [Row(id=u'1', probability=0.0, thresh=10, prob_opt=0.45),
        Row(id=u'2', probability=0.4444444444444444, thresh=60, prob_opt=0.45),
        Row(id=u'3', probability=0.0, thresh=10, prob_opt=0.45),
        Row(id=u'80000000808', probability=0.0, thresh=100, prob_opt=0.45)]
I need to convert it into a PySpark DataFrame. I tried data.toDF(), but it doesn't work, since a plain Python list has no toDF() method.
Answer 0 (score: 0)
Found the answer!
rdd = sc.parallelize(data)
df = spark.createDataFrame(rdd, ['id', 'probability', 'thresh', 'prob_opt'])
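If you want to pin down the column types rather than rely on inference, you can also pass an explicit schema. A minimal sketch, where the StructType below is my assumption about the intended types (it also assumes a Spark version in which Row preserves the declared field order, so the positional match against the schema lines up):

from pyspark.sql.types import StructType, StructField, StringType, DoubleType, LongType

# hypothetical explicit schema matching the Row fields above
schema = StructType([
    StructField('id', StringType(), True),
    StructField('probability', DoubleType(), True),
    StructField('thresh', LongType(), True),
    StructField('prob_opt', DoubleType(), True),
])
df = spark.createDataFrame(rdd, schema)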
Answer 1 (score: 0)
You can try the following code:
from pyspark.sql import Row

rdd = sc.parallelize(data)
df = rdd.toDF()
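Note that toDF() is only attached to RDDs once a SparkSession (or SQLContext) has been created, so make sure one exists before calling it. A minimal sketch, assuming nothing has been initialized yet:

from pyspark.sql import SparkSession, Row

spark = SparkSession.builder.getOrCreate()
sc = spark.sparkContext  # the sc used above

rdd = sc.parallelize(data)
df = rdd.toDF()  # column names are taken from the Row fields
# or pass the names explicitly:
df = rdd.toDF(['id', 'probability', 'thresh', 'prob_opt'])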
Answer 2 (score: 0)
This seems to work:
spark.createDataFrame(data)
Test result:
from pyspark.sql import SparkSession, Row
spark = SparkSession.builder.getOrCreate()
data = [Row(id=u'1', probability=0.0, thresh=10, prob_opt=0.45),
Row(id=u'2', probability=0.4444444444444444, thresh=60, prob_opt=0.45),
Row(id=u'3', probability=0.0, thresh=10, prob_opt=0.45),
Row(id=u'80000000808', probability=0.0, thresh=100, prob_opt=0.45)]
df = spark.createDataFrame(data)
df.show()
# +-----------+------------------+------+--------+
# | id| probability|thresh|prob_opt|
# +-----------+------------------+------+--------+
# | 1| 0.0| 10| 0.45|
# | 2|0.4444444444444444| 60| 0.45|
# | 3| 0.0| 10| 0.45|
# |80000000808| 0.0| 100| 0.45|
# +-----------+------------------+------+--------+
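You can confirm which types were inferred with df.printSchema(). With the data above, I would expect Spark to infer strings, doubles, and longs, roughly:

df.printSchema()
# root
#  |-- id: string (nullable = true)
#  |-- probability: double (nullable = true)
#  |-- thresh: long (nullable = true)
#  |-- prob_opt: double (nullable = true)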