How do I convert an RDD to a DataFrame using PySpark?

Asked: 2019-02-12 11:18:51

Tags: python apache-spark-sql pyspark-sql

I have the RDD below, which I received from a client. How can I convert this RDD into a DataFrame?

["Row(Moid=2, Tripid='11', Tstart='2007-05-28 08:53:14.040', Tend='2007-05-28 08:53:16.040', Xstart='9738.73', Ystart='103.246', Xend='9743.73', Yend='114.553')"]

1 Answer:

Answer 0 (score: 0)

Note: this is not really an answer, since it is unclear what the OP is asking. There is no room to write this code in the comments section, but perhaps we can work from here.

The OP says he/she received the following RDD (apparently a single element) from a client:

["Row(Moid=2, Tripid='11', Tstart='2007-05-28 08:53:14.040', Tend='2007-05-28 08:53:16.040', Xstart='9738.73', Ystart='103.246', Xend='9743.73', Yend='114.553')"]

Now, the OP wants to convert this into a DataFrame. Because the element appears to be a string rather than an actual Row, it would first have to be deserialized, but the OP needs to clarify the requirement. Assuming the data were real Row objects, the conversion is straightforward (a sketch for the string case is given at the end of this answer):

from pyspark.sql import Row

# sqlContext is assumed to already exist (e.g. the one provided by the PySpark shell);
# on Spark 2+ spark.createDataFrame works the same way.
rdd_from_client = [Row(Moid=2, Tripid='11', Tstart='2007-05-28 08:53:14.040', Tend='2007-05-28 08:53:16.040', Xstart='9738.73', Ystart='103.246', Xend='9743.73', Yend='114.553')]
df = sqlContext.createDataFrame(rdd_from_client)
df.show(truncate=False)
+----+-----------------------+------+-----------------------+-------+-------+-------+-------+
|Moid|Tend                   |Tripid|Tstart                 |Xend   |Xstart |Yend   |Ystart |
+----+-----------------------+------+-----------------------+-------+-------+-------+-------+
|2   |2007-05-28 08:53:16.040|11    |2007-05-28 08:53:14.040|9743.73|9738.73|114.553|103.246|
+----+-----------------------+------+-----------------------+-------+-------+-------+-------+
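If the RDD really does contain strings of the form "Row(...)" rather than Row objects, one fragile but workable option is to evaluate each string back into a Row before building the DataFrame. This is only a minimal sketch, assuming the strings are trusted and exactly match the Row constructor syntax; the names raw and rows are illustrative.

from pyspark.sql import SparkSession, Row

spark = SparkSession.builder.getOrCreate()

# The data as received: strings that look like Row constructor calls
raw = ["Row(Moid=2, Tripid='11', Tstart='2007-05-28 08:53:14.040', Tend='2007-05-28 08:53:16.040', Xstart='9738.73', Ystart='103.246', Xend='9743.73', Yend='114.553')"]

rdd = spark.sparkContext.parallelize(raw)

# Evaluate each string with only Row exposed, turning the text back into Row objects
rows = rdd.map(lambda s: eval(s, {"__builtins__": {}}, {"Row": Row}))

df = spark.createDataFrame(rows)
df.show(truncate=False)

Note that eval on data received from a third party is risky; if the format can be changed on the client side, a proper serialization format (JSON, Parquet) would be a safer choice than parsing Row strings.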