I have the RDD below, which I received from a client. How can I convert this RDD to a DataFrame?
["Row(Moid=2, Tripid='11', Tstart='2007-05-28 08:53:14.040', Tend='2007-05-28 08:53:16.040', Xstart='9738.73', Ystart='103.246', Xend='9743.73', Yend='114.553')"]
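Answer 0 (score: 0)
Note: this is not really an answer, but I don't understand what the OP is asking. It isn't possible to write this code in the comments section, so maybe we can work from here.
The OP says he/she received an RDD from a client (supposedly with a single element):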
["Row(Moid=2, Tripid='11', Tstart='2007-05-28 08:53:14.040', Tend='2007-05-28 08:53:16.040', Xstart='9738.73', Ystart='103.246', Xend='9743.73', Yend='114.553')"]
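Now the OP wants to convert it to a DataFrame. As quoted, the element is a string rather than a Row object, so the Row would first have to be deserialized from that string; the OP needs to clarify what he/she actually has. If the element is already a Row object, the conversion is direct: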
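from pyspark.sql import Row
# Assumes the pyspark shell, where sqlContext is already defined.
# Note: this is a plain Python list holding a real Row object, not the string form quoted above.
rdd_from_client = [Row(Moid=2, Tripid='11', Tstart='2007-05-28 08:53:14.040', Tend='2007-05-28 08:53:16.040', Xstart='9738.73', Ystart='103.246', Xend='9743.73', Yend='114.553')]
df = sqlContext.createDataFrame(rdd_from_client)
df.show(truncate=False)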
+----+-----------------------+------+-----------------------+-------+-------+-------+-------+
|Moid|Tend                   |Tripid|Tstart                 |Xend   |Xstart |Yend   |Ystart |
+----+-----------------------+------+-----------------------+-------+-------+-------+-------+
|2   |2007-05-28 08:53:16.040|11    |2007-05-28 08:53:14.040|9743.73|9738.73|114.553|103.246|
+----+-----------------------+------+-----------------------+-------+-------+-------+-------+
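If the element really is a string, as the quotes in the question suggest, the Row would have to be rebuilt from that text first. The following is only a minimal sketch of one way to do that using Python's standard `ast` module; `parse_row_string` is a hypothetical helper name, and it assumes every field value in the string is a plain literal (number or quoted string), as in the sample above.

```python
import ast


def parse_row_string(text):
    """Turn a string like "Row(a=1, b='x')" into a plain dict (hypothetical helper)."""
    node = ast.parse(text.strip(), mode="eval").body
    is_row_call = (
        isinstance(node, ast.Call)
        and isinstance(node.func, ast.Name)
        and node.func.id == "Row"
    )
    # Only keyword arguments are supported, mirroring the sample in the question.
    if not is_row_call or node.args:
        raise ValueError("expected Row(key=value, ...), got: %r" % text)
    fields = {}
    for kw in node.keywords:
        if kw.arg is None:  # reject **kwargs-style arguments
            raise ValueError("unsupported argument in: %r" % text)
        # literal_eval accepts literals only, so nothing in the string is executed
        fields[kw.arg] = ast.literal_eval(kw.value)
    return fields


# The single string element quoted in the question
raw_elements = [
    "Row(Moid=2, Tripid='11', Tstart='2007-05-28 08:53:14.040', "
    "Tend='2007-05-28 08:53:16.040', Xstart='9738.73', Ystart='103.246', "
    "Xend='9743.73', Yend='114.553')"
]
records = [parse_row_string(s) for s in raw_elements]
print(records[0]["Moid"], records[0]["Tstart"])

# With Spark available, the same helper could be mapped over the RDD, e.g.:
#   from pyspark.sql import Row
#   df = sqlContext.createDataFrame(rdd.map(lambda s: Row(**parse_row_string(s))))
```

Parsing with `ast` instead of `eval` avoids executing whatever the client's strings happen to contain. Note that the coordinates stay strings here, because they are quoted in the source data; casting them to numeric types would be a separate step.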