Answer 0 (score: 0)
To create a PySpark DataFrame, you can use the createDataFrame() function:
matrix = [[11, 12, 13, 14, 15], [21, 22, 23, 24, 25], [31, 32, 33, 34, 35], [41, 42, 43, 44, 45]]
df = spark.createDataFrame(matrix)
df.show()
+---+---+---+---+---+
| _1| _2| _3| _4| _5|
+---+---+---+---+---+
| 11| 12| 13| 14| 15|
| 21| 22| 23| 24| 25|
| 31| 32| 33| 34| 35|
| 41| 42| 43| 44| 45|
+---+---+---+---+---+
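As shown above, the columns are named automatically by position (_1, _2, and so on). You can also pass your own column names to createDataFrame() through the schema argument:
columns = ['mycol_' + str(col) for col in range(5)]
df = spark.createDataFrame(matrix, schema=columns)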
df.show()
+-------+-------+-------+-------+-------+
|mycol_0|mycol_1|mycol_2|mycol_3|mycol_4|
+-------+-------+-------+-------+-------+
|     11|     12|     13|     14|     15|
|     21|     22|     23|     24|     25|
|     31|     32|     33|     34|     35|
|     41|     42|     43|     44|     45|
+-------+-------+-------+-------+-------+
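
The range(5) in the column list is hardcoded to the width of this particular matrix. As a minimal sketch, the names could instead be derived from the data so the two always line up. The helper make_columns and its prefix parameter are hypothetical names introduced here for illustration; they are not part of PySpark.

```python
def make_columns(matrix, prefix="mycol_"):
    """Return one column name per element in each row of `matrix`.

    Hypothetical helper, not part of PySpark: it only builds the list of
    names that would then be passed as `schema` to createDataFrame().
    """
    # Collect the distinct row lengths; a well-formed matrix has exactly one.
    widths = {len(row) for row in matrix}
    if len(widths) != 1:
        raise ValueError(f"matrix is empty or ragged, row lengths: {sorted(widths)}")
    width = widths.pop()
    return [f"{prefix}{i}" for i in range(width)]


matrix = [
    [11, 12, 13, 14, 15],
    [21, 22, 23, 24, 25],
    [31, 32, 33, 34, 35],
    [41, 42, 43, 44, 45],
]
columns = make_columns(matrix)
print(columns)

# With an active SparkSession named `spark`, as assumed in the answer above:
# df = spark.createDataFrame(matrix, schema=columns)
```

Checking the row lengths in plain Python keeps the failure close to the data, before anything is handed to Spark.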