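I have a Spark DataFrame and I want to add a new column with some specific values. I tried using the withColumn function, but it did not work as expected. I want either a new column with specific values, or to replace an existing column.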
Answer 0 (score: 0)
See this example.
I have a DataFrame:
>>> df.show()
+-------+----+-----+---+
|   name|year|month|day|
+-------+----+-----+---+
|    Ali|2014|    9|  1|
|  Matei|2015|   10| 26|
|Michael|2015|   10| 25|
|Reynold|2015|   10| 25|
|Patrick|2015|    9|  1|
+-------+----+-----+---+
To add the same value to every row, I can use lit, which wraps a literal value in a column:
>>> from pyspark.sql.functions import lit
>>> df.withColumn('my_new_column', lit('testing info for all')).show()
+-------+----+-----+---+--------------------+
|   name|year|month|day|       my_new_column|
+-------+----+-----+---+--------------------+
|    Ali|2014|    9|  1|testing info for all|
|  Matei|2015|   10| 26|testing info for all|
|Michael|2015|   10| 25|testing info for all|
|Reynold|2015|   10| 25|testing info for all|
|Patrick|2015|    9|  1|testing info for all|
+-------+----+-----+---+--------------------+
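If you want to attach a list of different values to each row, you can use explode. Note that it produces one output row per element of the array, so each original row is repeated: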
>>> from pyspark.sql.functions import array, explode, lit
>>> df.withColumn('my_new_column',
...               explode(array(lit('testing info for all'),
...                             lit('other testing again')))).show()
+-------+----+-----+---+--------------------+
|   name|year|month|day|       my_new_column|
+-------+----+-----+---+--------------------+
|    Ali|2014|    9|  1|testing info for all|
|    Ali|2014|    9|  1| other testing again|
|  Matei|2015|   10| 26|testing info for all|
|  Matei|2015|   10| 26| other testing again|
|Michael|2015|   10| 25|testing info for all|
|Michael|2015|   10| 25| other testing again|
|Reynold|2015|   10| 25|testing info for all|
|Reynold|2015|   10| 25| other testing again|
|Patrick|2015|    9|  1|testing info for all|
|Patrick|2015|    9|  1| other testing again|
+-------+----+-----+---+--------------------+