In pandas, you can rename all columns in one go, "in place", using:
new_column_name_list = ['Pre_' + x for x in df.columns]
df.columns = new_column_name_list
Can we do the same in PySpark without ultimately creating a new dataframe? That would be inefficient, since we would end up with two dataframes holding the same data under different column names, wasting memory.
The following link answers the question, but not in place:
How to change dataframe column names in pyspark?
EDIT: My question is distinctly different from the one in the link above.
Answer 0 (score: 1)
This is how you could do it in Scala Spark: build the new column names dynamically, pair them with the old columns, and select with aliases.
import org.apache.spark.sql.functions.col

// Wrap each existing column, generate new names, and select with aliases.
val to = df2.columns.map(col(_))
val from = (1 to to.length).map(i => s"column$i")
df2.select(to.zip(from).map { case (x, y) => x.alias(y) }: _*).show()
Previous column names: "age", "names". After the change: "column1", "column2".
However, dataframes cannot be updated in place since they are immutable, but the result can be assigned to a new variable for further use. Only dataframes that are actually used are loaded in memory, so this won't be an issue.
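In PySpark, the reassignment could use toDF, which takes the new names positionally (a sketch assuming the Pre_ prefix from the question):

# Reassign the renamed dataframe to the same variable; the old,
# now-unreferenced dataframe is simply garbage-collected.
df2 = df2.toDF(*["Pre_" + c for c in df2.columns])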
Hope this helps