I have a DataFrame with 3 rows and more than 20 columns (dates):
+----+-----+-----+
|Cat |01/02|02/02|......
+----+-----+-----+
| a  | 20  | 7   |......
| b  | 30  | 12  |......
+----+-----+-----+
I want to get the sum of each column and append it to the DataFrame as an extra row. In other words, I want the result to look like this:
+----+-----+-----+
|Cat |01/02|02/02|......
+----+-----+-----+
| a  | 20  | 7   |......
| b  | 30  | 12  |......
| All| 50  | 19  |......
+----+-----+-----+
I am coding in PySpark, and my script is as follows:
from pyspark.sql import functions as F
for col_name in fs.columns:
    print(col_name)
    sf = df.unionAll(
        df.select([
            F.lit('Total').alias('Cat'),
            F.sum(fs.col_name).alias("{}").format(col_name)
        ])
    )
Unfortunately, I get the error AttributeError: 'DataFrame' object has no attribute 'col_name'. Any idea what I am doing wrong? Thanks in advance!