Get the sum of each column in a pyspark dataframe

Date: 2019-03-05 20:53:14

Tags: dataframe pyspark pyspark-sql union-all

I have a dataframe with 3 rows and more than 20 columns (dates):

+----+-----+-----+         
|Cat |01/02|02/02|......
+----+-----+-----+
| a  | 20  |   7 |......
| b  | 30  |  12 |......
+----+-----+-----+

I want to get the sum of each column and add it to the dataframe as an extra row. In other words, I want it to look like this:

+----+-----+-----+
|Cat |01/02|02/02|......
+----+-----+-----+
| a  | 20  |   7 |......
| b  | 30  |  12 |......
| All| 50  |  19 |......
+----+-----+-----+

I am coding in pySpark, and the script is as follows:

    from pyspark.sql import functions as F

    for col_name in fs.columns:
        print(col_name)

        sf = df.unionAll(
            df.select([
                F.lit('Total').alias('Cat'),
                F.sum(fs.col_name).alias("{}").format(col_name)
            ])
        )

Unfortunately, I get the error `AttributeError: 'DataFrame' object has no attribute 'col_name'`. Any ideas what I am doing wrong? Thanks in advance!

0 Answers:

There are no answers.