Concat DataFrame columns by passing a list

Time: 2017-09-21 18:15:29

Tags: dataframe pyspark

    from pyspark.sql import Row, functions as F

    row = Row("UK_1", "UK_2", "Date", "Cat")
    df = sc.parallelize([
        row(1, 1, '12/10/2016', 'A'),
        row(1, 2, None, 'A'),
        row(2, 1, '14/10/2016', 'B'),
        row(3, 3, '!~2016/2/276', 'B'),
        row(None, 1, '26/09/2016', 'A'),
        row(1, 1, '12/10/2016', 'A'),
        row(1, 2, None, 'A'),
        row(2, 1, '14/10/2016', 'B'),
        row(None, None, '!~2016/2/276', 'B'),
        row(None, 1, '26/09/2016', 'A')
    ]).toDF()

    pks = ["UK_1", "UK_2"]
    columns = df.columns  # select all columns

    df1 = (
        df
        .select(columns)
        # .withColumn('pk', F.concat(pks))  # fails: concat takes column arguments, not a list
        .withColumn('pk', F.concat("UK_1", "UK_2"))
    )

    df1.show()

Is there a way to pass a list of columns into `concat`? I want this code to work in scenarios where the columns can change, so I'd like to pass them as a list.

1 answer:

Answer 0 (score: 3)

Yes, the Python syntax is `F.concat(*pks)` (a variable number of arguments):

    df.withColumn("pk", F.concat(*pks)).show()
    +----+----+------------+---+----+
    |UK_1|UK_2|        Date|Cat|  pk|
    +----+----+------------+---+----+
    |   1|   1|  12/10/2016|  A|  11|
    |   1|   2|        null|  A|  12|
    |   2|   1|  14/10/2016|  B|  21|
    |   3|   3|!~2016/2/276|  B|  33|
    |null|   1|  26/09/2016|  A|null|
    |   1|   1|  12/10/2016|  A|  11|
    |   1|   2|        null|  A|  12|
    |   2|   1|  14/10/2016|  B|  21|
    |null|null|!~2016/2/276|  B|null|
    |null|   1|  26/09/2016|  A|null|
    +----+----+------------+---+----+
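The key is Python's `*` unpacking: `F.concat(*pks)` expands the list into positional arguments, i.e. `F.concat("UK_1", "UK_2")`. A minimal plain-Python sketch of the same idea, where the local `concat` helper is hypothetical and only stands in for `F.concat`'s behavior (including nulls propagating, as seen in the output above):

```python
def concat(*cols):
    """Concatenate values as strings; return None if any value is None,
    mirroring how F.concat yields null when any input column is null."""
    if any(c is None for c in cols):
        return None
    return "".join(str(c) for c in cols)

pks = ["UK_1", "UK_2"]
row = {"UK_1": 1, "UK_2": 2}

# The * operator unpacks the list into concat(1, 2)
values = [row[c] for c in pks]
print(concat(*values))   # -> "12"
print(concat(1, None))   # -> None, like the rows with null UK_1/UK_2
```

Because the unpacking happens before the call, the same code works no matter how many column names `pks` holds.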
