How to add a column to an existing DataFrame and use a window function to sum specific rows into the new column using Scala/Spark 2.2

Asked: 2017-11-08 22:09:57

Tags: scala apache-spark apache-spark-sql

E.g.: I would like to sum the quantity sold by date.

Date       Quantity
11/4/2017    20 
11/4/2017    23 
11/4/2017    12 
11/5/2017    18
11/5/2017    12

Output with the new column:

Date        Quantity  New_Column
11/4/2017      20        55
11/4/2017      23        55
11/4/2017      12        55
11/5/2017      18        30
11/5/2017      12        30

1 Answer:

Answer 0 (score: 3):

Simply use sum as a window function by specifying a WindowSpec:

import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.sum

df.withColumn("New_Column", sum("Quantity").over(Window.partitionBy("Date"))).show
+---------+--------+----------+
|     Date|Quantity|New_Column|
+---------+--------+----------+
|11/5/2017|      18|        30|
|11/5/2017|      12|        30|
|11/4/2017|      20|        55|
|11/4/2017|      23|        55|
|11/4/2017|      12|        55|
+---------+--------+----------+
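
For completeness, here is a minimal, self-contained sketch that builds the sample DataFrame from the question and applies the same window sum. The SparkSession setup, the appName, and the variable names (spark, df, result) are assumptions for illustration, not part of the original answer:

// Minimal runnable sketch: per-date total attached to every row
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.sum

val spark = SparkSession.builder()
  .appName("window-sum-example")   // hypothetical app name
  .master("local[*]")
  .getOrCreate()
import spark.implicits._

// Sample data from the question
val df = Seq(
  ("11/4/2017", 20),
  ("11/4/2017", 23),
  ("11/4/2017", 12),
  ("11/5/2017", 18),
  ("11/5/2017", 12)
).toDF("Date", "Quantity")

// Window partitioned by Date: sum("Quantity") is computed per date
// but every original row is kept, each carrying its date's total
val result = df.withColumn(
  "New_Column",
  sum("Quantity").over(Window.partitionBy("Date"))
)
result.show()

The window approach keeps all original rows, which is why it fits this use case; a plain groupBy("Date").agg(sum("Quantity")) would collapse each date to a single row and would then need to be joined back to the original DataFrame to produce the same output.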