Spark Dataset question

Time: 2017-07-06 04:59:23

Tags: apache-spark apache-spark-sql apache-spark-dataset

My dataset looks like this:
+------+---------------+----+
|  City|      Timestamp|Sale|
+------+---------------+----+
|City 3|6/30/2017 16:04|  28|
|City 4| 7/4/2017 16:04|  12|
|City 2|7/13/2017 16:04|   8|
|City 4|7/16/2017 16:04|  21|
|City 4| 7/3/2017 16:04|  24|
|City 2|7/17/2017 16:04|  34|
|City 3| 7/9/2017 16:04|  13|
|City 3|7/18/2017 16:04|  26|
|City 3| 7/6/2017 16:04|  16|
|City 3|7/15/2017 16:04|  29|
|City 4|7/18/2017 16:04|  39|
|City 2| 7/1/2017 16:04|  19|
|City 2|7/18/2017 16:04|  19|
|City 4| 7/4/2017 16:04|  24|
|City 2| 7/4/2017 16:04|   9|
|City 4|7/15/2017 16:04|  20|
|City 3|7/12/2017 16:04|  19|
|City 1| 7/9/2017 16:04|  13|
|City 1|7/13/2017 16:04|  25|
|City 4|7/10/2017 16:04|  10|
+------+---------------+----+

We need to compute the weekly sum of Sale for each City.

1 Answer:

Answer 0 (score: 0)

You can group by City and Timestamp and aggregate Sale:

import org.apache.spark.sql.functions.{col, sum}

data.groupBy("City", "Timestamp").agg(sum(col("Sale")).as("TotalSale")).show()
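
Note that grouping by the raw Timestamp string puts every distinct minute in its own group, so it will not produce per-week totals. A minimal sketch of a weekly rollup, assuming the timestamps follow the M/d/yyyy H:mm format shown above (to_timestamp requires Spark 2.2+; the column name ts and output names Year/Week are illustrative):

import org.apache.spark.sql.functions._

// Parse the string timestamp and derive a (year, week-of-year) key
// so that each calendar week forms one group per City.
val weekly = data
  .withColumn("ts", to_timestamp(col("Timestamp"), "M/d/yyyy H:mm"))
  .groupBy(col("City"), year(col("ts")).as("Year"), weekofyear(col("ts")).as("Week"))
  .agg(sum(col("Sale")).as("TotalSale"))

weekly.orderBy("City", "Year", "Week").show()

Pairing weekofyear with year avoids merging the same week number across different years.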

Hope this helps!