How to use Spark lag and lead with group by and order by

Date: 2018-05-01 08:34:17

Tags: apache-spark apache-spark-sql apache-spark-dataset

I use:

dataset.withColumn("lead",lead(dataset.col(start_date),1).over(orderBy(start_date)));

I just want to add a grouping by trackId, so that lead works within each group, the same way an agg function works per group:

+-------+----------+--------+--------+
|trackId|start_time|end_time|lead    |
+-------+----------+--------+--------+
|1      |12:00:00  |12:04:00|12:05:00|
|1      |12:05:00  |12:08:00|12:20:00|
|1      |12:20:00  |12:22:00|null    |
|2      |13:00:00  |13:04:00|13:05:00|
|2      |13:05:00  |13:08:00|13:20:00|
|2      |13:20:00  |13:22:00|null    |
+-------+----------+--------+--------+

Any help on how to do this?

2 answers:

Answer 0 (score: 4)

What you are missing is the Window object and its partitionBy method call, i.e. over(Window.partitionBy("trackId").orderBy(...)) instead of over(orderBy(...)).
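To see why the partition matters, here is a plain-Scala sketch (no Spark involved; the Segment case class and the leadStart helper are hypothetical names used only for illustration) that mimics lead(start_time, 1) over a window ordered by start_time, with and without partitioning by trackId. Without the partition, the whole dataset is one window, so the last row of track 1 picks up the first start time of track 2.

```scala
// Hypothetical stand-in for one row of the question's data.
// Times are "HH:mm:ss" strings, so lexicographic order equals time order.
case class Segment(trackId: Int, startTime: String, endTime: String)

// Emulates lead(start_time, 1) over a window ordered by start_time.
// partitioned = true  ~ Window.partitionBy("trackId").orderBy("start_time")
// partitioned = false ~ orderBy("start_time") alone: the whole input is one window
def leadStart(rows: Seq[Segment], partitioned: Boolean): Seq[(Segment, Option[String])] = {
  val windows: Seq[Seq[Segment]] =
    if (partitioned) rows.groupBy(_.trackId).toSeq.sortBy(_._1).map(_._2)
    else Seq(rows)
  windows.flatMap { window =>
    val sorted = window.sortBy(_.startTime)
    // Next row's start time, or None (Spark's null) for the last row of the window
    val next = sorted.drop(1).map(s => Option(s.startTime)) :+ None
    sorted.zip(next)
  }
}

val data = Seq(
  Segment(1, "12:00:00", "12:04:00"),
  Segment(1, "12:05:00", "12:08:00"),
  Segment(1, "12:20:00", "12:22:00"),
  Segment(2, "13:00:00", "13:04:00"),
  Segment(2, "13:05:00", "13:08:00"),
  Segment(2, "13:20:00", "13:22:00")
)
```

With partitioned = true the lead values come out as 12:05:00, 12:20:00, null, 13:05:00, 13:20:00, null; with partitioned = false the 12:20:00 row gets 13:00:00 instead of null, which is the behavior the question is trying to avoid.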

Answer 1 (score: 2)

You need to use a Window partitioned by trackId:

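import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{col, lead}
import spark.implicits._ // for toDF; assumes a SparkSession named spark (already imported in spark-shell)

val df = Seq(
  (1, "12:00:00", "12:04:00"),
  (1, "12:05:00", "12:08:00"),
  (1, "12:20:00", "12:22:00"),
  (2, "13:00:00", "13:04:00"),
  (2, "13:05:00", "13:08:00"),
  (2, "13:20:00", "13:22:00")
).toDF("trackId", "start_time", "end_time")

// One window per trackId, rows ordered by start_time inside each window
val window = Window.partitionBy("trackId").orderBy("start_time")

// lead looks one row ahead within the same trackId; the last row of each track gets null
df.withColumn("lead", lead(col("start_time"), 1).over(window)).show(false)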

If you don't want null, you can also pass a default value as the third argument: lead($"start_time", 1, defaultValue)
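As a rough illustration of what the default value does, here is a plain-Scala sketch (not Spark code; leadWithDefault and the "23:59:59" sentinel are hypothetical choices for this example) that applies a lead with a default to the already-sorted rows of a single window. The rows near the end of the window, which would otherwise get null, receive the default instead.

```scala
// Emulates lead(col, offset, default) inside ONE window whose rows are already sorted:
// position i receives the value at i + offset, or `default` where Spark would
// otherwise return null because i + offset falls past the end of the window.
def leadWithDefault[A](sorted: IndexedSeq[A], offset: Int, default: A): IndexedSeq[A] = {
  require(offset >= 1, "this sketch only covers look-ahead (lead), not lag")
  sorted.indices.map { i =>
    if (i + offset < sorted.length) sorted(i + offset) else default
  }
}

// Track 2's start times, in the order Window.partitionBy("trackId").orderBy("start_time") gives them.
val track2Starts = Vector("13:00:00", "13:05:00", "13:20:00")

// "23:59:59" is an arbitrary, hypothetical sentinel used only for this illustration.
val track2Lead = leadWithDefault(track2Starts, 1, "23:59:59")
```

Here track2Lead is Vector("13:05:00", "13:20:00", "23:59:59"): only the last row of the track is affected, just as only the last row of each track shows null in the result below.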

Result:

+-------+----------+--------+--------+
|trackId|start_time|end_time|lead    |
+-------+----------+--------+--------+
|1      |12:00:00  |12:04:00|12:05:00|
|1      |12:05:00  |12:08:00|12:20:00|
|1      |12:20:00  |12:22:00|null    |
|2      |13:00:00  |13:04:00|13:05:00|
|2      |13:05:00  |13:08:00|13:20:00|
|2      |13:20:00  |13:22:00|null    |
+-------+----------+--------+--------+