How to partition records by one column and assign sequence numbers to another column

Asked: 2018-03-21 12:48:34

Tags: scala apache-spark

I have the input records described below:

col1  col2
abcde vfc
abcde vfc
abcde vfc
abcde dfj
abcde dfj
abcde wek
dghkl tyu
dghkl tyu
dghkl tyu

I want to create a col3 with the contents described below.

Records with the same col1 value may have different col2 values. I need col3 to number the distinct col2 values within each col1 group, starting from 0: the first distinct col2 value in a group gets 0, the next gets 1, and so on. If all records in a col1 group have the same col2, then col3 must be 0 for all of them.

1 Answer:

Answer 0 (score: 0)

import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions._
import spark.implicits._ // needed for toDF and the 'colName symbol syntax

val df = Seq(("abcde", "vfc"), ("abcde", "vfc"), ("abcde", "vfc"), ("abcde", "dfj"),
  ("abcde", "dfj"), ("abcde", "wek"), ("dghkl", "tyu"), ("dghkl", "tyu"), ("dghkl", "tyu"))
  .toDF("col1", "col2")

// Rank the distinct col2 values within each col1 partition, then subtract 1
// so the numbering starts at 0 (dense_rank itself starts at 1).
val winSpec = Window.partitionBy('col1).orderBy('col2)
df.withColumn("col3", dense_rank().over(winSpec) - 1)
  .orderBy('col1)
  .show(false)


+-----+----+----+
|col1 |col2|col3|
+-----+----+----+
|abcde|dfj |0   |
|abcde|dfj |0   |
|abcde|vfc |1   |
|abcde|vfc |1   |
|abcde|vfc |1   |
|abcde|wek |2   |
|dghkl|tyu |0   |
|dghkl|tyu |0   |
|dghkl|tyu |0   |
+-----+----+----+
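The same 0-based dense numbering can be sanity-checked with plain Scala collections, independent of Spark. This is only an illustrative sketch (the `records` sequence and the `ranked` name are mine, not part of the original answer), and it sorts distinct col2 values lexicographically per group, mirroring the window's `orderBy('col2)`:

```scala
// Input mirrors the question's (col1, col2) records.
val records = Seq(("abcde", "vfc"), ("abcde", "vfc"), ("abcde", "vfc"), ("abcde", "dfj"),
  ("abcde", "dfj"), ("abcde", "wek"), ("dghkl", "tyu"), ("dghkl", "tyu"), ("dghkl", "tyu"))

// For each record, find the sorted distinct col2 values of its col1 group
// and use the position of this record's col2 as the 0-based dense rank.
val ranked = records.map { case (c1, c2) =>
  val order = records.filter(_._1 == c1).map(_._2).distinct.sorted
  (c1, c2, order.indexOf(c2))
}

ranked.foreach(println)
```

This quadratic scan is fine for eyeballing a handful of rows, but the window-function version above is the right tool once the data lives in a DataFrame.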