How do I swap the values in a row?

Asked: 2018-06-15 14:02:39

Tags: scala apache-spark apache-spark-sql

import org.apache.spark.sql.functions.{min, max}
import spark.implicits._  // for the 'symbol column syntax

val result = df
               .groupBy("col1")
               .agg(min('minTimestamp) as "StartDateUTC",
                    max('maxTimestamp) as "EndDateUTC")

For each col1 I need to find the minimum and maximum timestamp. The problem is that in some cases StartDateUTC ends up greater than EndDateUTC (see the rows for A in df below). Is there an efficient way to swap these values in that case?

df =

col1    minTimestamp    maxTimestamp
A       1483264800      1483164800
A       1483200000      1483064800
B       1483300000      1483564800

1 Answer:

Answer 0 (score: 3)

Use least / greatest:

import org.apache.spark.sql.functions._
import spark.implicits._  // for the $"colName" syntax

// Normalize each row so minTimestamp <= maxTimestamp
df.select(
    $"col1",
    least($"minTimestamp", $"maxTimestamp").alias("minTimestamp"),
    greatest($"minTimestamp", $"maxTimestamp").alias("maxTimestamp")
)
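
Note that both least and greatest skip null values and return null only if every argument is null, so a row with one missing timestamp still resolves to its non-null value.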

Or push it directly into the aggregation:

df.groupBy("col1")
  .agg(
    min(least($"minTimestamp", $"maxTimestamp")) as "StartDateUTC",
    max(greatest($"minTimestamp", $"maxTimestamp")) as "EndDateUTC"
  )
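
For reference, a minimal end-to-end sketch of the aggregated form, assuming a local SparkSession (the session setup and app name here are illustrative, not part of the original answer). It rebuilds the sample df from the question and shows the expected result:

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

val spark = SparkSession.builder()
  .appName("swap-demo")      // hypothetical app name
  .master("local[*]")
  .getOrCreate()
import spark.implicits._

// Sample data mirroring the df from the question
val df = Seq(
  ("A", 1483264800L, 1483164800L),
  ("A", 1483200000L, 1483064800L),
  ("B", 1483300000L, 1483564800L)
).toDF("col1", "minTimestamp", "maxTimestamp")

// Normalize each row with least/greatest, then aggregate per col1
val result = df
  .groupBy("col1")
  .agg(
    min(least($"minTimestamp", $"maxTimestamp")) as "StartDateUTC",
    max(greatest($"minTimestamp", $"maxTimestamp")) as "EndDateUTC"
  )

result.show()
// Expected output:
// +----+------------+----------+
// |col1|StartDateUTC|EndDateUTC|
// +----+------------+----------+
// |   A|  1483064800|1483264800|
// |   B|  1483300000|1483564800|
// +----+------------+----------+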