Question

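In the otherwise branch I need to add (or lit) more than one column. When I pass more than one column, I get an error.

Is there another way to do what I am trying to do?

This is what I am doing now.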

    import org.apache.spark.sql.expressions.Window
    import org.apache.spark.sql.functions._
    // $"..." needs spark.implicits._ (already imported in spark-shell)

    // Window over the series key, used to collect every statement type code per key
    val windowSpec = Window.partitionBy("FundamentalSeriesId", "FundamentalSeriesPeriodEndDate", "FundamentalSeriesPeriodType")
    // Same key plus the derived "group" column, newest TimeStamp first
    val windowSpec2 = Window.partitionBy("FundamentalSeriesId", "FundamentalSeriesPeriodEndDate", "FundamentalSeriesPeriodType", "group").orderBy(unix_timestamp($"TimeStamp", "yyyy-MM-dd'T'HH:mm:ss").cast("timestamp").desc)

    // True when any collected code is the string "null"/"NULL" or an actual null
    def containsUdf = udf { (array: Seq[String]) => array.contains("null") || array.contains("NULL") || array.contains(null) }

    // Keep only the latest row in each group
    val latestForEachKey1 = tempReorder
      .withColumn("group", when(containsUdf(collect_list("FundamentalSeriesStatementTypeCode").over(windowSpec)), lit("same")).otherwise($"FundamentalSeriesStatementTypeCode"))
      .withColumn("rank", row_number().over(windowSpec2))
      .filter($"rank" === 1).drop("rank", "group")

But when I pass more than one column in the otherwise part, I get an error. The code below fails.

    val latestForEachKey1 = tempReorder
      .withColumn("group", when(containsUdf(collect_list("FundamentalSeriesStatementTypeCode").over(windowSpec)), lit("same")).otherwise($"FundamentalSeriesStatementTypeCode",$"FundamentalSeriesStatementPeriodId"))
      .withColumn("rank", row_number().over(windowSpec2))
      .filter($"rank" === 1).drop("rank", "group")

How can I supply more than one column here in some other way in a Scala Spark DataFrame?
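For context, Column.otherwise takes exactly one value, which is why the two-argument call does not compile. One possible workaround, sketched here under assumptions, is to fold the two columns into a single string column with concat_ws and pass that to otherwise. The object name MultiColumnOtherwise, the helper latestForEachKey, the combinedKey value, the "|" separator, and the sample rows are all hypothetical and not part of the original code. A string key is used because the when branch already yields the string "same"; a struct(...) of both columns would most likely be rejected as a type mismatch between the two branches.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions._

object MultiColumnOtherwise {
  // Partition keys taken from the question.
  private val keys = Seq(
    "FundamentalSeriesId",
    "FundamentalSeriesPeriodEndDate",
    "FundamentalSeriesPeriodType")

  // Same check as containsUdf in the question.
  private val containsUdf = udf { (array: Seq[String]) =>
    array.contains("null") || array.contains("NULL") || array.contains(null)
  }

  // Hypothetical helper: same pipeline as the question, but otherwise(...)
  // receives ONE column built from the two source columns.
  def latestForEachKey(df: DataFrame): DataFrame = {
    val windowSpec = Window.partitionBy(keys.map(col): _*)
    val windowSpec2 = Window
      .partitionBy((keys :+ "group").map(col): _*)
      .orderBy(unix_timestamp(col("TimeStamp"), "yyyy-MM-dd'T'HH:mm:ss").cast("timestamp").desc)

    // Assumption: a "|"-joined string is an acceptable grouping key.
    // Note that concat_ws skips null inputs instead of returning null.
    val combinedKey = concat_ws("|",
      col("FundamentalSeriesStatementTypeCode"),
      col("FundamentalSeriesStatementPeriodId").cast("string"))

    df.withColumn("group",
        when(containsUdf(collect_list("FundamentalSeriesStatementTypeCode").over(windowSpec)), lit("same"))
          .otherwise(combinedKey))
      .withColumn("rank", row_number().over(windowSpec2))
      .filter(col("rank") === 1)
      .drop("rank", "group")
  }

  def main(args: Array[String]): Unit = {
    // Local session for illustration only.
    val spark = SparkSession.builder().master("local[*]").appName("multi-column-otherwise").getOrCreate()
    import spark.implicits._

    // Made-up rows standing in for tempReorder.
    val tempReorder = Seq(
      (1, "2017-12-31", "Annual", "INC", 10, "2018-01-01T10:00:00"),
      (1, "2017-12-31", "Annual", "INC", 10, "2018-01-02T10:00:00"),
      (1, "2017-12-31", "Annual", "INC", 11, "2018-01-03T10:00:00"),
      (2, "2017-12-31", "Annual", "null", 20, "2018-01-01T10:00:00"),
      (2, "2017-12-31", "Annual", "BAL", 21, "2018-01-02T10:00:00")
    ).toDF(
      "FundamentalSeriesId", "FundamentalSeriesPeriodEndDate", "FundamentalSeriesPeriodType",
      "FundamentalSeriesStatementTypeCode", "FundamentalSeriesStatementPeriodId", "TimeStamp")

    latestForEachKey(tempReorder).show(false)
    spark.stop()
  }
}
```

With this shape, rows are deduplicated per (type code, period id) pair unless the key contains a "null" code, in which case all rows of that key share the group "same". If the separator could occur inside a type code, a different separator would be needed.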

0 answers: