Problem converting a function into a UDF

Asked: 2021-04-06 21:32:44

Tags: scala apache-spark user-defined-functions

Disclaimer: I am new to Scala.

The function is as follows:

import com.amazon.deequ.suggestions.{ConstraintSuggestionRunner, Rules}
import org.apache.spark.sql.DataFrame
// `.toSeq.toDF()` below relies on `import spark.implicits._` being in scope (e.g. in spark-shell or a notebook)

def getSuggestedTests(df: DataFrame): DataFrame = {
    // Ask Deequ to compute constraint suggestions for us on the data
    val suggestionResult = ConstraintSuggestionRunner()
      // data to suggest constraints for
      .onData(df)
      // default set of rules for constraint suggestion
      .addConstraintRules(Rules.DEFAULT)
      // run data profiling and constraint suggestion
      .run()

    // Turn the suggested constraints into one row per (column, description, code)
    suggestionResult.constraintSuggestions.flatMap {
      case (column, suggestions) =>
        suggestions.map { constraint =>
          (column, constraint.description, constraint.codeForConstraint)
        }
    }.toSeq.toDF()
}
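
Calling the function directly works for me. A minimal sketch of how I invoke it (the toy input DataFrame is just a placeholder, and `spark` is the active SparkSession from the shell/notebook):

// assumes a spark-shell / notebook session where `spark` is already defined
import spark.implicits._

// placeholder input; in practice this is whatever DataFrame I want suggestions for
val input = Seq(("alice", 30), ("bob", 45)).toDF("name", "age")

// calling the function directly returns the suggestions as a DataFrame
getSuggestedTests(input).show(truncate = false)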

Based on my understanding so far, the UDF should be created like this:

val getSuggestedTestsUdf = udf(getSuggestedTests(_: DataFrame))

I get the following error:

An error was encountered:
java.lang.UnsupportedOperationException: Schema for type org.apache.spark.sql.Dataset[org.apache.spark.sql.Row] is not supported
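
For comparison, a plain column-level UDF registers without problems, which makes me suspect the issue is the DataFrame return type rather than the syntax. A toy sketch, unrelated to my actual data:

import org.apache.spark.sql.functions.udf

// a simple scalar UDF (String => Int) registers fine
val strLenUdf = udf((s: String) => if (s == null) 0 else s.length)

My current guess is that a UDF's return type has to map to a Spark SQL data type, and Dataset[Row] does not, but I have not found a way around this yet.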

0 Answers