Disclaimer: I am new to Scala.
Here is the function:
import com.amazon.deequ.suggestions.{ConstraintSuggestionRunner, Rules}
import org.apache.spark.sql.DataFrame

def getSuggestedTests(df: DataFrame): DataFrame = {
  // toDF on a local Seq needs the Spark implicits in scope
  import df.sparkSession.implicits._

  // We ask Deequ to compute constraint suggestions for us on the data
  val suggestionResult = ConstraintSuggestionRunner()
    // data to suggest constraints for
    .onData(df)
    // default set of rules for constraint suggestion
    .addConstraintRules(Rules.DEFAULT)
    // run data profiling and constraint suggestion
    .run()

  // We can now investigate the constraints that Deequ suggested.
  val suggestionDataFrame = suggestionResult.constraintSuggestions.flatMap {
    case (column, suggestions) =>
      suggestions.map { constraint =>
        (column, constraint.description, constraint.codeForConstraint)
      }
  }.toSeq.toDF("column", "description", "code")

  // The last expression is the return value; no explicit `return` needed
  suggestionDataFrame
}
As I currently understand it, a UDF should be created like this:
val getSuggestedTestsUdf = udf(getSuggestedTests(_: DataFrame))
I get the following error:
An error was encountered:
java.lang.UnsupportedOperationException: Schema for type org.apache.spark.sql.Dataset[org.apache.spark.sql.Row] is not supported
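For comparison, here is a minimal sketch of the kind of function `udf` is designed for, assuming Spark 2.x or later running with a local session. The object `UdfSketch`, the `name` column and the `withNameLength` helper are hypothetical names used only for illustration. Spark applies a UDF once per row to column values, so it has to derive a Spark SQL type for the function's argument and return types. `String` and `Int` map to SQL types, whereas `DataFrame` (`Dataset[Row]`) is not a column type, which is consistent with the "Schema for type ... is not supported" message above.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, udf}

object UdfSketch {
  // Local session for the sketch; in a notebook you would reuse the existing `spark`
  lazy val spark: SparkSession =
    SparkSession.builder().master("local[1]").appName("udf-sketch").getOrCreate()

  // String => Int: both types map to Spark SQL types (StringType, IntegerType),
  // so udf can build a schema for this function. Nulls are handled explicitly.
  val nameLength = udf((s: String) => if (s == null) 0 else s.length)

  // Hypothetical helper: apply the UDF to the "name" column, row by row
  def withNameLength(df: DataFrame): DataFrame =
    df.withColumn("name_length", nameLength(col("name")))
}
```

A function such as `getSuggestedTests`, which takes a whole DataFrame and returns another one, is an ordinary Scala method rather than a per-row transformation, so it can be called directly on the driver (for example `getSuggestedTests(df)`) instead of being wrapped in `udf`.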