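I have been using Scala for ETL work in Spark. In this ETL I want to add 3 parameters that define, respectively, the repartition, partitionBy and orderBy settings used when writing the DataFrame to storage. However, these parameters must be optional.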
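I really don't want to write a horrible if...else statement to handle every combination of these 8 possibilities (each of the 3 parameters is either set or not, giving 2^3 = 8).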
Here is my function:
```scala
import org.apache.spark.sql.{DataFrame, SaveMode}

def writer(
    outputFormat: String,
    outputFile: String,
    outputMode: SaveMode,
    outputRepartionBy: String,
    outputParitionBy: String,
    outputOrderBy: String,
    dryRun: Boolean = false
)(df: DataFrame): Unit = {
  if (dryRun) {
    // Preview up to 500 rows without truncating column values
    df.show(500, false)
  } else {
    if (outputFormat == "parquet" || outputFormat == "orc") {
      df.write.format(outputFormat).mode(outputMode).save(outputFile)
    } else {
      df.write.format(outputFormat).save(outputFile)
    }
  }
}
```
Is it possible to do something like this:
```scala
// Pseudocode: Spark has no such conditional .if method
df.write
  .if(outputRepartionBy != null) { repartitionBy(outputRepartionBy) }
  .format(outputFormat)
  .mode(outputMode)
  .save(outputFile)
```
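In other words, can a step in the chain be applied only when a condition is met? If Scala/Spark has no such construct, what is the proper way to chain functions conditionally?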
Edit: I am using Spark 2.3.1 with Scala 2.11.12.
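For reference, plain Scala can get close to the syntax sketched above with an implicit class that adds a conditional step to any value. The sketch below is a minimal illustration under stated assumptions, not a Spark API: ConditionalChain, ApplyIfOps, applyIf and the string-building demo are hypothetical names, and plain strings stand in for a DataFrame so that the example runs without Spark.

```scala
object ConditionalChain {
  // Hypothetical helper: makes applyIf available on any value of type T.
  implicit class ApplyIfOps[T](value: T) {
    // Apply f when cond is true; otherwise pass the value through unchanged.
    def applyIf(cond: Boolean)(f: T => T): T = if (cond) f(value) else value
  }
}

object ConditionalChainDemo {
  import ConditionalChain._

  // Each optional (nullable) parameter adds at most one step to the chain.
  def build(orderBy: String, suffix: String): String =
    "events"
      .applyIf(orderBy != null)(_ + "_sorted_by_" + orderBy)
      .applyIf(suffix != null)(_ + suffix)

  def main(args: Array[String]): Unit = {
    println(build(null, "_v2")) // events_v2
    println(build("ts", null))  // events_sorted_by_ts
  }
}
```

With a DataFrame the same helper would read df.applyIf(outputRepartionBy != null)(_.repartition(col(outputRepartionBy))). Because applyIf returns a plain T, .write and the remaining calls chain directly after it, and the writer-side partitionBy can be made conditional the same way, since DataFrameWriter.partitionBy returns a DataFrameWriter.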
Answer 0 (score: 2)
You can do something like this:
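```scala
import org.apache.spark.sql.functions.col

// repartition is a DataFrame method (DataFrameWriter has none), so apply it before .write
val temp = if (outputRepartionBy != null) df.repartition(col(outputRepartionBy)) else df
val writer = temp.write.format(outputFormat)
writer.mode(outputMode).save(outputFile)
```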
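This if/else-per-value approach can be generalized so that it grows linearly with the number of optional parameters instead of needing one branch per combination: turn each nullable parameter into an Option of a transformation and fold over the defined ones. Option(x) is None when x is null, so the nullable String parameters from the question map directly. This is a sketch that swaps in a fold for the if/else above; applySteps and describe are hypothetical names, and strings again stand in for the DataFrame.

```scala
object OptionalSteps {
  // Apply every defined step in order; None entries are skipped.
  def applySteps[T](start: T, steps: Seq[Option[T => T]]): T =
    steps.flatten.foldLeft(start)((acc, step) => step(acc))
}

object OptionalStepsDemo {
  import OptionalSteps._

  // Hypothetical stand-ins for the nullable parameters in the question.
  def describe(repartitionBy: String, orderBy: String): String =
    applySteps[String]("df", Seq(
      Option(repartitionBy).map(c => (s: String) => s"$s.repartition($c)"),
      Option(orderBy).map(c => (s: String) => s"$s.orderBy($c)")
    ))

  def main(args: Array[String]): Unit = {
    println(describe("id", null)) // df.repartition(id)
    println(describe("id", "ts")) // df.repartition(id).orderBy(ts)
  }
}
```

In Spark, T would be DataFrame and a step could be Option(outputRepartionBy).map(c => (d: DataFrame) => d.repartition(col(c))). partitionBy belongs to DataFrameWriter rather than DataFrame, so it would need its own step list (or a separate conditional) after .write.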
Answer 1 (score: 2)
I used this blog post to implement the logic I wanted; the result reads better and is very concise.
```scala
package etl.tool

import scala.language.implicitConversions

// Wrapper that adds $if to any value of type T
sealed class ConditionalApplicative[T] private (val value: T) {

  // Wrapper returned by $if. It extends ConditionalApplicative so that chained calls
  // such as $if(condition1) { ... }.$if(condition2) { ... } are not wrapped twice
  class ElseApplicative(value: T, elseCondition: Boolean) extends ConditionalApplicative[T](value) {
    def $else(f: T => T): T = if (elseCondition) f(value) else value
  }

  // $if with a condition taken from the outer scope
  def $if(condition: => Boolean)(f: T => T): ElseApplicative =
    if (condition) new ElseApplicative(f(value), false)
    else new ElseApplicative(value, true)

  // $if with a condition computed from the wrapped value itself
  def $if(condition: T => Boolean)(f: T => T): ElseApplicative =
    if (condition(value)) new ElseApplicative(f(value), false)
    else new ElseApplicative(value, true)
}

// Companion object holding the implicit conversions for the generic ConditionalApplicative[T]
object ConditionalApplicative {
  // Lifts any value into ConditionalApplicative so that .$if becomes available
  implicit def lift2ConditionalApplicative[T](any: T): ConditionalApplicative[T] =
    new ConditionalApplicative(any)

  // Unwraps an ElseApplicative back to T so that T's own methods can be chained
  implicit def else2T[T](els: ConditionalApplicative[T]#ElseApplicative): T =
    els.value
}
```
After importing this code in my method, I can write the following:
```scala
import org.apache.spark.sql.{DataFrame, SaveMode}
import org.apache.spark.sql.functions.col

def writer(outputFormat: String, outputFile: String, outputMode: SaveMode, outputRepartionBy: String,
           outputParitionBy: String, outputOrderBy: String, dryRun: Boolean = false)(df: DataFrame): Unit = {
  import etl.tool.ConditionalApplicative._

  if (dryRun) {
    df.show(500, false)
  } else {
    if (outputFormat == "parquet" || outputFormat == "orc") {
      // Note: repartition shuffles the data, so a sort applied before it is not preserved in the output
      df
        .$if(outputOrderBy != null) {
          _.orderBy(col(outputOrderBy))
        }.$if(outputRepartionBy != null) {
          _.repartition(col(outputRepartionBy))
        }.write.format(outputFormat).mode(outputMode)
        .$if(outputParitionBy != null) {
          _.partitionBy(outputParitionBy)
        }.save(outputFile)
    } else {
      df.write.format(outputFormat).save(outputFile)
    }
  }
}
```
The resulting code speaks for itself, although my understanding of the underlying logic is limited.
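The mechanics behind ConditionalApplicative come down to two implicit conversions: one lifts any value into a wrapper that has $if, and one unwraps the result back to the original type so that ordinary methods (such as .write) can follow. The stripped-down sketch below is a hypothetical simplification for illustration, not the blog post's code: MiniConditional, Wrapper, IfResult, lift and unwrap are made-up names, and it omits the $if overload that takes a T => Boolean predicate.

```scala
import scala.language.implicitConversions

object MiniConditional {
  // Result of $if: the (possibly transformed) value plus whether a following $else should run.
  final class IfResult[T](val value: T, val elseApplies: Boolean) {
    def $else(f: T => T): T = if (elseApplies) f(value) else value
    // Chaining another $if on a result starts from the current value.
    def $if(cond: Boolean)(f: T => T): IfResult[T] = new Wrapper(value).$if(cond)(f)
  }

  // Wrapper that exposes $if on an arbitrary value.
  final class Wrapper[T](value: T) {
    def $if(cond: Boolean)(f: T => T): IfResult[T] =
      if (cond) new IfResult(f(value), elseApplies = false)
      else new IfResult(value, elseApplies = true)
  }

  // Conversion 1: lift any value so that .$if becomes available on it.
  implicit def lift[T](value: T): Wrapper[T] = new Wrapper(value)

  // Conversion 2: unwrap a result so that T's own methods can be chained after $if.
  implicit def unwrap[T](result: IfResult[T]): T = result.value
}

object MiniConditionalDemo {
  import MiniConditional._

  def main(args: Array[String]): Unit = {
    val shouted: String = "abc".$if(true)(_.toUpperCase)
    println(shouted)                                          // ABC
    println("abc".$if(false)(_.toUpperCase).$else(_.reverse)) // cba
  }
}
```

In the original class, ElseApplicative plays the role of IfResult. Making it extend ConditionalApplicative is what lets a second $if be called on it without wrapping twice; this sketch gets the same effect by giving IfResult its own $if.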