I have the following lines in a pyspark script:
df_output = df.select("*",$checkcol)
df_output.show()
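This works fine when the expressions are hard-coded, but once they are parameterized it throws an error:
pyspark.sql.utils.AnalysisException: 'cannot resolve \'`**,F .....
where checkcol is a variable whose value is shown below.
checkcol: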
F.when(F.col("colA")=='null',"Yes").otherwise(date_validation_udf("colA")).alias("colA_DateCheck"),
F.when(F.col("colB")=='null',"Yes").otherwise(date_validation_udf("colB")).alias("colB_DateCheck"),
F.when(F.col("colC")=='null',"Yes").otherwise(date_validation_udf("colC")).alias("colC_DateCheck"),
F.when(F.col("colD")=='null',"Yes").otherwise(num_check_udf("colD")).alias("colD_NumCheck"),
F.when(F.col("colE")=='null',"Yes").otherwise(num_check_udf("colE")).alias("colE_NumCheck"),
F.when(F.col("colF")=='null',"Yes").otherwise(num_check_udf("colF")).alias("colF_NumCheck"),
F.when(F.col("colG")=='null',"Yes").otherwise(num_check_udf("colG")).alias("colG_NumCheck")
Answer 0 (score: 0)
Try this:
import pyspark.sql.functions as F
# withColumn takes exactly one column name and one Column, so each check gets its own call;
# the new column's name comes from the first argument, so .alias() is not needed here.
df_output = (df
    .withColumn("colA_DateCheck",
        F.when(F.col("colA") == 'null', "Yes").otherwise(date_validation_udf("colA")))
    .withColumn("colB_DateCheck",
        F.when(F.col("colB") == 'null', "Yes").otherwise(date_validation_udf("colB")))
    .withColumn("colC_DateCheck",
        F.when(F.col("colC") == 'null', "Yes").otherwise(date_validation_udf("colC"))))
...
df_output.show()
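On the error itself: the message hints that the parameterized checkcol may have reached select as one plain string, which Spark then tried to resolve as a single column name. That is an assumption, since the question does not show how the value was substituted. The sketch below is pure Python and uses a hypothetical stand-in, describe_select_args, rather than Spark's own select, just to show the difference between passing text and passing expression objects.

```python
def describe_select_args(*cols):
    """Hypothetical stand-in for DataFrame.select: report the type of each argument."""
    return [type(c).__name__ for c in cols]

# One long string is a single argument; Spark would treat it as one column name.
as_text = 'F.when(...).alias("colA_DateCheck"), F.when(...).alias("colB_DateCheck")'
print(describe_select_args("*", as_text))       # ['str', 'str']

# Separate expression objects arrive as separate arguments.
# object() is only a placeholder for a real Column expression.
as_objects = (object(), object())
print(describe_select_args("*", *as_objects))   # ['str', 'object', 'object']
```

So the expressions have to be built as Python objects in the script; text that merely looks like the code is not evaluated by select.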
Edit:
To pass these expressions as a single variable, try the following:
checkcol = (F.when(F.col("colA") == 'null', "Yes").otherwise(date_validation_udf("colA")).alias("colA_DateCheck"),
F.when(F.col("colB") == 'null', "Yes").otherwise(date_validation_udf("colB")).alias("colB_DateCheck"),
F.when(F.col("colC") == 'null', "Yes").otherwise(date_validation_udf("colC")).alias("colC_DateCheck"),
F.when(F.col("colD") == 'null', "Yes").otherwise(num_check_udf("colD")).alias("colD_NumCheck"),
F.when(F.col("colE") == 'null', "Yes").otherwise(num_check_udf("colE")).alias("colE_NumCheck"),
F.when(F.col("colF") == 'null', "Yes").otherwise(num_check_udf("colF")).alias("colF_NumCheck"),
F.when(F.col("colG") == 'null', "Yes").otherwise(num_check_udf("colG")).alias("colG_NumCheck"))
df_output = df.select(
'*',
*checkcol
)
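If the goal is real parameterization, one option is to pass only the column names as data and build the expressions in the script. This is a minimal sketch, not a tested drop-in: check_specs, build_checkcol and the "date"/"num" keys are hypothetical names introduced here, and the UDFs are assumed to be the date_validation_udf and num_check_udf from the question.

```python
from typing import Callable, Dict, Iterable, List, Tuple

Spec = Tuple[str, str, str]  # (source column, check kind, output alias)

def check_specs(date_cols: Iterable[str], num_cols: Iterable[str]) -> List[Spec]:
    """Describe each check as plain data, so it can come from a config or argument."""
    specs = [(c, "date", f"{c}_DateCheck") for c in date_cols]
    specs += [(c, "num", f"{c}_NumCheck") for c in num_cols]
    return specs

def build_checkcol(specs: Iterable[Spec], udfs: Dict[str, Callable]) -> list:
    """Turn the specs into Column expressions shaped like the ones in the question."""
    # Imported here so check_specs stays usable without Spark installed;
    # building Column objects generally needs an active SparkSession.
    import pyspark.sql.functions as F
    return [
        F.when(F.col(src) == 'null', "Yes").otherwise(udfs[kind](src)).alias(alias)
        for src, kind, alias in specs
    ]

specs = check_specs(["colA", "colB", "colC"], ["colD", "colE", "colF", "colG"])
print(specs[0])  # ('colA', 'date', 'colA_DateCheck')
```

With a SparkSession active, this could then be used as df.select('*', *build_checkcol(specs, {"date": date_validation_udf, "num": num_check_udf})). Note that, as in the original, == 'null' matches the literal string 'null'; real NULL values would need F.col(src).isNull() instead.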