SparkSQL - How to reuse a previously selected value

Asked: 2017-03-20 15:08:43

Tags: scala apache-spark apache-spark-sql

I need to pass the value returned by the first UDF (GetOtherTriggers) as the parameter of the second UDF (GetTriggerType).

The following code does not work:

val df = sql.sql(
  "select GetOtherTriggers(categories) as other_triggers, GetTriggerType(other_triggers) from my_table")

It throws the following exception: org.apache.spark.sql.AnalysisException: cannot resolve 'other_triggers' given input columns: [my_table columns]; — a column alias defined in a SELECT list cannot be referenced by another expression in that same SELECT list.

1 answer:

Answer 0 (score: 2)

You can use a subquery:

val df = sql.sql("""select GetTriggerType(other_triggers), other_triggers 
                 from (
                      select GetOtherTriggers(categories) as other_triggers, *
                      from my_table
                      ) withOther """)

Test:

val df = sc.parallelize (1 to 10).map(x => (x, x*2, x*3)).toDF("nr1", "nr2", "nr3");
df.createOrReplaceTempView("nr");
spark.udf.register("x3UDF", (x: Integer) => x*3);
spark.sql("""select x3UDF(nr1x3), nr1x3, nr3 
             from (
                   select x3UDF(nr1) as nr1x3, * 
                   from nr
                  ) a """)
     .show()

This gives:

+----------+-----+---+
|UDF(nr1x3)|nr1x3|nr3|
+----------+-----+---+
|         9|    3|  3|
|        18|    6|  6|
|        27|    9|  9|
|        36|   12| 12|
|        45|   15| 15|
|        54|   18| 18|
|        63|   21| 21|
|        72|   24| 24|
|        81|   27| 27|
|        90|   30| 30|
+----------+-----+---+