ConstantFolding not working in the Spark SQL Catalyst Optimizer

Date: 2017-12-23 21:26:00

Tags: scala apache-spark apache-spark-sql

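ConstantFolding is an operator optimization rule in Catalyst that replaces expressions that can be statically evaluated with their equivalent literal values. It is a logical plan optimization rule in the Operator Optimizations batch of the base Optimizer class.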
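To make the rule's behaviour concrete, here is a minimal Scala sketch of constant folding over a toy expression tree. `Expr`, `Lit`, `Attr`, `Add` and `ToyConstantFolding` are hypothetical names, not Catalyst's API. Assuming, as described above, that the rule only replaces subtrees containing no column references, applying it to the expression from the code below collapses `(1 + 3) + 4` to `8` and nothing else: the parsed tree is nested left-associatively, so every other literal shares a subtree with a column.

```scala
// Hypothetical toy expression tree; these are NOT Spark's Catalyst classes.
sealed trait Expr
final case class Lit(value: Int) extends Expr
final case class Attr(name: String) extends Expr
final case class Add(left: Expr, right: Expr) extends Expr

object ToyConstantFolding {
  // A subtree is foldable when it references no column (Attr).
  def foldable(e: Expr): Boolean = e match {
    case Lit(_)    => true
    case Attr(_)   => false
    case Add(l, r) => foldable(l) && foldable(r)
  }

  // Evaluate a foldable subtree to an Int.
  def eval(e: Expr): Int = e match {
    case Lit(v)    => v
    case Add(l, r) => eval(l) + eval(r)
    case Attr(n)   => throw new IllegalArgumentException(s"column $n has no static value")
  }

  // Top-down: replace the largest foldable subtree with a literal,
  // otherwise keep the node and recurse into its children.
  def fold(e: Expr): Expr = e match {
    case Lit(_) | Attr(_) => e
    case _ if foldable(e) => Lit(eval(e))
    case Add(l, r)        => Add(fold(l), fold(r))
  }

  def show(e: Expr): String = e match {
    case Lit(v)    => v.toString
    case Attr(n)   => n
    case Add(l, r) => s"(${show(l)} + ${show(r)})"
  }

  // (((1 + 3 + 4 + x) + (2 + y) + 3) + x) + z, nested left-associatively
  // as in the parsed plan shown in the question.
  val example: Expr = {
    val (x, y, z) = (Attr("x"), Attr("y"), Attr("z"))
    val a = Add(Add(Add(Lit(1), Lit(3)), Lit(4)), x) // ((1 + 3) + 4) + x
    val b = Add(Add(a, Add(Lit(2), y)), Lit(3))       // (a + (2 + y)) + 3
    Add(Add(b, x), z)                                 // (b + x) + z
  }

  def main(args: Array[String]): Unit =
    println(show(fold(example))) // prints (((((8 + x) + (2 + y)) + 3) + x) + z)
}
```

The `2` and the `3` survive because each sits under an `Add` whose other side contains `x` or `y`, so no subtree holding them is foldable as a whole.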

Maybe I'm missing something, but... why doesn't this code get optimized?

import spark.implicits._ // needed for toDF; already in scope in spark-shell
val xyzDataFrame = Seq((1, 3, 4)).toDF("x", "y", "z")
xyzDataFrame.selectExpr("(((1 + 3 + 4 + x) + (2 + y) + 3) + x) + z").explain(true)

The result is:

17/12/23 19:14:32 INFO SparkSqlParser: Parsing command: (((1 + 3 + 4 + x) + (2 + y) + 3) + x) + z
== Parsed Logical Plan ==
'Project [(((((((1 + 3) + 4) + 'x) + (2 + 'y)) + 3) + 'x) + 'z) AS (((((((1 + 3) + 4) + x) + (2 + y)) + 3) + x) + z)#317]
+- Project [_1#306 AS x#310, _2#307 AS y#311, _3#308 AS z#312]
   +- LocalRelation [_1#306, _2#307, _3#308]

== Analyzed Logical Plan ==
(((((((1 + 3) + 4) + x) + (2 + y)) + 3) + x) + z): int
Project [(((((((1 + 3) + 4) + x#310) + (2 + y#311)) + 3) + x#310) + z#312) AS (((((((1 + 3) + 4) + x) + (2 + y)) + 3) + x) + z)#317]
+- Project [_1#306 AS x#310, _2#307 AS y#311, _3#308 AS z#312]
   +- LocalRelation [_1#306, _2#307, _3#308]

== Optimized Logical Plan ==
LocalRelation [(((((((1 + 3) + 4) + x) + (2 + y)) + 3) + x) + z)#317]

== Physical Plan ==
LocalTableScan [(((((((1 + 3) + 4) + x) + (2 + y)) + 3) + x) + z)#317]

I expected something like this:

== Optimized Logical Plan ==
LocalRelation [(13 + (2 * x)  + y + z)#317]
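For comparison, reaching the expected form above takes more than folding: the whole chain of additions has to be flattened (which relies on integer addition being associative and commutative), the literals summed (1 + 3 + 4 + 2 + 3 = 13), and the repeated column combined (x + x = 2 * x). The Scala sketch below does exactly that on another toy tree; `Term`, `Num`, `Col`, `Plus` and `ToyReassociate` are hypothetical names, and the sketch makes no claim about which rules Catalyst actually ships.

```scala
// Hypothetical toy expression tree; these are NOT Spark's Catalyst classes.
sealed trait Term
final case class Num(value: Int) extends Term
final case class Col(name: String) extends Term
final case class Plus(left: Term, right: Term) extends Term

object ToyReassociate {
  // Flatten nested Plus nodes into their leaf operands. This regrouping is
  // only valid because integer addition is associative and commutative.
  def flatten(t: Term): List[Term] = t match {
    case Plus(l, r) => flatten(l) ++ flatten(r)
    case leaf       => List(leaf)
  }

  // Sum every literal operand and count how often each column appears.
  def normalize(t: Term): String = {
    val operands = flatten(t)
    val constant = operands.collect { case Num(v) => v }.sum
    val columns  = operands.collect { case Col(n) => n }
    // Keep first-appearance order so the output is deterministic.
    val colTerms = columns.distinct.map(n => n -> columns.count(_ == n)).map {
      case (n, 1) => n
      case (n, k) => s"($k * $n)"
    }
    (constant.toString :: colTerms).mkString(" + ")
  }

  // (((1 + 3 + 4 + x) + (2 + y) + 3) + x) + z, nested left-associatively.
  val example: Term = {
    val (x, y, z) = (Col("x"), Col("y"), Col("z"))
    val a = Plus(Plus(Plus(Num(1), Num(3)), Num(4)), x)
    val b = Plus(Plus(a, Plus(Num(2), y)), Num(3))
    Plus(Plus(b, x), z)
  }

  def main(args: Array[String]): Unit =
    println(normalize(example)) // prints 13 + (2 * x) + y + z
}
```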
EDIT

An example using a Dataset:

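import org.apache.spark.sql.functions.lit
import spark.implicits._ // needed for toDS; already in scope in spark-shell
case class XYZ(id: Int, x: Int, y: Int, z: Int)
val xyz_objectDataset = Seq(XYZ(1, 2, 5, 3)).toDS
xyz_objectDataset.select(lit(3) + lit(2)).explain(true)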
. . . 
== Optimized Logical Plan ==
LocalRelation [(3 + 2)#528]

is not optimized, but this one is:

import org.apache.spark.sql.functions.lit
val java_lang_longDataset = spark.range(1)
java_lang_longDataset.select(lit(3) + lit(2)).explain(true)
...
== Optimized Logical Plan ==
Project [5 AS (3 + 2)#532]

Any ideas?
