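After reading some articles about Whole-Stage Code Generation, my understanding is that Spark does bytecode optimizations to convert a query plan to an optimized execution plan.
https://jaceklaskowski.gitbooks.io/mastering-apache-spark/spark-sql-whole-stage-codegen.html
My next question: even after all of these bytecode-related optimizations, converting those bytecode instructions into machine code instructions could still be a bottleneck, because that is done by the JIT alone at runtime, and the JIT needs enough runs before this optimization takes place.
So does Spark do anything related to dynamic/runtime conversion of the optimized bytecode (the outcome of whole-stage codegen) to machine code, or does it rely on the JIT to convert those bytecode instructions into machine code instructions? If it relies on the JIT, there are certain uncertainties involved.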
Answer 0 (score: 3)
"spark does bytecode optimizations to convert a query plan to an optimized execution plan."
Spark SQL does not do bytecode optimizations.
Spark SQL simply uses the CollapseCodegenStages physical preparation rule and eventually converts a query plan into single-method Java source code (which Janino then compiles into bytecode).
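The "generate Java source, compile it at runtime, then leave the rest to the JVM" flow can be sketched without Spark at all. The example below is only an illustration under stated assumptions: it uses the JDK's own javax.tools compiler as a stand-in for Janino, and the GeneratedStage class with its run method is a hypothetical hand-written placeholder, not code that Spark emits. Once the class is loaded, it is ordinary bytecode, and whether it ever becomes machine code is up to the JIT.

```java
import java.io.IOException;
import java.lang.reflect.Method;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.tools.JavaCompiler;
import javax.tools.ToolProvider;

final class RuntimeCodegenSketch {
    // Hypothetical "generated" source: a filter and a projection fused into one loop.
    static final String SOURCE =
        "public class GeneratedStage {\n"
      + "  public static long run(int n) {\n"
      + "    long sum = 0;\n"
      + "    for (int i = 0; i < n; i++) {\n"
      + "      if (i % 2 == 0) sum += i * 2L;\n"
      + "    }\n"
      + "    return sum;\n"
      + "  }\n"
      + "}\n";

    // Writes the source to a temp directory, compiles it, loads the class and invokes run(n).
    static long compileAndRun(int n) {
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        if (compiler == null) {
            throw new IllegalStateException("No system Java compiler; run on a JDK");
        }
        try {
            Path dir = Files.createTempDirectory("codegen-sketch");
            Path file = dir.resolve("GeneratedStage.java");
            Files.write(file, SOURCE.getBytes(StandardCharsets.UTF_8));
            int rc = compiler.run(null, null, null, "-d", dir.toString(), file.toString());
            if (rc != 0) {
                throw new IllegalStateException("Compilation failed, exit code " + rc);
            }
            try (URLClassLoader loader = new URLClassLoader(new URL[] {dir.toUri().toURL()})) {
                Method run = loader.loadClass("GeneratedStage").getMethod("run", int.class);
                // From here on this is plain bytecode; any JIT compilation is the JVM's decision.
                return (Long) run.invoke(null, n);
            }
        } catch (IOException | ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(compileAndRun(10)); // even i in 0..9, doubled and summed: 40
    }
}
```

Nothing in this flow touches machine code generation, which matches the point above: the framework stops at bytecode.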
"So does spark do anything related to dynamic/runtime conversion of optimized bytecode"
No.
Speaking of JIT, whole-stage codegen does check whether the generated code is "too long", that is, whether its compiled bytecode could exceed the spark.sql.codegen.hugeMethodLimit Spark SQL internal property (8000 by default, which is the value of HugeMethodLimit in the OpenJDK JVM settings). By default HotSpot does not JIT-compile methods whose bytecode is larger than HugeMethodLimit, so such a method would keep running in the interpreter.
The maximum bytecode size of a single compiled Java function generated by whole-stage codegen. When the compiled function exceeds this threshold, the whole-stage codegen is deactivated for this subtree of the current query plan. The default value is 8000 and this is a limit in the OpenJDK JVM implementation.
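The JVM side of this limit can be inspected from a running process. The following is a minimal sketch, assuming a HotSpot-based JDK: as far as I know, HugeMethodLimit itself is not readable in product builds, but the companion flag DontCompileHugeMethods (which makes the JIT skip methods above that limit) is exposed through HotSpotDiagnosticMXBean. The class and method names are my own, and on other JVMs the lookup may simply fail, which the code reports as "unknown".

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

final class HugeMethodFlagCheck {
    // Returns "true", "false", or "unknown" if the flag cannot be read on this JVM.
    static String dontCompileHugeMethods() {
        try {
            HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
            if (bean == null) {
                return "unknown";
            }
            // Assumption: the flag is exposed under this name on HotSpot builds.
            return bean.getVMOption("DontCompileHugeMethods").getValue();
        } catch (RuntimeException e) {
            // For example IllegalArgumentException when the option does not exist.
            return "unknown";
        }
    }

    public static void main(String[] args) {
        System.out.println("DontCompileHugeMethods = " + dontCompileHugeMethods());
    }
}
```

If this prints true on the executor JVMs, a generated method above the limit would not be JIT-compiled, which is the situation the Spark property is meant to avoid.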
There are not that many physical operators that implement CodegenSupport, so reviewing WholeStageCodegenExec and the operators' doConsume methods should reveal whether, and in which cases, the JIT might not kick in.