Spark 2.0: exception when self-joining a temporary table

Asked: 2017-06-03 00:08:30

Tags: apache-spark apache-spark-sql self-join apache-spark-2.0

I ran into an interesting problem while using Spark 2.0. Here is my situation:

  1. Using SQL
  2. Create temporary view V1
  3. Self-join on V1
  4. Create temporary view V2 from that self-join:
    select
        a.*,
        b.bcol3
    from
    (
        select
            col1,
            col2,
            sum(col3) over(partition by
                            col1,
                            col2
                            order by col3 desc rows unbounded preceding
            ) as col3
        from V1
    ) a
    join
    (
        select
            col1,
            col2,
            sum(col3) over(partition by
                            col1,
                            col2
                            order by col3 desc rows unbounded preceding
            ) as bcol3
        from V1
    ) b
    on a.col1 = b.col1 and a.col2 = b.col2
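
For reference, a minimal Scala sketch of how these steps might be wired together (the source DataFrame and the column names col1, col2, col3 here are placeholders, not the real data):

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder()
      .appName("self-join-temp-view")
      .getOrCreate()

    // Placeholder source data; the real V1 is built from an actual table.
    val df = spark.range(0, 1000).selectExpr(
      "id % 10 as col1",
      "id % 7  as col2",
      "id      as col3")

    // Step 2: register the temporary view that will be self-joined.
    df.createOrReplaceTempView("V1")

    // Steps 3-4: run the self-join query above and register the result as V2.
    spark.sql("""
      select a.*, b.bcol3
      from (select col1, col2,
                   sum(col3) over(partition by col1, col2
                                  order by col3 desc rows unbounded preceding) as col3
            from V1) a
      join (select col1, col2,
                   sum(col3) over(partition by col1, col2
                                  order by col3 desc rows unbounded preceding) as bcol3
            from V1) b
      on a.col1 = b.col1 and a.col2 = b.col2
    """).createOrReplaceTempView("V2")

    // Triggering an action on V2 is where the exception was reported.
    spark.table("V2").count()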

When I fetch results from V2 (using count or select *), I get an exception like the following:

        org.apache.spark.sql.catalyst.errors.package$TreeNodeException: execute, tree:
    Exchange SinglePartition
    +- *HashAggregate(keys=[], functions=[partial_count(1)], output=[count#73739L])
       +- *Project
          +- *SortMergeJoin [market#69662, timegrouptype#67731, periodunit#67733, period#67732, hcp_system_id#69357], [market#73717, timegrouptype#73491, periodunit#73493, period#73492, hcp_system_id#73608], Inner
             :- *Sort [market#69662 ASC, timegrouptype#67731 ASC, periodunit#67733 ASC, period#67732 ASC, hcp_system_id#69357 ASC], false, 0
             :  +- Exchange hashpartitioning(market#69662, timegrouptype#67731, periodunit#67733, period#67732, hcp_system_id#69357, 200)
             :     +- *HashAggregate(keys=[hcp_system_id#69357, market#69662, timegrouptype#67731, periodunit#67733, period#67732, displayname#67734], functions=[], output=[hcp_system_id#69357, market#69662, timegrouptype#67731, periodunit#67733, period#67732])
             :        +- Exchange hashpartitioning(hcp_system_id#69357, market#69662, timegrouptype#67731, periodunit#67733, period#67732, displayname#67734, 200)
             :           +- *HashAggregate(keys=[hcp_system_id#69357, market#69662, timegrouptype#67731, periodunit#67733, period#67732, displayname#67734], functions=[], output=[hcp_system_id#69357, market#69662, timegrouptype#67731, periodunit#67733, period#67732, displayname#67734])
             :              +- *Project [timegrouptype#67731, periodunit#67733, period#67732, displayname#67734, hcp_system_id#69357, market#69662]
             :                 +- *SortMergeJoin [product_system_id#69514], [product#69666], Inner
             :                    :- *Sort [product_system_id#69514 ASC], false, 0
             :                    :  +- Exchange hashpartitioning(product_system_id#69514, 200)
             :                    :     +- *Project [timegrouptype#67731, periodunit#67733, period#67732, displayname#67734, hcp_system_id#69357, product_system_id#69514]
             :                    :        +- *SortMergeJoin [productgroup_system_id#69829], [productgroup_system_id#69636], Inner
             :                    :           :- *Sort [productgroup_system_id#69829 ASC], false, 0
             :                    :           :  +- Exchange hashpartitioning(productgroup_system_id#69829, 200)
             :                    :           :     +- *Project [timegrouptype#67731, periodunit#67733, period#67732, displayname#67734, hcp_system_id#69357, product_system_id#69514, productgroup_system_id#69829]
             :                    :           :        +- *SortMergeJoin [product_system_id#69514], [product_system_id#69827], Inner
             :                    :           :           :- *Sort [product_system_id#69514 ASC], false, 0
             :                    :           :           :  +- *Project [timegrouptype#67731, periodunit#67733, period#67732, displayname#67734, hcp_system_id#69357, product_system_id#69514]
             :                    :           :           :     +- *SortMergeJoin [product_system_id#69242], [product_system_id#69514], Inner
             :                    :           :           :        :- *Sort [product_system_id#69242 ASC], false, 0
             :                    :           :           :        :  +- Exchange hashpartitioning(product_system_id#69242, 200)
    

... and many more lines follow.

The same kind of query ran without errors on Spark 1.6.2; the exception occurs only on the newer Spark 2.0.

Has anyone else run into this problem? Do you know why it throws this exception?

Note: the workaround that avoids the exception is to cache V1, or whatever table you use for the self-join.
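
A rough sketch of that workaround, assuming the views were registered as in the Scala snippet above:

    // Workaround from the note above: cache the table behind the self-join
    // before triggering an action on V2.
    spark.catalog.cacheTable("V1")   // or call .cache() on the DataFrame behind V1

    spark.table("V2").count()        // the action now runs without the exception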

0 Answers
