Spark 2.0: exception when self-joining a temporary table

Asked: 2017-06-03 00:08:30

Tags: apache-spark apache-spark-sql self-join apache-spark-2.0

I ran into an interesting problem while using Spark 2.0. Here is my situation:

  1. Using SQL
  2. Create temporary view V1
  3. Self-join on V1
  4. Create temporary view V2 from that self-join:
    select
        a.*,
        b.bcol3
    from
    (
        select
            col1,
            col2,
            sum(col3) over(partition by
                            col1,
                            col2
                            order by col3 desc rows unbounded preceding
            ) as col3
        from V1
    ) a
    join
    (
        select
            col1,
            col2,
            sum(col3) over(partition by
                            col1,
                            col2
                            order by col3 desc rows unbounded preceding
            ) as bcol3
        from V1
    ) b
    on a.col1 = b.col1 and a.col2 = b.col2
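
For reference, a minimal Scala sketch of how these steps might be wired together (the source DataFrame and the column names col1, col2, col3 here are placeholders, not the real data):

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder()
      .appName("self-join-temp-view")
      .getOrCreate()

    // Placeholder source data; the real V1 is built from an actual table.
    val df = spark.range(0, 1000).selectExpr(
      "id % 10 as col1",
      "id % 7  as col2",
      "id      as col3")

    // Step 2: register the temporary view that will be self-joined.
    df.createOrReplaceTempView("V1")

    // Steps 3-4: run the self-join query above and register the result as V2.
    spark.sql("""
      select a.*, b.bcol3
      from (select col1, col2,
                   sum(col3) over(partition by col1, col2
                                  order by col3 desc rows unbounded preceding) as col3
            from V1) a
      join (select col1, col2,
                   sum(col3) over(partition by col1, col2
                                  order by col3 desc rows unbounded preceding) as bcol3
            from V1) b
      on a.col1 = b.col1 and a.col2 = b.col2
    """).createOrReplaceTempView("V2")

    // Triggering an action on V2 is where the exception was reported.
    spark.table("V2").count()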

When I fetch results from V2 (using count or select *), I get an exception like the following:

        org.apache.spark.sql.catalyst.errors.package$TreeNodeException: execute, tree:
    Exchange SinglePartition
    +- *HashAggregate(keys=[], functions=[partial_count(1)], output=[count#73739L])
       +- *Project
          +- *SortMergeJoin [market#69662, timegrouptype#67731, periodunit#67733, period#67732, hcp_system_id#69357], [market#73717, timegrouptype#73491, periodunit#73493, period#73492, hcp_system_id#73608], Inner
             :- *Sort [market#69662 ASC, timegrouptype#67731 ASC, periodunit#67733 ASC, period#67732 ASC, hcp_system_id#69357 ASC], false, 0
             :  +- Exchange hashpartitioning(market#69662, timegrouptype#67731, periodunit#67733, period#67732, hcp_system_id#69357, 200)
             :     +- *HashAggregate(keys=[hcp_system_id#69357, market#69662, timegrouptype#67731, periodunit#67733, period#67732, displayname#67734], functions=[], output=[hcp_system_id#69357, market#69662, timegrouptype#67731, periodunit#67733, period#67732])
             :        +- Exchange hashpartitioning(hcp_system_id#69357, market#69662, timegrouptype#67731, periodunit#67733, period#67732, displayname#67734, 200)
             :           +- *HashAggregate(keys=[hcp_system_id#69357, market#69662, timegrouptype#67731, periodunit#67733, period#67732, displayname#67734], functions=[], output=[hcp_system_id#69357, market#69662, timegrouptype#67731, periodunit#67733, period#67732, displayname#67734])
             :              +- *Project [timegrouptype#67731, periodunit#67733, period#67732, displayname#67734, hcp_system_id#69357, market#69662]
             :                 +- *SortMergeJoin [product_system_id#69514], [product#69666], Inner
             :                    :- *Sort [product_system_id#69514 ASC], false, 0
             :                    :  +- Exchange hashpartitioning(product_system_id#69514, 200)
             :                    :     +- *Project [timegrouptype#67731, periodunit#67733, period#67732, displayname#67734, hcp_system_id#69357, product_system_id#69514]
             :                    :        +- *SortMergeJoin [productgroup_system_id#69829], [productgroup_system_id#69636], Inner
             :                    :           :- *Sort [productgroup_system_id#69829 ASC], false, 0
             :                    :           :  +- Exchange hashpartitioning(productgroup_system_id#69829, 200)
             :                    :           :     +- *Project [timegrouptype#67731, periodunit#67733, period#67732, displayname#67734, hcp_system_id#69357, product_system_id#69514, productgroup_system_id#69829]
             :                    :           :        +- *SortMergeJoin [product_system_id#69514], [product_system_id#69827], Inner
             :                    :           :           :- *Sort [product_system_id#69514 ASC], false, 0
             :                    :           :           :  +- *Project [timegrouptype#67731, periodunit#67733, period#67732, displayname#67734, hcp_system_id#69357, product_system_id#69514]
             :                    :           :           :     +- *SortMergeJoin [product_system_id#69242], [product_system_id#69514], Inner
             :                    :           :           :        :- *Sort [product_system_id#69242 ASC], false, 0
             :                    :           :           :        :  +- Exchange hashpartitioning(product_system_id#69242, 200)
    

... and many more lines follow.

The same kind of query ran without errors on Spark 1.6.2; the exception occurs only on the newer Spark 2.0.

Has anyone else run into this problem? Do you know why it throws this exception?

Note: the workaround that avoids the exception is to cache V1, or whatever table you use for the self-join.
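
A rough sketch of that workaround, assuming the views were registered as in the Scala snippet above:

    // Workaround from the note above: cache the table behind the self-join
    // before triggering an action on V2.
    spark.catalog.cacheTable("V1")   // or call .cache() on the DataFrame behind V1

    spark.table("V2").count()        // the action now runs without the exception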

0 Answers
