Teradata: local vs. global aggregate computation

Date: 2013-12-28 03:31:38

Tags: teradata

I have two tables, Table1 and Table2, and both tables have their primary index (PI) on col1, col2, col3, and col4. I join the two tables and group on a set of columns that includes the tables' PI. Can someone tell me why the explain plan says "Aggregate Intermediate Results are computed globally" rather than locally? My understanding is that when the GROUP BY columns include all of the PI columns, the aggregate results are computed locally, not globally.

select
A.col1
,A.col2
,A.col3
,A.col4
,col5
,col6
,col7
,col8
,col9
,SUM(col10)
,COUNT(col11)
from
table1 A
left outer join
table2 B
on A.col1 = B.col1
and A.col2 = B.col2
and A.col3 = B.col3
and A.col4 = B.col4
group by A.col1,A.col2,A.col3,A.col4,col5,col6,col7,col8,col9

Below is the explain plan for the query:

        1) First, we lock a distinct DATEBASE_NAME."pseudo table" for read on a
        RowHash to prevent global deadlock for DATEBASE_NAME.S. 
        2) Next, we lock a distinct DATEBASE_NAME."pseudo table" for write on a
        RowHash to prevent global deadlock for
        DATEBASE_NAME.TARGET_TABLE. 
        3) We lock a distinct DATEBASE_NAME."pseudo table" for read on a RowHash
        to prevent global deadlock for DATEBASE_NAME.E. 
        4) We lock DATEBASE_NAME.S for read, we lock
        DATEBASE_NAME.TARGET_TABLE for write, and we lock
        DATEBASE_NAME.E for read. 
        5) We do an all-AMPs JOIN step from DATEBASE_NAME.S by way of a RowHash
        match scan with no residual conditions, which is joined to
        DATEBASE_NAME.E by way of a RowHash match scan.  DATEBASE_NAME.S and
        DATEBASE_NAME.E are left outer joined using a merge join, with
        condition(s) used for non-matching on left table ("(NOT
        (DATEBASE_NAME.S.col1 IS NULL )) AND ((NOT
        (DATEBASE_NAME.S.col2 IS NULL )) AND ((NOT
        (DATEBASE_NAME.S.col3 IS NULL )) AND (NOT
        (DATEBASE_NAME.S.col4 IS NULL ))))"), with a join condition of (
        "(DATEBASE_NAME.S.col1 = DATEBASE_NAME.E.col1) AND
        ((DATEBASE_NAME.S.col2 = DATEBASE_NAME.E.col2) AND
        ((DATEBASE_NAME.S.col3 = DATEBASE_NAME.E.col3) AND
        (DATEBASE_NAME.S.col4 = DATEBASE_NAME.E.col4 )))").  The input
        table DATEBASE_NAME.S will not be cached in memory.  The result goes
        into Spool 3 (all_amps), which is built locally on the AMPs.  The
        result spool file will not be cached in memory.  The size of Spool
        3 is estimated with low confidence to be 675,301,664 rows (
        812,387,901,792 bytes).  The estimated time for this step is 3
        minutes and 37 seconds. 
        6) We do an all-AMPs SUM step to aggregate from Spool 3 (Last Use) by
        way of an all-rows scan , grouping by field1 (
        DATEBASE_NAME.S.col1 ,DATEBASE_NAME.S.col2
        ,DATEBASE_NAME.S.col3 ,DATEBASE_NAME.S.col4
        ,DATEBASE_NAME.E.col5
        ,DATEBASE_NAME.S.col6 ,DATEBASE_NAME.S.col7
        ,DATEBASE_NAME.S.col8 ,DATEBASE_NAME.S.col9).  Aggregate
        Intermediate Results are computed globally, then placed in Spool 4. 
        The aggregate spool file will not be cached in memory.  The size
        of Spool 4 is estimated with low confidence to be 506,476,248 rows
        (1,787,354,679,192 bytes).  The estimated time for this step is 1
        hour and 1 minute. 
        7) We do an all-AMPs MERGE into DATEBASE_NAME.TARGET_TABLE
        from Spool 4 (Last Use).  The size is estimated with low
        confidence to be 506,476,248 rows.  The estimated time for this
        step is 33 hours and 12 minutes. 
        8) We spoil the parser's dictionary cache for the table. 
        9) Finally, we send out an END TRANSACTION step to all AMPs involved
        in processing the request.
        -> No rows are returned to the user as the result of statement 1.
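For intuition, the local-vs-global distinction can be simulated outside Teradata. The sketch below is a toy model in Python (a simple modulus stands in for Teradata's row hash, and the AMP count and column values are made up): rows are distributed to AMPs by their PI columns, and a GROUP BY can be satisfied AMP-locally exactly when every group's rows land on a single AMP, which is guaranteed whenever the grouping columns include all of the PI columns.

```python
from collections import defaultdict

N_AMPS = 4  # made-up AMP count for the toy model

def amp_for(row, pi_cols):
    """Stand-in for Teradata's row hash: only the PI columns decide the AMP.
    Real Teradata uses a hash function; a sum modulus keeps the toy deterministic."""
    return sum(row[c] for c in pi_cols) % N_AMPS

def groups_are_amp_local(rows, pi_cols, group_cols):
    """True if every GROUP BY group is confined to a single AMP,
    i.e. the aggregate could be computed locally with no redistribution."""
    amps_per_group = defaultdict(set)
    for row in rows:
        key = tuple(row[c] for c in group_cols)
        amps_per_group[key].add(amp_for(row, pi_cols))
    return all(len(amps) == 1 for amps in amps_per_group.values())

rows = [{"col1": i % 5, "col2": i % 3, "col5": i % 2} for i in range(100)]

# GROUP BY includes all PI columns -> every group sits on one AMP (local aggregation)
print(groups_are_amp_local(rows, ["col1", "col2"], ["col1", "col2", "col5"]))  # True

# GROUP BY drops a PI column -> one group's rows span several AMPs (global aggregation)
print(groups_are_amp_local(rows, ["col1", "col2"], ["col1"]))  # False
```

Note that the model only captures row placement; as the accepted reasoning below points out, the optimizer must also still *know* that the rows it is aggregating are PI-distributed, which is lost once they sit in an intermediate spool.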

2 Answers:

Answer 0 (score: 0)

What if you aggregate using only col1, col2, col3, col4 — does it then aggregate locally?

More details at this URL: http://www.teradataforum.com/teradata/20040526_133730.htm
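A minimal sketch of this suggestion, assuming the same table and column names as the question (whether the resulting plan actually says the aggregate is computed locally still has to be checked with EXPLAIN on your system):

```sql
-- Group on exactly the PI columns; every row of a group then hashes
-- to the same AMP, so the SUM/COUNT can in principle complete locally.
select
A.col1
,A.col2
,A.col3
,A.col4
,SUM(col10)
,COUNT(col11)
from
table1 A
left outer join
table2 B
on A.col1 = B.col1
and A.col2 = B.col2
and A.col3 = B.col3
and A.col4 = B.col4
group by A.col1, A.col2, A.col3, A.col4;
```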

Answer 1 (score: 0)

I think this is because of the intermediate spool. You are aggregating columns from that spool, not from the original tables. I was able to get the Aggregate Intermediate Results computed locally by using a volatile table.

Basically, what I do here is take the spool from step 5, give it a name, and enforce a PI on it. Since the volatile table's PI is the same as the initial tables', generating the volatile table is also an AMP-local operation.

CREATE VOLATILE TABLE x AS
(
SELECT
A.col1
,A.col2
,A.col3
,A.col4
,col5
,col6
,col7
,col8
,col9
--,SUM(col10)
--,COUNT(col11)
from
table1 A
left outer join
table2 B
on A.col1 = B.col1
A.col2 = B.col2
A.col3 = B.col3
A.col4 = B.col4
--group by A.col1,A.col2,A.col3,A.col4,col5,col6,col7,col8,col9
)
WITH DATA PRIMARY INDEX (col1, col2, col3, col4)
;

SELECT
col1
,col2
,col3
,col4
,col5
,col6
,col7
,col8
,col9
,SUM(col10)
,COUNT(col11)
from
x
GROUP BY 
col1,col2,col3,col4,col5,col6,col7,col8,col9
;