I have a question. I have the table below, which has six columns in its Primary Index (PI), but it is mostly accessed by LXSTATE_ID. The problem is that LXSTATE_ID has roughly 8 million duplicates, and I don't see any other column that is unique enough to go into the PI. The table has about 215 million records, and I run a MINUS between the staging table and the bulk table to capture changed records. It fails with a spool space error. What can be done here?
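As an aside on the uniqueness question, one quick way to profile the candidate columns is to count distinct values and the worst-case duplicate count directly against the bulk table. This is only an illustrative sketch, not part of the original load; it uses the column names from the DDL below:

SELECT
    COUNT(*)                       AS total_rows,
    COUNT(DISTINCT LXSTATE_ID)     AS distinct_lxstate_id,
    COUNT(DISTINCT BUS_OBJ_OID)    AS distinct_bus_obj_oid,
    COUNT(DISTINCT MXSTATEREQ_OID) AS distinct_mxstatereq_oid
FROM GEEDW_D_PLM_ODS_BULK_T.CDR_ODS_LXSTATE_398850F1;

-- Worst case for a PI on LXSTATE_ID alone: how many rows share the
-- most frequent value (all of them would hash to the same AMP).
SELECT MAX(cnt) AS max_rows_per_value
FROM (
    SELECT LXSTATE_ID, COUNT(*) AS cnt
    FROM GEEDW_D_PLM_ODS_BULK_T.CDR_ODS_LXSTATE_398850F1
    GROUP BY LXSTATE_ID
) AS dt;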
SHOW TABLE GEEDW_D_PLM_ODS_BULK_T.CDR_ODS_LXSTATE_398850F1;
CREATE MULTISET TABLE GEEDW_D_PLM_ODS_BULK_T.CDR_ODS_LXSTATE_398850F1 ,NO FALLBACK ,
NO BEFORE JOURNAL,
NO AFTER JOURNAL,
CHECKSUM = DEFAULT,
DEFAULT MERGEBLOCKRATIO
(
LXSTATE_ID VARCHAR(4000) CHARACTER SET LATIN NOT CASESPECIFIC TITLE 'LXSTATE_ID' NOT NULL,
BUS_OBJ_OID INTEGER TITLE 'BUS_OBJ_OID',
MXSTATEREQ_OID INTEGER TITLE 'MXSTATEREQ_OID',
ACTUAL_DT_GMT TIMESTAMP(0) TITLE 'ACTUAL_DT_GMT',
START_DT_GMT TIMESTAMP(0) TITLE 'START_DT_GMT',
END_DT_GMT TIMESTAMP(0) TITLE 'END_DT_GMT',
DW_LOAD_DATE TIMESTAMP(0) TITLE 'DW_LOAD_DATE',
DW_CREATED_BY VARCHAR(20) CHARACTER SET LATIN NOT CASESPECIFIC TITLE 'DW_CREATED_BY',
DW_UPDATED_DATE TIMESTAMP(0) TITLE 'DW_UPDATED_DATE',
DW_UPDATED_BY VARCHAR(20) CHARACTER SET LATIN NOT CASESPECIFIC TITLE 'DW_UPDATED_BY')
PRIMARY INDEX CDR_ODS_LXSTATE_398850F1_S_PK ( LXSTATE_ID ,BUS_OBJ_OID ,
MXSTATEREQ_OID ,ACTUAL_DT_GMT ,START_DT_GMT ,END_DT_GMT );
Here is the MINUS query. VT_LXSTATE_398850F1 is a volatile table in which the changed records are captured.
INSERT INTO VT_LXSTATE_398850F1
(
LXSTATE_ID,
BUS_OBJ_OID,
MXSTATEREQ_OID,
ACTUAL_DT_GMT,
START_DT_GMT,
END_DT_GMT
)
SELECT
LXSTATE_ID,
BUS_OBJ_OID,
MXSTATEREQ_OID,
ACTUAL_DT_GMT,
START_DT_GMT,
END_DT_GMT
FROM GEEDW_PLP_S.CDR_ODS_LXSTATE_398850F1_S
MINUS
SELECT
LXSTATE_ID,
BUS_OBJ_OID,
MXSTATEREQ_OID,
ACTUAL_DT_GMT,
START_DT_GMT,
END_DT_GMT
FROM GEEDW_PLM_ODS_BULK_T.CDR_ODS_LXSTATE_398850F1;
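The DDL for VT_LXSTATE_398850F1 is not included in the post. A minimal sketch of what it could look like is below: the column definitions are copied from the bulk table above, the PI is inferred from the redistribution in step 6 of the explain plan, and the NO LOG / ON COMMIT PRESERVE ROWS options are assumptions:

CREATE VOLATILE TABLE VT_LXSTATE_398850F1, NO LOG
(
    LXSTATE_ID     VARCHAR(4000) CHARACTER SET LATIN NOT CASESPECIFIC NOT NULL,
    BUS_OBJ_OID    INTEGER,
    MXSTATEREQ_OID INTEGER,
    ACTUAL_DT_GMT  TIMESTAMP(0),
    START_DT_GMT   TIMESTAMP(0),
    END_DT_GMT     TIMESTAMP(0)
)
PRIMARY INDEX ( LXSTATE_ID, MXSTATEREQ_OID )   -- inferred, not confirmed
ON COMMIT PRESERVE ROWS;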
Here is the explain plan for the INSERT:
Explain INSERT INTO VT_LXSTATE_398850F1
(
LXSTATE_ID,
BUS_OBJ_OID,
MXSTATEREQ_OID,
ACTUAL_DT_GMT,
START_DT_GMT,
END_DT_GMT
)
SELECT
LXSTATE_ID,
BUS_OBJ_OID,
MXSTATEREQ_OID,
ACTUAL_DT_GMT,
START_DT_GMT,
END_DT_GMT
FROM GEEDW_D_PLP_S.CDR_ODS_LXSTATE_398850F1_S_BKP
MINUS
SELECT
LXSTATE_ID,
BUS_OBJ_OID,
MXSTATEREQ_OID,
ACTUAL_DT_GMT,
START_DT_GMT,
END_DT_GMT
FROM GEEDW_D_PLM_ODS_BULK_T.CDR_ODS_LXSTATE_398850F1_BKP;
1) First, we lock a distinct GEEDW_D_PLP_S."pseudo table" for read on
a RowHash to prevent global deadlock for
GEEDW_D_PLP_S.CDR_ODS_LXSTATE_398850F1_S_BKP.
2) Next, we lock a distinct GEEDW_D_PLM_ODS_BULK_T."pseudo table" for
read on a RowHash to prevent global deadlock for
GEEDW_D_PLM_ODS_BULK_T.CDR_ODS_LXSTATE_398850F1_BKP.
3) We lock GEEDW_D_PLP_S.CDR_ODS_LXSTATE_398850F1_S_BKP for read, and
we lock GEEDW_D_PLM_ODS_BULK_T.CDR_ODS_LXSTATE_398850F1_BKP for
read.
4) We execute the following steps in parallel.
1) We do an all-AMPs RETRIEVE step from
GEEDW_D_PLP_S.CDR_ODS_LXSTATE_398850F1_S_BKP by way of an
all-rows scan with no residual conditions into Spool 2
(all_amps), which is redistributed by the hash code of (
GEEDW_D_PLP_S.CDR_ODS_LXSTATE_398850F1_S_BKP.END_DT_GMT,
GEEDW_D_PLP_S.CDR_ODS_LXSTATE_398850F1_S_BKP.START_DT_GMT,
GEEDW_D_PLP_S.CDR_ODS_LXSTATE_398850F1_S_BKP.ACTUAL_DT_GMT,
GEEDW_D_PLP_S.CDR_ODS_LXSTATE_398850F1_S_BKP.MXSTATEREQ_OID,
GEEDW_D_PLP_S.CDR_ODS_LXSTATE_398850F1_S_BKP.BUS_OBJ_OID,
GEEDW_D_PLP_S.CDR_ODS_LXSTATE_398850F1_S_BKP.LXSTATE_ID) to
all AMPs. Then we do a SORT to order Spool 2 by row hash and
the sort key in spool field1 eliminating duplicate rows. The
input table will not be cached in memory, but it is eligible
for synchronized scanning. The result spool file will not be
cached in memory. The size of Spool 2 is estimated with no
confidence to be 322,724,040 rows (1,755,618,777,600 bytes).
The estimated time for this step is 1 hour and 55 minutes.
2) We do an all-AMPs RETRIEVE step from
GEEDW_D_PLM_ODS_BULK_T.CDR_ODS_LXSTATE_398850F1_BKP by way of
an all-rows scan with no residual conditions into Spool 3
(all_amps), which is redistributed by the hash code of (
GEEDW_D_PLM_ODS_BULK_T.CDR_ODS_LXSTATE_398850F1_BKP.END_DT_GMT,
GEEDW_D_PLM_ODS_BULK_T.CDR_ODS_LXSTATE_398850F1_BKP.START_DT_GMT,
GEEDW_D_PLM_ODS_BULK_T.CDR_ODS_LXSTATE_398850F1_BKP.ACTUAL_DT_GMT,
GEEDW_D_PLM_ODS_BULK_T.CDR_ODS_LXSTATE_398850F1_BKP.MXSTATEREQ_OID,
GEEDW_D_PLM_ODS_BULK_T.CDR_ODS_LXSTATE_398850F1_BKP.BUS_OBJ_OID,
GEEDW_D_PLM_ODS_BULK_T.CDR_ODS_LXSTATE_398850F1_BKP.LXSTATE_ID)
to all AMPs. Then we do a SORT to order Spool 3 by row hash
and the sort key in spool field1 eliminating duplicate rows.
The input table will not be cached in memory, but it is
eligible for synchronized scanning. The result spool file
will not be cached in memory. The size of Spool 3 is
estimated with no confidence to be 161,362,020 rows (
877,809,388,800 bytes). The estimated time for this step is
56 minutes and 33 seconds.
5) We do an all-AMPs JOIN step from Spool 2 (Last Use) by way of an
all-rows scan, which is joined to Spool 3 (Last Use) by way of an
all-rows scan. Spool 2 and Spool 3 are joined using an exclusion
merge join, with a join condition of ("Field_1 = Field_1"). The
result goes into Spool 1 (all_amps), which is built locally on the
AMPs. The size of Spool 1 is estimated with no confidence to be
242,043,030 rows (1,316,714,083,200 bytes). The estimated time
for this step is 9 minutes and 11 seconds.
6) We do an all-AMPs RETRIEVE step from Spool 1 (Last Use) by way of
an all-rows scan into Spool 4 (all_amps), which is redistributed
by the hash code of (
GEEDW_D_PLP_S.CDR_ODS_LXSTATE_398850F1_S_BKP.LXSTATE_ID,
GEEDW_D_PLP_S.CDR_ODS_LXSTATE_398850F1_S_BKP.MXSTATEREQ_OID) to
all AMPs. Then we do a SORT to order Spool 4 by row hash. The
result spool file will not be cached in memory. The size of Spool
4 is estimated with no confidence to be 242,043,030 rows (
331,114,865,040 bytes). The estimated time for this step is 59
minutes and 11 seconds.
7) We do an all-AMPs MERGE into "502332938".VT_LXSTATE_398850F1 from
Spool 4 (Last Use). The size is estimated with no confidence to
be 242,043,030 rows. **The estimated time for this step is 19 hours
and 53 minutes.**
8) We spoil the parser's dictionary cache for the table.
9) Finally, we send out an END TRANSACTION step to all AMPs involved
in processing the request.
-> No rows are returned to the user as the result of statement 1.