I have 32,000 columns, and some views contain up to a million rows, possibly more. @ulrich from the Teradata forum provided an almost-working solution. The main idea is to create a volatile table and then insert all the required information into it via dynamically generated SQL. Here is the complete, modified solution:
.run file = /yourpath/logon.txt ;
.set width 500;
.OS rm /yourpath/view_col_type_sql.txt;
.export report file=/yourpath/view_col_type_sql.txt
select 'insert into view_column_data_type Select distinct''' !! Trim(databasename) !! ''','''!!Trim(tablename) !! ''','''!!Trim(columnname)!!''',type('!!trim(databasename)!! '.'
!! trim(tablename)!! '.' !! trim(columnname) !!');'(title '')
from dbc.columns
where (databasename, tablename) in (select databasename, tablename from dbc.tables where tablekind = 'V')
;
.export reset;
create volatile table view_column_data_type
(
databasename varchar(30),
tablename varchar(30),
columnname varchar(30),
columntype varchar(30)
) primary index (databasename, tablename)
on commit preserve rows;
.run file /yourpath/view_col_type_sql.txt;
select *
from view_column_data_type
order by 1,2,3
;
.logoff;
But I cannot use that solution, because I run into spool problems. The issue is that the query
select type(databasename.tablename.columnname)
returns the column's type n times, where n is the number of rows in the view. Using distinct or group by 1 does not help; they behave the same way, since TD 14 can rewrite one into the other on its own.
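A minimal illustration of this behavior, assuming a view db1.tb1 with a column col1:

select type(db1.tb1.col1);          -- returns the type once per row of the view
select distinct type(db1.tb1.col1); -- returns a single row, but the view is still spooled first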
Has anything changed after 4 years, in TD v.14.1?
UPD1
explain insert into view_column_data_type Select distinct'db1','tb1','col1',type(db1.tb1.col1);
1) First, we lock db1.o in view tb1 for access,
we lock db1.a in view tb1 for access, we
lock db1.o in view tb1 for access, and we
lock db1.a in view tb1 for access.
2) Next, we execute the following steps in parallel.
1) We do an all-AMPs RETRIEVE step from db1.o in view
tb1 by way of an all-rows scan with no residual
conditions into Spool 11 (all_amps), which is redistributed
by the hash code of (db1.o.GUID) to all AMPs. The
size of Spool 11 is estimated with low confidence to be
74,480 rows (66,659,600 bytes). The estimated time for this
step is 0.13 seconds.
2) We do an all-AMPs RETRIEVE step from db1.a in view
tb1 by way of an all-rows scan with no residual
conditions into Spool 12 (all_amps), which is redistributed
by the hash code of (db1.a.GUID) to all AMPs. The
size of Spool 12 is estimated with low confidence to be 280
rows (256,200 bytes). The estimated time for this step is
0.13 seconds.
3) We do an all-AMPs JOIN step from Spool 11 (Last Use) by way of an
all-rows scan, which is joined to Spool 12 (Last Use) by way of an
all-rows scan. Spool 11 and Spool 12 are full outer joined using
a single partition hash join, with condition(s) used for
non-matching on right table ("NOT (GUID IS NULL)"), with a join
condition of ("GUID = GUID"). The result goes into Spool 10
(all_amps), which is built locally on the AMPs. The size of Spool
10 is estimated with low confidence to be 74,759 rows (
134,491,441 bytes). The estimated time for this step is 0.84
seconds.
4) We do an all-AMPs STAT FUNCTION step from Spool 10 (Last Use) by
way of an all-rows scan into Spool 17 (Last Use), which is assumed
to be redistributed by value to all AMPs. The result rows are put
into Spool 15 (all_amps), which is built locally on the AMPs. The
size is estimated with low confidence to be 74,759 rows (
72,890,025 bytes).
5) We do an all-AMPs STAT FUNCTION step from Spool 15 (Last Use) by
way of an all-rows scan into Spool 20 (Last Use), which is
redistributed by hash code to all AMPs. The result rows are put
into Spool 19 (all_amps), which is built locally on the AMPs. The
size is estimated with low confidence to be 74,759 rows (
71,693,881 bytes).
6) We execute the following steps in parallel.
1) We do an all-AMPs RETRIEVE step from Spool 19 (Last Use) by
way of an all-rows scan with a condition of ("(Field_20 <>
'D') OR (Field_21 = 1)") into Spool 9 (used to materialize
view, derived table, table function or table operator t3)
(all_amps), which is built locally on the AMPs. The size of
Spool 9 is estimated with low confidence to be 74,759 rows (
69,600,629 bytes). The estimated time for this step is 4.66
seconds.
2) We do an all-AMPs RETRIEVE step from db1.o in view
tb1 by way of an all-rows scan with no residual
conditions into Spool 24 (all_amps), which is redistributed
by the hash code of (db1.o.MK) to all AMPs. Then
we do a SORT to order Spool 24 by row hash. The size of
Spool 24 is estimated with low confidence to be 280 rows (
116,200 bytes).
7) We do an all-AMPs RETRIEVE step from Spool 24 by way of an
all-rows scan into Spool 25 (all_amps), which is duplicated on all
AMPs. The size of Spool 25 is estimated with low confidence to be
78,400 rows (32,536,000 bytes). The estimated time for this step
is 0.02 seconds.
8) We do an all-AMPs JOIN step from db1.a in view
tb1 by way of an all-rows scan with no residual
conditions, which is joined to Spool 25 (Last Use) by way of an
all-rows scan. db1.a and Spool 25 are left outer
joined using a product join, with condition(s) used for
non-matching on left table ("NOT (db1.a.GUID IS NULL)"),
with a join condition of ("GUID = db1.a.GUID"). The
result goes into Spool 26 (all_amps), which is redistributed by
the hash code of (db1.o.MK) to all AMPs. Then we do a
SORT to order Spool 26 by row hash. The size of Spool 26 is
estimated with low confidence to be 559 rows (245,401 bytes).
9) We do an all-AMPs JOIN step from Spool 26 (Last Use) by way of a
RowHash match scan, which is joined to Spool 24 (Last Use) by way
of a RowHash match scan. Spool 26 and Spool 24 are full outer
joined using a merge join, with a join condition of ("Field_1 =
Field_1"). The result goes into Spool 23 (all_amps), which is
built locally on the AMPs. The size of Spool 23 is estimated with
low confidence to be 559 rows (463,411 bytes). The estimated time
for this step is 0.03 seconds.
10) We do an all-AMPs STAT FUNCTION step from Spool 23 (Last Use) by
way of an all-rows scan into Spool 31 (Last Use), which is
redistributed by hash code to all AMPs. The result rows are put
into Spool 29 (all_amps), which is built locally on the AMPs. The
size is estimated with low confidence to be 559 rows (273,910
bytes).
11) We do an all-AMPs STAT FUNCTION step from Spool 29 (Last Use) by
way of an all-rows scan into Spool 34 (Last Use), which is
redistributed by hash code to all AMPs. The result rows are put
into Spool 33 (all_amps), which is built locally on the AMPs. The
size is estimated with low confidence to be 559 rows (264,966
bytes).
12) We execute the following steps in parallel.
1) We do an all-AMPs RETRIEVE step from Spool 33 (Last Use) by
way of an all-rows scan with a condition of ("(Field_12 <>
'D') OR (Field_13 = 1)") into Spool 8 (used to materialize
view, derived table, table function or table operator t2)
(all_amps), which is built locally on the AMPs. The size of
Spool 8 is estimated with low confidence to be 559 rows (
249,314 bytes). The estimated time for this step is 0.01
seconds.
2) We do an all-AMPs RETRIEVE step from db1.o in view
tb1 by way of an all-rows scan with no residual
conditions locking for access into Spool 51 (all_amps), which
is redistributed by the hash code of (db1.o.GUID)
to all AMPs. Then we do a SORT to order Spool 51 by row hash.
The size of Spool 51 is estimated with low confidence to be
74,480 rows (1,564,080 bytes). The estimated time for this
step is 0.06 seconds.
3) We do an all-AMPs RETRIEVE step from db1.a in view
tb1 by way of an all-rows scan with no residual
conditions locking for access into Spool 52 (all_amps), which
is redistributed by the hash code of (db1.a.GUID)
to all AMPs. Then we do a SORT to order Spool 52 by row hash.
The size of Spool 52 is estimated with low confidence to be
280 rows (9,240 bytes). The estimated time for this step is
0.06 seconds.
13) We do an all-AMPs JOIN step from Spool 51 (Last Use) by way of a
RowHash match scan, which is joined to Spool 52 (Last Use) by way
of a RowHash match scan. Spool 51 and Spool 52 are full outer
joined using a merge join, with condition(s) used for non-matching
on right table ("NOT (GUID IS NULL)"), with a join condition of (
"GUID = GUID"). The result goes into Spool 50 (all_amps), which
is built locally on the AMPs. The size of Spool 50 is estimated
with low confidence to be 74,759 rows (3,214,637 bytes). The
estimated time for this step is 0.07 seconds.
14) We do an all-AMPs STAT FUNCTION step from Spool 50 (Last Use) by
way of an all-rows scan into Spool 57 (Last Use), which is assumed
to be redistributed by value to all AMPs. The result rows are put
into Spool 55 (all_amps), which is built locally on the AMPs. The
size is estimated with low confidence to be 74,759 rows (
6,952,587 bytes).
15) We do an all-AMPs STAT FUNCTION step from Spool 55 (Last Use) by
way of an all-rows scan into Spool 60 (Last Use), which is
redistributed by hash code to all AMPs. The result rows are put
into Spool 5 (all_amps), which is redistributed by hash code to
all AMPs. The size is estimated with low confidence to be 74,759
rows (5,457,407 bytes).
16) We do an all-AMPs RETRIEVE step from Spool 8 by way of an all-rows
scan with a condition of ("(t2.RDM$END_DATE <= TIMESTAMP
'9999-12-31 00:00:00.000000') AND ((t2.col1 > TIMESTAMP
'1900-01-01 00:00:00.000000') AND (NOT (t2.MK IS NULL )))") into
Spool 90 (all_amps), which is duplicated on all AMPs. The size of
Spool 90 is estimated with low confidence to be 156,520 rows (
5,791,240 bytes). The estimated time for this step is 0.02
seconds.
17) We do an all-AMPs JOIN step from Spool 90 (Last Use) by way of an
all-rows scan, which is joined to Spool 9 by way of an all-rows
scan. Spool 90 and Spool 9 are joined using a dynamic hash join,
with a join condition of ("(LVL_TYPE_MK = MK) AND ((col1
,RDM$END_DATE) OVERLAPS (col1 ,RDM$END_DATE))"). The
result goes into Spool 5 (all_amps), which is redistributed by the
hash code of ((CASE WHEN ((RDM$OPC = 'D') OR
(db1.a.RDM$VALIDFROM IS NULL )) THEN (TIMESTAMP
'1900-01-01 00:00:00.000000') ELSE (db1.a.RDM$VALIDFROM)
END), TIMESTAMP '9999-12-31 00:00:00.000000', (CASE WHEN
(db1.a.GUID IS NULL) THEN (db1.o.GUID) ELSE
(db1.a.GUID) END)) to all AMPs. The size of Spool 5 is
estimated with no confidence to be 227,602 rows (16,614,946 bytes).
The estimated time for this step is 0.19 seconds.
18) We execute the following steps in parallel.
1) We do an all-AMPs RETRIEVE step from Spool 9 by way of an
all-rows scan with a condition of ("NOT (t1.MK_SUCCESSOR IS
NULL)") into Spool 117 (all_amps) fanned out into 7 hash join
partitions, which is built locally on the AMPs. The size of
Spool 117 is estimated with low confidence to be 74,759 rows (
3,364,155 bytes). The estimated time for this step is 0.30
seconds.
2) We do an all-AMPs RETRIEVE step from Spool 9 by way of an
all-rows scan with a condition of ("(t3.RDM$END_DATE <=
TIMESTAMP '9999-12-31 00:00:00.000000') AND
((t3.col1 > TIMESTAMP '1900-01-01 00:00:00.000000')
AND (NOT (t3.MK IS NULL )))") into Spool 118 (all_amps) fanned
out into 7 hash join partitions, which is duplicated on all
AMPs. The result spool file will not be cached in memory.
The size of Spool 118 is estimated with low confidence to be
20,932,520 rows (774,503,240 bytes). The estimated time for
this step is 0.42 seconds.
19) We do an all-AMPs JOIN step from Spool 117 (Last Use) by way of an
all-rows scan, which is joined to Spool 118 (Last Use) by way of
an all-rows scan. Spool 117 and Spool 118 are joined using a hash
join of 7 partitions, with a join condition of ("(MK_SUCCESSOR =
MK) AND ((col1 ,RDM$END_DATE) OVERLAPS (col1
,RDM$END_DATE))"). The result goes into Spool 5 (all_amps), which
is redistributed by the hash code of ((CASE WHEN ((RDM$OPC = 'D')
OR (db1.a.RDM$VALIDFROM IS NULL )) THEN (TIMESTAMP
'1900-01-01 00:00:00.000000') ELSE (db1.a.RDM$VALIDFROM)
END), TIMESTAMP '9999-12-31 00:00:00.000000', (CASE WHEN
(db1.a.GUID IS NULL) THEN (db1.o.GUID) ELSE
(db1.a.GUID) END)) to all AMPs. Then we do a SORT to
order Spool 5 by the sort key in spool field1 eliminating
duplicate rows. The size of Spool 5 is estimated with no
confidence to be 98,165 rows (7,166,045 bytes). The estimated
time for this step is 2.83 seconds.
20) We do an all-AMPs STAT FUNCTION step from Spool 5 (Last Use) by
way of an all-rows scan into Spool 122 (Last Use), which is
assumed to be redistributed by value to all AMPs. The result rows
are put into Spool 120 (all_amps), which is built locally on the
AMPs. The size is estimated with no confidence to be 98,165 rows
(6,577,055 bytes). The estimated time for this step is 0.01
seconds.
21) We execute the following steps in parallel.
1) We do an all-AMPs RETRIEVE step from Spool 120 (Last Use) by
way of an all-rows scan into Spool 6 (used to materialize
view, derived table, table function or table operator vv)
(all_amps), which is built locally on the AMPs. The size of
Spool 6 is estimated with no confidence to be 98,165 rows (
4,024,765 bytes). The estimated time for this step is 0.01
seconds.
2) We do an all-AMPs RETRIEVE step from Spool 9 (Last Use) by way
of an all-rows scan into Spool 126 (all_amps), which is
duplicated on all AMPs. The result spool file will not be
cached in memory. The size of Spool 126 is estimated with low
confidence to be 20,932,520 rows. The estimated time for this
step is 0.21 seconds.
22) We do an all-AMPs JOIN step from Spool 6 (Last Use) by way of an
all-rows scan, which is joined to Spool 126 by way of an all-rows
scan. Spool 6 and Spool 126 are joined using a product join, with
a join condition of ("(1=1)"). The result goes into Spool 128
(all_amps), which is built locally on the AMPs. The result spool
file will not be cached in memory. The size of Spool 128 is
estimated with no confidence to be 7,338,717,235 rows. The
estimated time for this step is 42.98 seconds.
23) We do an all-AMPs JOIN step from Spool 8 (Last Use) by way of an
all-rows scan, which is joined to Spool 126 (Last Use) by way of
an all-rows scan. Spool 8 and Spool 126 are joined using a
product join, with a join condition of ("(1=1)"). The result goes
into Spool 129 (all_amps), which is duplicated on all AMPs. The
result spool file will not be cached in memory. The size of Spool
129 is estimated with low confidence to be 11,701,278,680 rows.
The estimated time for this step is 57.75 seconds.
24) We do an all-AMPs JOIN step from Spool 128 (Last Use) by way of an
all-rows scan, which is joined to Spool 129 (Last Use) by way of
an all-rows scan. Spool 128 and Spool 129 are joined using a
product join, with a join condition of ("(1=1)"). The result goes
into Spool 125 (one-amp), which is redistributed by the hash code
of ('db1', 'tb1') to all AMPs. The result
spool file will not be cached in memory. The size of Spool 125 is
estimated with no confidence to be *** rows (*** bytes). The
estimated time for this step is 1,820,312 hours and 14 minutes.
25) We do a single-AMP SORT to order Spool 125 (one-amp) by eliminate
duplicate rows.
26) We do a single-AMP MERGE into
"admin".view_column_data_type from Spool 125 (Last Use).
The size is estimated with no confidence to be *** rows. The
estimated time for this step is 881,263,274 hours and 32 minutes.
27) We spoil the parser's dictionary cache for the table.
28) Finally, we send out an END TRANSACTION step to all AMPs involved
in processing the request.
-> No rows are returned to the user as the result of statement 1.
Answer 0 (score: 1)
It shouldn't be a spool problem, because this utilizes some old Tequel (= pre-SQL) syntax, where the optimizer resolves the view source code down to the base tables without actually accessing them.
When you EXPLAIN
insert into view_column_data_type
SELECT TYPE(DBC.TablesV.DatabaseName); -- no FROM!
it should look like this:
1) First, we do an INSERT into Spool 2.
2) Next, we do an all-AMPs RETRIEVE step from Spool 2 (Last Use) by
way of an all-rows scan into Spool 1 (one-amp), which is
redistributed by the hash code of ('DBC', 'ColumnsVX') to few AMPs.
Then we do a SORT to order Spool 1 by row hash. The size of Spool
1 is estimated with high confidence to be 1 row (61 bytes). The
estimated time for this step is 0.01 seconds.
3) We do a single-AMP MERGE into
xxx.view_column_data_type from Spool 1 (Last Use).
The size is estimated with high confidence to be 1 row. The
estimated time for this step is 1 second.
Of course, step 2) is a bit dumb, but there is no access to dbc.tvfields, dbc.dbase, etc.
I can't imagine this has changed in a newer release...
Answer 1 (score: 0)
I'm not sure there is any possible solution using only SQL. My final solution uses BTEQ (there is a nice guide on how to use it) to get the list of tables and columns. The first step writes dynamically generated SQL queries to a file:
select 'select ' !! Trim(databasename) !! '.'!!Trim(tablename) !! '; ' !!
'help column ' !! Trim(databasename) !! '.'!!Trim(tablename) !! '.* ;'
from dbc.columnsV
where (databasename, tablename) in (select databasename, tablename from dbc.tablesV as tb where tb.tableKind = 'V'
and TRIM( tb.DatabaseName ) IN ( 'db1', 'db2' ))
;
When executed, the generated statements output each table's name followed by its HELP COLUMN results.
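For completeness, here is a minimal BTEQ sketch of how the generator might be wrapped, following the same export/run pattern as the script from the question. The paths and the help_column_sql.txt name are placeholders, out.csv is the file the Python script below reads, and the report export format may need tuning before the output parses cleanly:

.run file = /yourpath/logon.txt ;
.set width 500;
.OS rm /yourpath/help_column_sql.txt;
.export report file=/yourpath/help_column_sql.txt
select 'select ' !! Trim(databasename) !! '.'!!Trim(tablename) !! '; ' !!
'help column ' !! Trim(databasename) !! '.'!!Trim(tablename) !! '.* ;'
from dbc.columnsV
where (databasename, tablename) in (select databasename, tablename from dbc.tablesV as tb where tb.tableKind = 'V'
and TRIM( tb.DatabaseName ) IN ( 'db1', 'db2' ))
;
.export reset;
.OS rm /yourpath/out.csv;
.export report file=/yourpath/out.csv
.run file = /yourpath/help_column_sql.txt;
.export reset;
.logoff;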
The generated csv file can then be parsed in any language, e.g. in Python 2.7:
import pandas as pd

df = pd.read_csv('out.csv', sep=';')
df_logs = pd.DataFrame([])

# Reassigning the loop variable of "for i in range(...)" has no effect in
# Python, so a while loop is used here to actually skip past parsed blocks.
i = 0
while i < len(df):
    if i % 1000 == 0:
        print i  # progress indicator
    # a row starting with 'sit50' marks the beginning of a new table block
    if df['Column'].iloc[i][:5] == 'sit50':
        full_name = df['Column'].iloc[i]
        j = 3  # skip the HELP COLUMN header rows
        # the original compared a 5-character slice against the longer
        # literal "'db_template", which could never match; use startswith()
        while not df['Column'].iloc[i + j].startswith("'db_template"):
            if i + j == len(df) - 1:
                break
            df_logs = df_logs.append(
                [[full_name + ' ' + df['Column'].iloc[i + j], df['Name'].iloc[i + j]]],
                ignore_index=True)
            j = j + 1
        i = i + j
    i = i + 1

df_logs.to_csv("db_logs", sep='\t')
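The resulting db_logs file is tab-separated and can be loaded back for inspection the same way, e.g. with pd.read_csv("db_logs", sep='\t').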
Hopefully this solution helps someone else.