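I have a project where I pass a Teradata table name as a parameter and run a SQL statement that computes aggregates (min, max, etc.) for each column of the table and returns that information, which I then load into a DataFrame. What I want to do is take the rows in the DataFrame (one row per column_name in the table) and insert the results into another database table, where the data analysis results will be stored.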
import pandas as pd
import teradata

def main():
    def func_1(cfg_tbl):
        udaExec = teradata.UdaExec(appName="DataAnalysis", version="1.0", logConsole=False)
        # Build one profiling SELECT statement per column of the target table
        main_query = """
            SELECT 'SELECT '''
            || TRIM(ColumnName)
            || ''', COUNT(DISTINCT "' || ColumnName || '") AS DISTINCT_COUNT,'
            || ' COUNT(1) - COUNT("' || ColumnName || '") AS NULL_COUNT,'
            || ' MAX("' || ColumnName || '") AS MAX_COL_VALUE,'
            || ' MIN("' || ColumnName || '") AS MIN_COL_VALUE,'
            || CASE WHEN ColumnType IN ('I', 'D', 'F', 'I1', 'I2', 'I8', 'N', 'DA', 'TS') THEN ' MAX(LENGTH(TO_CHAR("' || ColumnName || '")))'
                    WHEN ColumnType IN ('CF', 'CV', 'CO') THEN ' MAX(LENGTH("' || ColumnName || '"))'
                    ELSE NULL END || ' AS MAX_COLUMN_LENGTH,'
            || CASE WHEN ColumnType IN ('I', 'D', 'F', 'I1', 'I2', 'I8', 'N', 'DA', 'TS') THEN ' MIN(LENGTH(TO_CHAR("' || ColumnName || '")))'
                    WHEN ColumnType IN ('CF', 'CV', 'CO') THEN ' MIN(LENGTH("' || ColumnName || '"))'
                    ELSE NULL END || ' AS MIN_COLUMN_LENGTH,'
            || ' COUNT(1) AS TABLE_COUNT,'
            || ' ''%s_%s'' AS TABLE_NM,'
            || ' ''%s'' AS SOURCE_TYPE'
            || ' FROM ' || TRIM(DatabaseName) || '.' || TRIM(TableName) || ';' AS COL
            FROM DBC.ColumnsV A
            WHERE DatabaseName = 'XXX'
            AND TableName = '%s_%s'
            """ % (cfg_pre, cfg_tbl, cfg_src, cfg_pre, cfg_tbl)
        # connect to Teradata, execute above sql and subsequent SELECT statements
        session = udaExec.connect(method="odbc", dsn="XXXX", username="XXX", password="XXX")
        pd.set_option('max_colwidth', 500)
        df = pd.read_sql(main_query, session)
        # Each cell of df holds one generated SELECT statement
        sql_execute = list(df.values.flatten())
        col = ['COLUMN_NAME', 'DISTINCT_COUNT', 'NULL_COUNT', 'MAX_COL_VALUE', 'MIN_COL_VALUE',
               'MAX_COL_LENGTH', 'MIN_COL_LENGTH', 'TABLE_CNT', 'TABLE_NM', 'DATA_SOURCE']
        jdf = pd.DataFrame(columns=col)
        for script in sql_execute:
            # Run the per-column statement, then try to store its single result row
            jdf = pd.read_sql(script, session)
            print(jdf)
            session.execute("""INSERT INTO DATA_ANALYSIS(COLUMN_NM, DISTINCT_COUNT, NULL_COUNT, MAX_COL_VALUE, MIN_COL_VALUE,
                               MAX_COL_LENGTH, MIN_COL_LENGTH, TABLE_CNT, TABLE_NM, DATA_SOURCE)
                               VALUES(?, ?, ?, ?, ?, ?, ?, ?, ?, ?)""", (jdf[0], jdf[1], jdf[2], jdf[3],
                               jdf[4], jdf[5], jdf[6], jdf[7], jdf[8], jdf[9]))
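The problem I'm facing is the INSERT code at the bottom. I get a KeyError: 0, so I know I must be doing something wrong in the insert. Any ideas?
The exact error is: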
Traceback (most recent call last):
File "C:\Users\xxx\AppData\Local\Continuum\Anaconda3\lib\site-packages\pandas\core\indexes\base.py", line 2442, in get_loc
return self._engine.get_loc(key)
File "pandas\_libs\index.pyx", line 132, in pandas._libs.index.IndexEngine.get_loc (pandas\_libs\index.c:5280)
File "pandas\_libs\index.pyx", line 154, in pandas._libs.index.IndexEngine.get_loc (pandas\_libs\index.c:5126)
File "pandas\_libs\hashtable_class_helper.pxi", line 1210, in pandas._libs.hashtable.PyObjectHashTable.get_item (pandas\_libs\hashtable.c:20523)
File "pandas\_libs\hashtable_class_helper.pxi", line 1218, in pandas._libs.hashtable.PyObjectHashTable.get_item (pandas\_libs\hashtable.c:20477)
KeyError: 0
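
For context on the traceback: on a DataFrame, `jdf[0]` looks up a column labelled `0`, and a frame returned by `pd.read_sql` has string column names, so the lookup raises `KeyError: 0`. Below is a minimal sketch of one possible way around it, converting each row to a plain tuple by position before binding it to the `?` placeholders. Everything here is an assumption for illustration: the sample row is made up, and an in-memory `sqlite3` database stands in for the Teradata session so the snippet runs on its own.

```python
import sqlite3

import pandas as pd

cols = ["COLUMN_NM", "DISTINCT_COUNT", "NULL_COUNT", "MAX_COL_VALUE", "MIN_COL_VALUE",
        "MAX_COL_LENGTH", "MIN_COL_LENGTH", "TABLE_CNT", "TABLE_NM", "DATA_SOURCE"]

# Hypothetical stand-in for one result of pd.read_sql(script, session):
# a single row with named (string) columns.
jdf = pd.DataFrame([["CUST_ID", 3, 0, "9", "1", 1, 1, 3, "PRE_TBL", "SRC"]], columns=cols)

# jdf[0] means "the column labelled 0", which does not exist in this frame.
try:
    jdf[0]
    raised = False
except KeyError:
    raised = True

# Positional alternative: turn every row into a plain tuple of values.
rows = list(jdf.itertuples(index=False, name=None))

# sqlite3 is only a stand-in for the real target database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE DATA_ANALYSIS (COLUMN_NM TEXT, DISTINCT_COUNT INTEGER, NULL_COUNT INTEGER, "
    "MAX_COL_VALUE TEXT, MIN_COL_VALUE TEXT, MAX_COL_LENGTH INTEGER, MIN_COL_LENGTH INTEGER, "
    "TABLE_CNT INTEGER, TABLE_NM TEXT, DATA_SOURCE TEXT)"
)
conn.executemany("INSERT INTO DATA_ANALYSIS VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?)", rows)
stored = conn.execute("SELECT COLUMN_NM, DISTINCT_COUNT FROM DATA_ANALYSIS").fetchall()
```

In the original loop, the same idea would presumably mean passing each tuple from `rows` to `session.execute` in place of the `(jdf[0], ..., jdf[9])` argument; that has not been tested against Teradata here.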