SQLAlchemy and Teradata Studio produce different results from the same statements

Date: 2017-07-26 20:19:40

Tags: python sqlalchemy teradata

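My company has some tables that need to be refreshed regularly from queries. Normally I would put the query into a view, but we don't have blanket SELECT access on the target schema, IT usually takes a while to sort that out for us, and we need to get the tables set up quickly.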

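So I wrote a Python script that reads a .sql file, splits it into individual statements (usually ones that create volatile tables), and executes those statements through SQLAlchemy. Finally, the program generates a MERGE statement that updates the target table from the last volatile table in the .sql file. The Python looks like this: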

import sys
import re
import time
from os.path import dirname

import sqlalchemy
import sqlalchemy_teradata
from my_stuff import CSpinner, end_program, Nudger, get_auth, create_update_text

t0 = time.clock()  # time.clock() was removed in Python 3.8; newer versions need time.perf_counter()

# Assigning Variables
print('Assigning Variables')
user, pw, connstring, schema = get_auth()
td_engine = sqlalchemy.create_engine(f'teradata://{user}:{pw}@{connstring}/{schema}')

if __name__ == '__main__':
    initial_sql_file = rf'{dirname(__file__)}\{sys.argv[1]}'
    table_name = sys.argv[2]
    keys = eval(sys.argv[3])

    # Reading the SQL file to get the result set
    with open(initial_sql_file, 'r') as f:
        sql_raw = f.read()

    statements = re.findall(r'.+?;', sql_raw, re.S)
    cols = td_engine.execute(f'SEL TOP 1* from {table_name}').keys()
    print(f'Assigned\n\n{table_name}')

    # Querying TD using the SQL read above
    with CSpinner('\nPerforming SQL Magic...', '...Performed'):
        with td_engine.connect() as conn:
            n = 0
            for sttmnt in statements:
                conn.execute(sttmnt)
                n += 1
                print(f'Statement {n} Complete')
            merge_sql = create_update_text('_data', table_name, cols, keys)
            print(merge_sql)
            conn.execute(merge_sql)
    time.sleep(.1)

    # ends the program
    end_program(t0)
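A side note on the splitting step, separate from the session-mode issue: the regex .+?; splits on every semicolon, including ones inside string literals or comments. For "SEL 'a;b' AS x; -- trailing; comment\nSEL 2;" it yields four fragments instead of two statements. Below is a minimal sketch of a quote- and comment-aware splitter that could replace the re.findall call. The function name split_sql_statements is hypothetical; it drops comments, returns statements without their trailing semicolon, keeps a final statement that has no semicolon, and is not a full Teradata SQL parser.

```python
from typing import List


def split_sql_statements(sql_text: str) -> List[str]:
    """Split a SQL script on semicolons outside quotes and comments.

    Hypothetical helper: understands '...' and "..." (with doubled-quote
    escapes), -- line comments and /* */ block comments. Comments are dropped.
    """
    statements: List[str] = []
    current: List[str] = []
    quote = None  # quote character we are currently inside, if any
    i, n = 0, len(sql_text)
    while i < n:
        ch = sql_text[i]
        nxt = sql_text[i + 1] if i + 1 < n else ''
        if quote:
            current.append(ch)
            if ch == quote:
                if nxt == quote:       # doubled quote = escaped quote, stay inside
                    current.append(nxt)
                    i += 1
                else:
                    quote = None       # closing quote
        elif ch in ("'", '"'):
            quote = ch
            current.append(ch)
        elif ch == '-' and nxt == '-':
            end = sql_text.find('\n', i)
            i = n if end == -1 else end    # resume at the newline itself
            continue
        elif ch == '/' and nxt == '*':
            end = sql_text.find('*/', i + 2)
            i = n if end == -1 else end + 2
            current.append(' ')            # keep tokens on either side apart
            continue
        elif ch == ';':
            statement = ''.join(current).strip()
            if statement:
                statements.append(statement)
            current = []
        else:
            current.append(ch)
        i += 1
    tail = ''.join(current).strip()
    if tail:                               # last statement without a semicolon
        statements.append(tail)
    return statements


script = "SEL 'a;b' AS x; -- trailing; comment\nSEL 2;"
print(split_sql_statements(script))  # ["SEL 'a;b' AS x", 'SEL 2']
```

In the script it would be used as statements = split_sql_statements(sql_raw) in place of the re.findall line.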

Since I previously had issues with the MERGE statements, I have create_update_text return sqlalchemy.text(statement).execution_options(autocommit=True), as follows:

def create_update_text(temp, perm, df, keys, how='both'):
    cols = [col for col in df]

    for key in keys:
        assert key in cols, f'ID {key} not in Columns'

    joined_keys = ' AND '.join(f'p."{key}" = t."{key}"' for key in keys)
    sql_text = f'MERGE INTO {perm} p\n' \
               f'USING {temp} t\n' \
               f'ON {joined_keys}\n'

    if how in {'both', 'matched'}:
        sql_text += 'WHEN MATCHED THEN\n' \
                    ' UPDATE\n' \
                    ' SET\n     '

        sql_text += ',\n     '.join(f'"{col}" = t."{col}"' for col in cols if col not in keys)

    if how in {'both', 'not'}:
        sql_text += '\nWHEN NOT MATCHED THEN\n' \
                    ' INSERT (\n     '

        sql_text += ',\n     '.join(f'"{col}"' for col in cols)

        sql_text += '\n )\n' \
                    ' VALUES (\n     '

        sql_text += ',\n     '.join(f't."{col}"' for col in cols)

        sql_text += '\n )'

    return sqlalchemy.text(sql_text).execution_options(autocommit=True)

A short example .sql file looks like this:

CREATE VOLATILE TABLE _data as (
    SEL
     PART_ID,
     REMOVAL_DATE,
     SUM(FLAGS) AS FLAG_SUM,

     CASE
      WHEN MECHANIC = 'JOHN' THEN 1
      ELSE 0
     END AS JOHNORNOT

    FROM SCHEMA.TABLE

    GROUP BY 1,2,4
) WITH DATA
  PRIMARY KEY (PART_ID,REMOVAL_DATE,JOHNORNOT)
  ON COMMIT PRESERVE ROWS
;

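The problem is that running the statements through Python produces different results than pasting the query and the Python-generated MERGE statement into Teradata Studio and running them there as separate statements. For example, SUM fields can come out with different values, or the binary CASE expression doesn't behave the same way.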

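So the question is: if I run exactly the same queries in TD Studio as through SQLAlchemy, why do they produce different results? Is there something about my program that could be affecting the data?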

To compare the two concisely, I put the TD and PY data into separate tables and ran a query joining them, like this:

Select
 td.PART_ID,
 td.MONTH,
 td.YEAR,
 td.REMOVALS as TD_Removals,
 py.REMOVALS as PY_Removals,
 td.FAILURES as TD_Failures,
 py.FAILURES as PY_Failures

from TD_Data td

join PY_Data py
  on td.PART_ID = py.PART_ID
 and td.MONTH = py.MONTH
 and td.YEAR = py.YEAR
 and (td.REMOVALS <> py.REMOVALS
      or td.FAILURES <> py.FAILURES)

As far as I can tell, the mismatched PY_ numbers are always lower than the corresponding TD_ values. Some sample data:

PART_ID         MONTH  YEAR  TD_Removals  PY_Removals  TD_Failures  PY_Failures
26-3132-9-0005  7      2015  2            1            0            0
26-2350-9-0001  3      2015  15           12           11           11
43-3614-9-0002  1      2017  2            0            0            0
97-2373-9-0001  3      2016  8            2            1            1
26-7410-9-0001  7      2016  6            1            0            0
26-3155-9-0003  9      2015  1            0            0            0
97-3510-9-0001  7      2017  28           26           0            0
97-2792-9-0006  6      2017  3            2            0            0
26-7933-9-0001  10     2015  3            0            0            0
97-2313-9-0002  3      2016  15           14           13           13
29-2800-9-0009  6      2017  3            2            0            0
26-3242-9-0006  7      2016  7            0            0            0
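The "always lower" impression can be checked against the sample rows rather than by eye. This small sketch just encodes the 12 rows above; the helper names are hypothetical. On this sample every PY_Removals value is below TD_Removals while the failure counts agree, so the shortfall is concentrated in the removals column.

```python
# Sample rows from the table above:
# (PART_ID, MONTH, YEAR, TD_Removals, PY_Removals, TD_Failures, PY_Failures)
SAMPLE = [
    ('26-3132-9-0005', 7, 2015, 2, 1, 0, 0),
    ('26-2350-9-0001', 3, 2015, 15, 12, 11, 11),
    ('43-3614-9-0002', 1, 2017, 2, 0, 0, 0),
    ('97-2373-9-0001', 3, 2016, 8, 2, 1, 1),
    ('26-7410-9-0001', 7, 2016, 6, 1, 0, 0),
    ('26-3155-9-0003', 9, 2015, 1, 0, 0, 0),
    ('97-3510-9-0001', 7, 2017, 28, 26, 0, 0),
    ('97-2792-9-0006', 6, 2017, 3, 2, 0, 0),
    ('26-7933-9-0001', 10, 2015, 3, 0, 0, 0),
    ('97-2313-9-0002', 3, 2016, 15, 14, 13, 13),
    ('29-2800-9-0009', 6, 2017, 3, 2, 0, 0),
    ('26-3242-9-0006', 7, 2016, 7, 0, 0, 0),
]


def py_never_higher(rows) -> bool:
    """True if every PY_ value is <= its TD_ counterpart."""
    return all(py_rem <= td_rem and py_fail <= td_fail
               for _, _, _, td_rem, py_rem, td_fail, py_fail in rows)


def totals(rows):
    """Column totals: (TD_Removals, PY_Removals, TD_Failures, PY_Failures)."""
    return tuple(sum(row[i] for row in rows) for i in range(3, 7))


print(py_never_higher(SAMPLE), totals(SAMPLE))  # True (93, 60, 25, 25)
```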

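Following @dnoeth's comment below, I ran SELECT Transaction_Mode FROM dbc.sessioninfoV WHERE SessionNo = SESSION; from both Python and TD Studio. TD Studio returns A and Python returns T. Now the question becomes which one is more correct, and how to ensure the two behave the same in the future.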
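For the "ensure the same behavior in the future" part, one option is for the script to fail fast when its session is not in the mode you expect. A minimal sketch, assuming a SQLAlchemy 1.x-style connection as in the script above (execute() accepts a raw SQL string and the result has .scalar()); the function name and the choice of expected mode are up to you and hypothetical here. Under SQLAlchemy 2.0 the query would have to be wrapped in sqlalchemy.text().

```python
MODE_QUERY = ("SELECT Transaction_Mode FROM dbc.sessioninfoV "
              "WHERE SessionNo = SESSION")

MODE_NAMES = {'A': 'ANSI', 'T': 'Teradata'}


def check_transaction_mode(conn, expected: str) -> str:
    """Raise RuntimeError if the session's transaction mode != `expected`.

    `expected` is 'A' (ANSI) or 'T' (Teradata). `conn` is assumed to be a
    SQLAlchemy 1.x connection (raw SQL strings, result.scalar()).
    """
    # Transaction_Mode is a CHAR column, so strip any padding before comparing
    mode = str(conn.execute(MODE_QUERY).scalar()).strip().upper()
    if mode != expected:
        raise RuntimeError(
            f'Session is in {MODE_NAMES.get(mode, mode)} mode, '
            f'expected {MODE_NAMES.get(expected, expected)} mode')
    return mode
```

Called right after td_engine.connect(), e.g. check_transaction_mode(conn, 'A') to require the same mode Studio reported, it stops the script before any volatile tables are built in the wrong mode.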

1 Answer

Answer 0 (score: 1)

Based on the comments, the Python and Studio sessions use different transaction modes:

 SELECT Transaction_Mode -- T=Teradata, A=ANSI mode
 FROM dbc.sessioninfoV
 WHERE SessionNo = SESSION;

In JDBC, the session mode is set via the TMODE property; the possible values are TERA, ANSI and DEFAULT.
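Teradata Studio connects through the Teradata JDBC driver, whose URL carries connection properties as comma-separated NAME=value pairs after the host. The sketch below only builds such a URL string with TMODE pinned, to show where the setting lives; the helper name, host and schema are hypothetical, and how the Python driver exposes the equivalent setting is not covered here.

```python
VALID_TMODES = ('ANSI', 'TERA', 'DEFAULT')


def teradata_jdbc_url(host: str, tmode: str, **params: str) -> str:
    """Build jdbc:teradata://host/TMODE=...,NAME=value,... (illustrative only)."""
    tmode = tmode.upper()
    if tmode not in VALID_TMODES:
        raise ValueError(f'TMODE must be one of {VALID_TMODES}, got {tmode!r}')
    props = [f'TMODE={tmode}'] + [f'{name}={value}' for name, value in params.items()]
    return f'jdbc:teradata://{host}/' + ','.join(props)


print(teradata_jdbc_url('tdprod.example.com', 'tera', DATABASE='my_schema'))
# jdbc:teradata://tdprod.example.com/TMODE=TERA,DATABASE=my_schema
```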

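One of the differences is the case sensitivity of literals: in a Teradata session they are NOT CASESPECIFIC, while in an ANSI session they default to CASESPECIFIC. That explains why WHEN MECHANIC = 'JOHN' THEN 1 gets a different number of matches when the data contains both JOHN and John.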

Comparison of Transactions in ANSI and Teradata Session Modes
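To see how this can also change the SUM values from the question, here is a rough emulation in Python. It approximates a NOT CASESPECIFIC comparison by comparing uppercased strings (ignoring Teradata's trailing-blank rules), with case_specific=True standing in for the ANSI-session behavior described above. The rows are hypothetical. Because JOHNORNOT is also a GROUP BY column in the example .sql, a row whose flag flips between 0 and 1 moves its FLAGS into a different group, so the per-group SUM changes too.

```python
from collections import defaultdict


def matches(value: str, literal: str, case_specific: bool) -> bool:
    """Rough emulation of '=' on character data.

    CASESPECIFIC compares exactly; NOT CASESPECIFIC is approximated by
    comparing uppercased values. Trailing-blank padding rules are ignored.
    """
    if case_specific:
        return value == literal
    return value.upper() == literal.upper()


def flag_sums(rows, case_specific: bool) -> dict:
    """SUM(FLAGS) grouped by (PART_ID, JOHNORNOT), like the example .sql."""
    sums = defaultdict(int)
    for part_id, mechanic, flags in rows:
        johnornot = 1 if matches(mechanic, 'JOHN', case_specific) else 0
        sums[(part_id, johnornot)] += flags
    return dict(sums)


# Hypothetical rows: (PART_ID, MECHANIC, FLAGS); REMOVAL_DATE left out for brevity
rows = [('P1', 'JOHN', 2), ('P1', 'John', 3), ('P1', 'MARY', 1)]
print(flag_sums(rows, case_specific=True))   # {('P1', 1): 2, ('P1', 0): 4}
print(flag_sums(rows, case_specific=False))  # {('P1', 1): 5, ('P1', 0): 1}
```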

When the column is defined as NOT CASESPECIFIC in CREATE TABLE, you must either switch to Teradata mode or add (NOT CASESPECIFIC) after each string literal, e.g. WHEN MECHANIC = 'JOHN' (NOT CASESPECIFIC) THEN 1.
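If switching the session mode is not an option, the second workaround could be applied to the .sql text before the script executes it. A rough, regex-based sketch (the function name is hypothetical): it only handles = 'literal' comparisons, skips literals that are already marked, and is not a SQL parser, so an = sign inside a string literal could still be rewritten; review the output before relying on it.

```python
import re

# An "= 'literal'" comparison; '' inside the literal is an escaped quote.
# (?!') stops a match from ending in the middle of a '' escape, and the
# last lookahead skips literals already followed by (NOT CASESPECIFIC).
_EQ_LITERAL = re.compile(
    r"(=\s*'(?:[^']|'')*')(?!')(?!\s*\(\s*NOT\s+CASESPECIFIC\s*\))",
    re.IGNORECASE,
)


def add_not_casespecific(sql: str) -> str:
    """Append (NOT CASESPECIFIC) after each `= 'literal'` comparison."""
    return _EQ_LITERAL.sub(r'\1 (NOT CASESPECIFIC)', sql)


print(add_not_casespecific("WHEN MECHANIC = 'JOHN' THEN 1"))
# WHEN MECHANIC = 'JOHN' (NOT CASESPECIFIC) THEN 1
```

Because already-marked literals are skipped, running it twice on the same file leaves the result unchanged.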

For the full list of differences between Teradata and ANSI sessions, see the Transaction Processing chapter in the manual.