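My company has some tables that need to be refreshed regularly from queries. Normally I would put the queries into views, but the target schema does not have SELECT access everywhere, IT usually takes a while to sort that out for us, and we need the tables set up quickly.
So I wrote a Python script that reads a .sql file, splits it into individual statements (which usually create volatile tables), and executes those statements through SQLAlchemy. Finally, the Python program generates a MERGE statement to update the target table from the final volatile table in the .sql file. The Python looks like this: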
import sys
import re
import time
from os.path import dirname
import sqlalchemy
import sqlalchemy_teradata
from my_stuff import CSpinner, end_program, Nudger, get_auth, create_update_text

t0 = time.clock()

# Assigning Variables
print('Assigning Variables')
user, pw, connstring, schema = get_auth()
td_engine = sqlalchemy.create_engine(f'teradata://{user}:{pw}@{connstring}/{schema}')

if __name__ == '__main__':
    initial_sql_file = rf'{dirname(__file__)}\{sys.argv[1]}'
    table_name = sys.argv[2]
    keys = eval(sys.argv[3])

    # Reading the SQL file to get the result set
    with open(initial_sql_file, 'r') as f:
        sql_raw = f.read()
    statements = re.findall(r'.+?;', sql_raw, re.S)
    cols = td_engine.execute(f'SEL TOP 1* from {table_name}').keys()
    print(f'Assigned\n\n{table_name}')

    # Querying TD using the SQL read above
    with CSpinner('\nPerforming SQL Magic...', '...Performed'):
        with td_engine.connect() as conn:
            n = 0
            for sttmnt in statements:
                conn.execute(sttmnt)
                n += 1
                print(f'Statement {n} Complete')
            merge_sql = create_update_text('_data', table_name, cols, keys)
            print(merge_sql)
            conn.execute(merge_sql)
            time.sleep(.1)

    # ends the program
    end_program(t0)
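The statement-splitting step in the script can be tried on its own. The following is a minimal sketch using the same regular expression; the helper name `split_statements` and the sample SQL are illustrative and not part of the original script. Note that this approach assumes no semicolons appear inside string literals or comments.

```python
import re


def split_statements(sql_raw):
    """Split a SQL script into statements, each ending with ';'."""
    # re.S lets '.' match newlines; '.+?;' lazily matches up to each ';'
    return [s.strip() for s in re.findall(r'.+?;', sql_raw, re.S)]


# Hypothetical two-statement script, for illustration only
sample = """
CREATE VOLATILE TABLE _a AS (SEL 1 AS x) WITH DATA ON COMMIT PRESERVE ROWS;
CREATE VOLATILE TABLE _data AS (SEL x FROM _a) WITH DATA ON COMMIT PRESERVE ROWS;
"""

statements = split_statements(sample)
for number, statement in enumerate(statements, start=1):
    print(number, statement)
```

Any text after the last semicolon is silently dropped by this pattern, so a script whose final statement lacks a trailing `;` would lose that statement.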
Because I previously had issues with the MERGE statements, I have create_update_text return sqlalchemy.text(statement).execution_options(autocommit=True), as shown below:
def create_update_text(temp, perm, df, keys, how='both'):
    cols = [col for col in df]
    for key in keys:
        assert key in cols, f'ID {key} not in Columns'
    joined_keys = ' AND '.join(f'p."{key}" = t."{key}"' for key in keys)
    sql_text = f'MERGE INTO {perm} p\n' \
               f'USING {temp} t\n' \
               f'ON {joined_keys}\n'
    if how in {'both', 'matched'}:
        sql_text += 'WHEN MATCHED THEN\n' \
                    ' UPDATE\n' \
                    ' SET\n '
        sql_text += ',\n '.join(f'"{col}" = t."{col}"' for col in cols if col not in keys)
    if how in {'both', 'not'}:
        sql_text += '\nWHEN NOT MATCHED THEN\n' \
                    ' INSERT (\n '
        sql_text += ',\n '.join(f'"{col}"' for col in cols)
        sql_text += '\n )\n' \
                    ' VALUES (\n '
        sql_text += ',\n '.join(f't."{col}"' for col in cols)
        sql_text += '\n )'
    return sqlalchemy.text(sql_text).execution_options(autocommit=True)
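To see what kind of MERGE this produces, here is a condensed, plain-string sketch of the same idea. It is not the function above: `build_merge_sql` is a hypothetical helper that always emits both branches and skips the SQLAlchemy wrapper, and `MY_SCHEMA.MY_TABLE` is a placeholder target name.

```python
def build_merge_sql(temp, perm, cols, keys):
    """Build a MERGE statement as a plain string (both branches)."""
    on_clause = ' AND '.join(f'p."{k}" = t."{k}"' for k in keys)
    # Key columns are used for matching, so they are not updated
    set_clause = ', '.join(f'"{c}" = t."{c}"' for c in cols if c not in keys)
    col_names = ', '.join(f'"{c}"' for c in cols)
    col_values = ', '.join(f't."{c}"' for c in cols)
    return (
        f'MERGE INTO {perm} p\n'
        f'USING {temp} t\n'
        f'ON {on_clause}\n'
        f'WHEN MATCHED THEN UPDATE SET {set_clause}\n'
        f'WHEN NOT MATCHED THEN INSERT ({col_names}) VALUES ({col_values})'
    )


# Column names taken from the example .sql file in the question
cols = ['PART_ID', 'REMOVAL_DATE', 'JOHNORNOT', 'FLAG_SUM']
keys = ['PART_ID', 'REMOVAL_DATE', 'JOHNORNOT']
merge_sql = build_merge_sql('_data', 'MY_SCHEMA.MY_TABLE', cols, keys)
print(merge_sql)
```

Because JOHNORNOT is one of the merge keys here, any difference in how the CASE expression evaluates changes which target rows are matched, not just a single column value.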
A short example of a .sql file looks like this:
CREATE VOLATILE TABLE _data as (
    SEL
        PART_ID,
        REMOVAL_DATE,
        SUM(FLAGS) AS FLAG_SUM,
        CASE
            WHEN MECHANIC = 'JOHN' THEN 1
            ELSE 0
        END AS JOHNORNOT
    FROM SCHEMA.TABLE
    GROUP BY 1,2,4
) WITH DATA
PRIMARY KEY (PART_ID,REMOVAL_DATE,JOHNORNOT)
ON COMMIT PRESERVE ROWS
;
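The problem I'm running into is that running the update through Python produces different results than pasting the queries and the Python-generated MERGE statement into Teradata Studio and running them as individual statements. For example, a SUM field may come out with different values, or a binary CASE expression will not behave the same way.
The question is: if I run exactly the same queries in TD Studio as through SQLAlchemy, why do they produce different results? Is there something about my program that could be affecting the data?
For a concise comparison, I put the TD and PY data into separate tables and ran a query joining the two, as follows: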
Select
    td.PART_ID,
    td.MONTH,
    td.YEAR,
    td.REMOVALS as TD_Removals,
    py.REMOVALS as PY_Removals,
    td.FAILURES as TD_Failures,
    py.FAILURES as PY_Failures
from TD_Data td
join PY_Data py
    on td.PART_ID = py.PART_ID
    and td.MONTH = py.MONTH
    and td.YEAR = py.YEAR
    and (td.REMOVALS <> py.REMOVALS
        or td.FAILURES <> py.FAILURES)
As far as I can tell, the mismatched PY_ numbers are always lower than the corresponding TD_ values. Some sample data:
PART_ID         MONTH  YEAR  TD_Removals  PY_Removals  TD_Fails  PY_Fails
26-3132-9-0005      7  2015            2            1         0         0
26-2350-9-0001      3  2015           15           12        11        11
43-3614-9-0002      1  2017            2            0         0         0
97-2373-9-0001      3  2016            8            2         1         1
26-7410-9-0001      7  2016            6            1         0         0
26-3155-9-0003      9  2015            1            0         0         0
97-3510-9-0001      7  2017           28           26         0         0
97-2792-9-0006      6  2017            3            2         0         0
26-7933-9-0001     10  2015            3            0         0         0
97-2313-9-0002      3  2016           15           14        13        13
29-2800-9-0009      6  2017            3            2         0         0
26-3242-9-0006      7  2016            7            0         0         0
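Following @dnoeth's comment below, I ran SELECT Transaction_Mode FROM dbc.sessioninfoV WHERE SessionNo = SESSION; in both Python and TD Studio. TD Studio returned A and Python returned T. So the question now becomes which one is more correct, and how to make sure the two behave the same in the future.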
Answer 0 (score: 1)
Based on the comments, the Python and Studio sessions use different transaction modes:
SELECT Transaction_Mode -- T=Teradata, A=ANSI mode
FROM dbc.sessioninfoV
WHERE SessionNo = SESSION;
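The same check can be wrapped in a small helper on the Python side. This is a minimal sketch, assuming a SQLAlchemy 1.x-style connection whose `execute()` accepts a plain SQL string (as the script in the question does); the name `get_transaction_mode` is hypothetical.

```python
MODE_QUERY = (
    'SELECT Transaction_Mode '
    'FROM dbc.sessioninfoV '
    'WHERE SessionNo = SESSION'
)

# T = Teradata mode, A = ANSI mode
MODE_NAMES = {'T': 'Teradata', 'A': 'ANSI'}


def get_transaction_mode(conn):
    """Return 'Teradata' or 'ANSI' for the session behind conn."""
    code = conn.execute(MODE_QUERY).scalar()
    # strip() guards against a value that comes back space-padded
    return MODE_NAMES.get(str(code).strip(), f'unknown ({code!r})')


# Usage, assuming td_engine is the engine from the question:
# with td_engine.connect() as conn:
#     print(get_transaction_mode(conn))
```

Logging this value at the start of each run makes it obvious when the script and Teradata Studio are not comparing like with like.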
In JDBC the session mode is set through the TMODE property; the possible values are TERA, ANSI & DEFAULT.
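One of the differences is the case sensitivity of literals: in a Teradata session they are case-insensitive, while an ANSI session defaults to case-sensitive. This explains the different number of matches for WHEN MECHANIC = 'JOHN' THEN 1 when a NOT CASESPECIFIC column contains both JOHN and John.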
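The effect on the aggregated values can be illustrated without a database. The sketch below only mimics the CASE expression and the GROUP BY from the example .sql file in plain Python; the rows are made-up illustration data, and `flag_sums` is a hypothetical helper, not Teradata behavior reproduced exactly.

```python
from collections import defaultdict

# Made-up rows: (PART_ID, MECHANIC, FLAGS)
rows = [
    ('P-1', 'JOHN', 1),
    ('P-1', 'John', 1),
    ('P-1', 'MARY', 1),
]


def flag_sums(rows, case_sensitive):
    """Sum FLAGS per (PART_ID, JOHNORNOT), like the example query."""
    totals = defaultdict(int)
    for part_id, mechanic, flags in rows:
        if case_sensitive:
            # ANSI-like: 'John' does not equal the literal 'JOHN'
            is_john = mechanic == 'JOHN'
        else:
            # Teradata-like: comparison ignores case
            is_john = mechanic.upper() == 'JOHN'
        totals[(part_id, int(is_john))] += flags
    return dict(totals)


ansi_like = flag_sums(rows, case_sensitive=True)
tera_like = flag_sums(rows, case_sensitive=False)
print('ANSI-like:    ', ansi_like)
print('Teradata-like:', tera_like)
```

The same three rows land in different JOHNORNOT groups depending on the comparison rule, so both the flag column and the SUM per group differ between the two sessions.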
Comparison of Transactions in ANSI and Teradata Session Modes
When the column is defined as NOT CASESPECIFIC in the CREATE TABLE, you must either switch to Teradata mode or add (NOT CASESPECIFIC) after each string literal, e.g. WHEN MECHANIC = 'JOHN' (NOT CASESPECIFIC) THEN 1.
For all the differences between Teradata and ANSI sessions, see the Transaction Processing chapter in the manual.