Question

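To get some data that I later need to process in Matlab, I use a Python script to extract it from a series of 50+ identical databases (i.e. they all share the same table structure).

The code below does the job. However, to avoid creating empty text files (some of these databases have no relevant data at all), I first run the query only to check whether it returns anything, and then I am forced to run it a second time to fetch the data itself and write it to the file.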

import thesis,pyodbc

# SQL Server settings
drvr = '{SQL Server Native Client 10.0}'
host = 'POLIVEIRA-PC\\MSSQLSERVER2008'
user = 'username'
pswd = 'password'

# Establish a connection to SQL Server
cnxn = pyodbc.connect(driver=drvr, server=host, uid=user, pwd=pswd) # Setup connection

# Prepare condition
tags = thesis.sensors().keys()
condition = ' WHERE Tag_ID=' + tags[0]
for tag in tags[1:]:
    condition += ' OR Tag_ID=' + tag

# Extract data from each database
for db in thesis.db_list():
    # Prepare query
    table = '[' + db + '].dbo.tBufferAux'
    query  = 'SELECT Data, Tag_ID, Valor FROM ' + table + condition + ' ORDER BY Data ASC'
    # Check if query's output is empty
    if not cnxn.cursor().execute(query).fetchone():
        print db, 'has no records!'
        continue # If so, jump to next database
    # Otherwise, save query's output to text file
    filename = 'Dataset_' + db + '.txt'
    filepath = thesis.out_dir() + filename
    with open(filepath,'w') as file:
        for record in cnxn.cursor().execute(query):
            file.write(str(record.Data) + ' ' + str(record.Tag_ID) + ' ' + str(record.Valor) + '\n')

# Close session
cnxn.cursor().close()
cnxn.close()

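Although this code works well and finishes in about 20 seconds, I'm curious whether the script can be optimized by not running each query twice, i.e. by avoiding the two calls to cnxn.cursor().execute(query).

By the way, I'm quite new to both Python and SQL, so if you spot mistakes in my code, or things that aren't considered good practice, I'd be very grateful if you pointed them out.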
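A minimal sketch of one way to avoid the second execute() call: run the query once, call fetchone() to peek at the first row, and open the output file only if a row came back, writing that first row before the remaining ones. To stay self-contained it uses Python 3's built-in sqlite3 module as a stand-in for pyodbc (both expose the DB-API cursor methods execute() and fetchone() and allow iterating over a cursor); the function name export_if_nonempty and the sample rows are hypothetical.

```python
import itertools
import os
import sqlite3
import tempfile


def export_if_nonempty(cursor, query, filepath):
    """Execute query once; write its rows to filepath only if there is at least one.

    Returns the number of rows written (0 means no file was created).
    """
    cursor.execute(query)
    first = cursor.fetchone()  # None when the result set is empty
    if first is None:
        return 0
    count = 0
    with open(filepath, 'w') as outfile:
        # Write the row fetched for the emptiness check, then the rest;
        # iterating the cursor continues after the row already fetched.
        for row in itertools.chain([first], cursor):
            outfile.write(' '.join(str(value) for value in row) + '\n')
            count += 1
    return count


# Demo: an in-memory SQLite table standing in for one SQL Server database.
conn = sqlite3.connect(':memory:')
cur = conn.cursor()
cur.execute('CREATE TABLE tBufferAux (Data TEXT, Tag_ID INTEGER, Valor REAL)')
cur.executemany('INSERT INTO tBufferAux VALUES (?, ?, ?)',
                [('2012-01-02', 7, 1.5), ('2012-01-01', 7, 2.0)])

out_dir = tempfile.mkdtemp()
full_path = os.path.join(out_dir, 'Dataset_full.txt')
empty_path = os.path.join(out_dir, 'Dataset_empty.txt')
written = export_if_nonempty(
    cur, 'SELECT Data, Tag_ID, Valor FROM tBufferAux ORDER BY Data ASC', full_path)
skipped = export_if_nonempty(
    cur, 'SELECT Data, Tag_ID, Valor FROM tBufferAux WHERE Tag_ID = 99', empty_path)
```

The same cursor is reused for both calls, and the empty query never touches the file system.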

Answer 1

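First, I'd suggest taking a look at pymssql; it has some nice features that pyodbc lacks.

Second, and more strongly, I'd suggest looking at SQL Server's bcp or SSIS. They are built for exactly this kind of job and are more efficient than doing it in Python.

Third, if all the databases are on the same server, you can actually do all the work in T-SQL using master.sys.databases and push the work to the server.

With that in mind: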
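As a rough sketch of the bcp route, each database could be exported with bcp's queryout mode, driven from Python. This assumes the bcp utility is installed and on PATH and reuses the question's server and login settings; the helper names are hypothetical. Since it is not certain whether bcp leaves a zero-byte file behind when the query returns nothing, the sketch deletes any empty output file afterwards.

```python
import os
import subprocess


def bcp_export_command(query, filepath, server, user, password, delimiter=','):
    """Build (without running) a bcp command that writes query results to filepath."""
    return ['bcp', query, 'queryout', filepath,
            '-c',             # character (plain text) format
            '-t', delimiter,  # field terminator
            '-S', server,     # server\instance
            '-U', user,       # SQL Server login
            '-P', password]


def export_with_bcp(query, filepath, server, user, password):
    # Raises subprocess.CalledProcessError if bcp exits with a non-zero status.
    subprocess.run(bcp_export_command(query, filepath, server, user, password),
                   check=True)
    # A zero-byte file means the query returned no rows, so remove it.
    if os.path.exists(filepath) and os.path.getsize(filepath) == 0:
        os.remove(filepath)
```

Inside the question's for db loop, a call such as export_with_bcp(query, filepath, host, user, pswd) would take the place of the two pyodbc executions.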
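The suggestion above is to do the looping in T-SQL itself. A Python-side variant of the same idea, sketched here, sends one UNION ALL statement covering every database instead of 50+ separate queries, labels each row with the database it came from, and passes the tags as ? parameters rather than concatenating them into the SQL. The function names and the SourceDB alias are hypothetical. Database names still have to be spliced into the text (identifiers cannot be parameters), so ] and ' are escaped. SQL Server caps a statement at 2100 parameters, so this only works while len(databases) * len(tags) stays below that.

```python
import itertools


def build_union_query(databases, tags):
    """Return (sql, params) for one query over every database's tBufferAux table."""
    placeholders = ', '.join('?' for _ in tags)
    parts, params = [], []
    for db in databases:
        ident = db.replace(']', ']]')   # escape ] inside a [bracketed] identifier
        label = db.replace("'", "''")   # escape ' inside a string literal
        parts.append("SELECT '{0}' AS SourceDB, Data, Tag_ID, Valor "
                     "FROM [{1}].dbo.tBufferAux WHERE Tag_ID IN ({2})"
                     .format(label, ident, placeholders))
        params.extend(tags)
    sql = ' UNION ALL '.join(parts) + ' ORDER BY SourceDB, Data ASC'
    return sql, params


def rows_by_database(rows):
    """Group rows (already ordered by SourceDB, column 0) into {database: [rows]}."""
    return {db: list(group)
            for db, group in itertools.groupby(rows, key=lambda r: r[0])}
```

With pyodbc this would be run as cursor.execute(sql, params) followed by rows_by_database(cursor.fetchall()). Databases with no matching rows never appear in the result, so no empty file is created for them.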

import thesis, pyodbc

# SQL Server settings
drvr = '{SQL Server Native Client 10.0}'
host = 'POLIVEIRA-PC\\MSSQLSERVER2008'
user = 'username'
pswd = 'password'

# Establish a connection to SQL Server
cnxn = pyodbc.connect(driver=drvr, server=host, uid=user, pwd=pswd) # Setup connection
cursor = cnxn.cursor() # One cursor can be reused for every query

# Prepare condition
tags = list(thesis.sensors().keys())
condition = ' WHERE Tag_ID=' + tags[0]
for tag in tags[1:]:
    condition += ' OR Tag_ID=' + tag

# Extract data from each database
for db in thesis.db_list():
    # Prepare query
    table = '[' + db + '].dbo.tBufferAux'
    query = 'SELECT Data, Tag_ID, Valor FROM ' + table + condition + ' ORDER BY Data ASC'
    cursor.execute(query)
    # Fetch the first row; None means the result set is empty.
    # (cursor.rowcount is not reliable here: after a SELECT it is often -1.)
    record = cursor.fetchone()
    if record is None:
        print(db + ' has no records!')
    else:
        filename = 'Dataset_' + db + '.txt'
        filepath = thesis.out_dir() + filename
        with open(filepath, 'w') as outfile:
            # Write the row already fetched, then keep fetching until None
            while record is not None:
                outfile.write(str(record.Data) + ' ' + str(record.Tag_ID) + ' ' + str(record.Valor) + '\n')
                record = cursor.fetchone()

# Close session
cnxn.close()

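Some more style notes:

Avoid continue where you can; use if-else for control flow instead.
A pyodbc cursor can execute multiple queries and persists between them. You don't need to create a new one every time you execute a query.
The cursor "remembers" the last query it executed.
I suspect you'll regret using space-delimited files... I speak from experience :( For example, if Data is a datetime column, its string form already contains a space, so each record splits into four fields instead of three.
Cursors are closed automatically when they go out of scope, so an explicit cursor.close() is unnecessary. (In the question's code, cnxn.cursor().close() actually creates a brand-new cursor just to close it.)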
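To illustrate the point about delimiters, here is a sketch, assuming Python 3 and that Data comes back as a datetime, that writes each record with the standard csv module using a tab delimiter, so the space inside the timestamp no longer splits a field. The function name write_rows_tsv and the sample row are hypothetical; the rows could come straight from a cursor.

```python
import csv
import datetime
import os
import tempfile


def write_rows_tsv(rows, filepath):
    """Write (Data, Tag_ID, Valor) rows as tab-separated lines."""
    with open(filepath, 'w', newline='') as outfile:
        writer = csv.writer(outfile, delimiter='\t', lineterminator='\n')
        writer.writerows(rows)  # non-string values are converted with str()


rows = [(datetime.datetime(2012, 1, 1, 0, 30), 7, 2.0)]
path = os.path.join(tempfile.mkdtemp(), 'Dataset_example.txt')
write_rows_tsv(rows, path)
```

Reading the file back with a tab delimiter yields exactly three fields per record, with the date and time kept together in the first one.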

Optimizing a script using pyodbc methods
