Optimizing threaded SQL queries in Python

Asked: 2019-04-15 19:42:38

Tags: python sql-server multithreading

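Experimentally, threading hundreds of "small" SQL queries is much faster than running one larger query.

The code below works, but does anyone have tips for optimizing it? When I run it over roughly 500 queries that return 10 million rows in total, CPU usage fluctuates between 90% and 100%.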

0. Organization

# 1. Import standard modules. 
# 2. FUNCTION - Establish multiple db connections.
# 3. FUNCTION - Execute multiple queries using multiple db connections from #2. 
# 4. FUNCTION - Close db connections from #2. 
# 5. FUNCTION - Use #2 to establish multiple db connections, #3 to execute multiple queries, #4 to close db connections.

1. Import modules

import threading as th
import pyodbc
import pandas as pd

# Disable ODBC connection pooling; this has to be set before the first connection is opened.
pyodbc.pooling = False

2. Connect to the database num_queries times

def connect(connection_string, num_queries):
    """Open num_queries database connections, one thread per connection."""
    connections, threads = [], []

    def myfunc(i):
        # i is unused; every thread connects with the same connection string.
        connection = pyodbc.connect(connection_string)
        connections.append(connection)

    for i in range(num_queries):
        t = th.Thread(target=myfunc, args=(i,))
        threads.append(t)
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    return connections
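
One thing I am unsure about in `connect`: if `pyodbc.connect` raises inside a thread, the exception does not reach the caller, and the returned list is simply shorter than `num_queries`. A minimal sketch of a variant that surfaces such failures, assuming a hypothetical zero-argument callable `open_connection` that stands in for `pyodbc.connect(connection_string)` (the name `connect_all` is also just illustrative):

```python
from concurrent.futures import ThreadPoolExecutor


def connect_all(open_connection, num_connections):
    """Open num_connections connections in parallel, re-raising any failure.

    open_connection is a hypothetical stand-in for
    lambda: pyodbc.connect(connection_string).
    Assumes num_connections is at least 1.
    """
    with ThreadPoolExecutor(max_workers=num_connections) as pool:
        futures = [pool.submit(open_connection) for _ in range(num_connections)]
        # Future.result() re-raises an exception raised in the worker thread.
        return [f.result() for f in futures]
```

A fuller version would probably also close the connections that did open before re-raising.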

3. Set up the threaded function to execute queries in parallel

def concurrent(queries, connections):
    """Run queries[i] on connections[i] in parallel and concatenate the results."""
    df, threads = [], []
    num_queries = len(queries)

    def myfunc(i):
        # Each thread runs one query on its own connection.
        df.append(pd.read_sql_query(queries[i], connections[i]))

    for i in range(num_queries):
        t = th.Thread(target=myfunc, args=(i,))
        threads.append(t)
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    return pd.concat(df)
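
A side effect of `concurrent` as written: the threads append to the shared list in whatever order they finish, so the row order of the concatenated result can differ between runs. If the order should follow `queries`, one option is to give each thread its own slot. The sketch below uses only the standard library; `fake_fetch` is a hypothetical stand-in for `pd.read_sql_query(query, connection)`, and `run_ordered` is an illustrative name:

```python
import threading
import time


def run_ordered(fetch, queries):
    """Run fetch(query) in one thread per query; return results in query order."""
    results = [None] * len(queries)

    def worker(i):
        # Each thread writes only to its own slot, so no lock is needed.
        results[i] = fetch(queries[i])

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(len(queries))]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


def fake_fetch(query):
    # Hypothetical stand-in for a database call: shorter strings finish later.
    time.sleep(0.05 / len(query))
    return query.upper()
```

With real queries, the list returned by `run_ordered` would hold DataFrames that could then be passed to `pd.concat`.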

4. Close the database connections

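def close(connections):
    """Close every connection in parallel, then empty the list in place."""
    threads = []

    def myfunc(connection):
        connection.close()

    for connection in connections:
        t = th.Thread(target=myfunc, args=(connection,))
        threads.append(t)
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    # Same effect as removing each element in reverse order.
    connections.clear()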

5. Establish connections, run queries, close connections. Return results.

def query(queries, connection_string):
    num_queries = len(queries)
    connections = connect(connection_string, num_queries)
    try:
        df = concurrent(queries, connections)
    finally:
        # Close the connections even if running the queries fails.
        close(connections)

    return df
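
Since `query` opens one connection and starts one thread per query, 500 queries mean 500 threads and 500 connections at once. One variation I am considering is to cap the number of worker threads and let each worker reuse a single connection. This is only a sketch under assumptions: `open_connection` and `run_query` are hypothetical callables standing in for `pyodbc.connect(connection_string)` and `pd.read_sql_query(query, connection)`, and `max_workers=8` is an arbitrary value that I have not tuned or benchmarked.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def query_bounded(queries, open_connection, run_query, max_workers=8):
    """Run queries on at most max_workers threads, one connection per thread."""
    local = threading.local()
    opened = []  # every connection created, so all can be closed at the end
    lock = threading.Lock()

    def worker(query):
        # Lazily open one connection per worker thread and reuse it afterwards.
        conn = getattr(local, "conn", None)
        if conn is None:
            conn = local.conn = open_connection()
            with lock:
                opened.append(conn)
        return run_query(query, conn)

    try:
        with ThreadPoolExecutor(max_workers=max_workers) as pool:
            # map() yields results in the same order as `queries`.
            return list(pool.map(worker, queries))
    finally:
        for conn in opened:
            conn.close()
```

With real queries the returned list would hold one DataFrame per query, ready for `pd.concat`. Whether fewer threads actually lowers the CPU usage here is something I would still have to measure.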

6. Run the job

if __name__ == "__main__":
    queries = ['SELECT * FROM TBL_1', 'SELECT * FROM TBL_2', ...]
    connection_string = 'DRIVER={SQL Server Native Client 11.0};SERVER = ...'
    # Keep the combined DataFrame instead of discarding it.
    df = query(queries, connection_string)

0 answers:

No answers