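The following class, Downloader, is supposed to run several queries against a SQL database and store the results in a list of pandas.DataFrame objects.
I want to use multiprocessing to speed up the retrieval, but I get the following error: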
  line 53, in run_queries
    dfs_queries = p.map(run_query, queries)
  File "/usr/lib/python2.7/multiprocessing/pool.py", line 251, in map
    return self.map_async(func, iterable, chunksize).get()
  File "/usr/lib/python2.7/multiprocessing/pool.py", line 567, in get
    raise self._value
PicklingError: Can't pickle <type 'function'>: attribute lookup __builtin__.function failed
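I looked at this question, which indicates that pyodbc connection and cursor objects cannot be pickled. Is there a way to use a function f with multiprocessing, as in pool.map(f, arglist), when f depends on a SQL connection?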
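Note that the traceback reports a function, not a connection or cursor, as the object that could not be pickled. The sketch below is independent of pyodbc and only illustrates how pickle treats a function defined inside another function; make_worker, worker and can_pickle are hypothetical names introduced for the illustration.

```python
import pickle


def make_worker():
    # Hypothetical stand-in for a function defined inside a method.
    def worker(x):
        return x * 2
    return worker


def can_pickle(obj):
    """Return True if pickle can serialize obj, False otherwise."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError, TypeError):
        return False


# Functions are pickled by reference (module plus name), so a function that
# only exists inside another function's scope cannot be looked up again.
nested_ok = can_pickle(make_worker())   # False
builtin_ok = can_pickle(len)            # True: importable by name
```

If this is what happens here, the failure would occur before any pyodbc object is involved.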
import pyodbc
from multiprocessing import Pool as ThreadPool
import pandas as pd

class Downloader(object):
    def _connect(self, path_db_config):
        # ... Loads a config file from which it gets dsn, user and password ... #
        con_string = 'DSN=%s;UID=%s;PWD=%s;' % (dsn, user, password)
        return pyodbc.connect(con_string)

    def run_queries(self):
        queries = [# List of sql queries #]
        p = ThreadPool(len(queries))

        def run_query(query):
            cnxn = self._connect(PATH_DB_CONFIG)
            df = pd.read_sql(query, cnxn)
            return df

        return p.map(run_query, queries)
Thanks for your help!
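One possible direction, sketched under assumptions: keep the worker at module level, open a fresh connection inside each call so no connection ever has to cross the pool boundary, and, since the work is I/O-bound, consider the thread-based pool from multiprocessing.dummy, which has the same map API and does not pickle anything. To stay runnable without a DSN, the sketch uses sqlite3 as a stand-in for pyodbc and cursor.fetchall() as a stand-in for pd.read_sql; make_demo_db and its table are hypothetical.

```python
import os
import sqlite3
import tempfile
from multiprocessing.dummy import Pool as ThreadPool  # thread-based, same API as Pool


def make_demo_db():
    """Create a throwaway SQLite file with hypothetical data for the demo."""
    fd, path = tempfile.mkstemp(suffix=".db")
    os.close(fd)
    con = sqlite3.connect(path)
    con.execute("CREATE TABLE t (x INTEGER)")
    con.executemany("INSERT INTO t VALUES (?)", [(i,) for i in range(10)])
    con.commit()
    con.close()
    return path


def run_query(args):
    """Run one query on its own connection, opened inside the worker."""
    db_path, query = args
    con = sqlite3.connect(db_path)  # stand-in for pyodbc.connect(con_string)
    try:
        return con.execute(query).fetchall()  # stand-in for pd.read_sql(query, con)
    finally:
        con.close()


def run_queries(db_path, queries):
    """Map the module-level worker over all queries, one thread per query."""
    pool = ThreadPool(len(queries))
    try:
        return pool.map(run_query, [(db_path, q) for q in queries])
    finally:
        pool.close()
        pool.join()
```

Usage would look like run_queries(path, ["SELECT COUNT(*) FROM t", "SELECT SUM(x) FROM t"]). Because run_query is a module-level function taking only plain strings, the same shape should also work with the process-based multiprocessing.Pool, although that has not been verified against pyodbc here.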