Asynchronous read_sql in pandas with Python

Asked: 2017-08-23 08:50:28

Tags: python multithreading pandas

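I want to speed up fetching data from a database by splitting the query into 4 parts that run in parallel. I wrote the code below using apply_async, but calling get() on a result raises a PicklingError. What should I do? Thank you very much.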

import datetime
from multiprocessing import Pool

import numpy as np
import pandas as pd
import pyodbc

pool = Pool(processes=4)
start_date = datetime.datetime(2017, 1, 1)
end_date = datetime.datetime(2017, 6, 30)
period = (end_date-start_date)/4
conn = pyodbc.connect(
    r'DRIVER={SQL Server};'
    r'SERVER=abc;'
    r'PORT=111;'
    r'DATABASE=db;'
    r'UID=abc;'
    r'PWD=xyz;'
    r'TDS_Version=7.1'
    )

res = []  # AsyncResult handles, one per sub-query
for p in np.arange(start_date, end_date, period).astype(datetime.datetime):
    sql = "SELECT * FROM db where date between \'" +  str(p) +  "\' and \'" +  str(p + period) + "\'"
    res.append(pool.apply_async(lambda x: pd.read_sql(x[0], con = x[1]), ([sql, conn],)))      # runs in *only* one process
pool.close() 

res[0].get()#<-------PicklingError: Can't pickle <function <lambda> at 0x00000045566BDAE8>: attribute lookup <lambda>

1 answer:

Answer 0 (score: 0)

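You need to move the connection into each subprocess: replace the "lambda x ..." with a routine that connects to the server and then sends the request. You cannot open a single connection and share it between subprocesses. The PicklingError itself arises because multiprocessing pickles the callable it sends to the workers, and a lambda cannot be pickled; a function defined at module level can.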
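A minimal sketch of that approach, assuming the connection string and table from the question. The names `CONN_STR`, `split_range`, `fetch_chunk`, and `fetch_all` are hypothetical, and the query uses half-open intervals instead of the question's `BETWEEN` so that boundary rows are not read twice:

```python
import datetime
from multiprocessing import Pool

# Placeholder connection string, copied from the question (hypothetical values).
CONN_STR = (
    r'DRIVER={SQL Server};'
    r'SERVER=abc;'
    r'PORT=111;'
    r'DATABASE=db;'
    r'UID=abc;'
    r'PWD=xyz;'
    r'TDS_Version=7.1'
)


def split_range(start, end, parts):
    """Return `parts` consecutive (lo, hi) datetime pairs covering start..end."""
    step = (end - start) / parts
    return [(start + i * step, start + (i + 1) * step) for i in range(parts)]


def fetch_chunk(bounds):
    """Run in a worker process: open a private connection, query, close."""
    # Imported inside the worker so each process loads the database stack itself.
    import pandas as pd
    import pyodbc

    lo, hi = bounds
    conn = pyodbc.connect(CONN_STR)  # one connection per call, never shared
    try:
        # Parameterized query; lo is inclusive, hi is exclusive.
        # Note: rows stamped exactly at the overall end date are excluded.
        sql = "SELECT * FROM db WHERE date >= ? AND date < ?"
        return pd.read_sql(sql, conn, params=[lo, hi])
    finally:
        conn.close()


def fetch_all(start, end, parts=4):
    """Fetch every chunk in its own process and concatenate the results."""
    import pandas as pd

    with Pool(processes=parts) as pool:
        # fetch_chunk is a module-level function, so it can be pickled.
        frames = pool.map(fetch_chunk, split_range(start, end, parts))
    return pd.concat(frames, ignore_index=True)
```

On Windows, the call `fetch_all(datetime.datetime(2017, 1, 1), datetime.datetime(2017, 6, 30))` has to sit under an `if __name__ == "__main__":` guard, because worker processes re-import the main module. Only plain datetime values cross the process boundary; the connection object never does.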

Alternatively, you can replace pyodbc with aioodbc: https://github.com/aio-libs/aioodbc. This lets you use asyncio to implement the functionality you want.
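A rough sketch of the asyncio variant, assuming aioodbc is installed and that its `connect(dsn=...)`, `cursor()`, `execute()`, and `fetchall()` coroutines behave as shown in the project's README. The `DSN` value and the function names are hypothetical, and the DataFrame is built by hand because `pd.read_sql` does not accept an asynchronous connection:

```python
import asyncio
import datetime

# Placeholder DSN reusing the question's values (hypothetical).
DSN = (
    "DRIVER={SQL Server};SERVER=abc;PORT=111;DATABASE=db;"
    "UID=abc;PWD=xyz;TDS_Version=7.1"
)


def split_range(start, end, parts):
    """Return `parts` consecutive (lo, hi) datetime pairs covering start..end."""
    step = (end - start) / parts
    return [(start + i * step, start + (i + 1) * step) for i in range(parts)]


async def fetch_chunk(lo, hi):
    """Open a dedicated connection, run one sub-query, return a DataFrame."""
    # Imported lazily so the module loads even where the drivers are missing.
    import aioodbc
    import pandas as pd

    conn = await aioodbc.connect(dsn=DSN)
    try:
        cur = await conn.cursor()
        # lo is inclusive, hi is exclusive, so chunks do not overlap.
        await cur.execute(
            "SELECT * FROM db WHERE date >= ? AND date < ?", lo, hi
        )
        rows = await cur.fetchall()
        columns = [col[0] for col in cur.description]
        await cur.close()
    finally:
        await conn.close()
    # Convert driver row objects to plain tuples before building the frame.
    return pd.DataFrame.from_records([tuple(r) for r in rows], columns=columns)


async def fetch_all(start, end, parts=4):
    """Run all sub-queries concurrently and concatenate the results."""
    import pandas as pd

    frames = await asyncio.gather(
        *(fetch_chunk(lo, hi) for lo, hi in split_range(start, end, parts))
    )
    return pd.concat(frames, ignore_index=True)
```

It could be driven with `asyncio.run(fetch_all(datetime.datetime(2017, 1, 1), datetime.datetime(2017, 6, 30)))`. Here too each sub-query gets its own connection, so the queries can overlap while waiting on the server; whether that beats separate processes depends on how much time goes into waiting versus parsing the rows.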