How do I build a combined DataFrame from multiprocessing data pulls?

Time: 2019-11-05 18:56:56

Tags: python pandas multiprocessing

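Goal: run the data pulls with multiprocessing/multithreading so that each pull produces a DataFrame, then concatenate those DataFrames into one final DataFrame.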

Here is what I am trying:

import sys
from multiprocessing import Pool

import pandas as pd

# List of parameters to pull data for
itera = [1,2,3,4,5]

# Method to multithread/multiprocess
def sql_pull(i):

    # Create empty list (Intention is that data pulls will reside in this list as dataframes)
    df_list = []

    # Set the parameter to use in the SQL query (bound to :parm)
    sparm = {'parm': itera[i]}

    try:
        # Execute the pull for parameter and append dataframes into the list 
        query= """
                  SELECT * from table where ID = :parm
               """
        df_list.append(pd.read_sql(query, con=connection, params=sparm))

        print(type(df_list))    ## <class 'list'> 
        print(type(df_list[0])) ## <class 'pandas.core.frame.DataFrame'>

        return df_list

    except Exception:
        sys.exit(1)


if __name__ == '__main__':

    # Fix the number of cores
    pool = Pool(5)      

    # Start multiprocessing/multithreading                             
    emap = pool.map(sql_pull, range(len(itera))) 
    print(type(emap))    ## <class 'list'>
    print(type(emap[0])) ## <class 'list'>

    # Consolidate Dataframes
    final = pd.concat(emap)

This raises the following error:

Traceback (most recent call last):
  File "code.py", line 213, in <module>
    final = pd.concat(emap)
  File "/pandas/core/reshape/concat.py", line 228, in concat
    copy=copy, sort=sort)
  File "/pandas/core/reshape/concat.py", line 289, in __init__
    raise TypeError(msg)
TypeError: cannot concatenate object of type "<class 'list'>"; only pd.Series, pd.DataFrame, and pd.Panel (deprecated) objs are valid

What am I doing wrong? Any ideas for fixing this would be appreciated.
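
One possible reading of the traceback: `pool.map` returns one result per call, and `sql_pull` returns `df_list` (a list holding one DataFrame), so `emap` is a list of lists, which `pd.concat` rejects. Below is a minimal sketch of flattening that structure before concatenating. It makes several assumptions: `fake_pull` is a hypothetical stand-in for `sql_pull` that builds a tiny DataFrame instead of querying a database, `combine` is a hypothetical helper name, and a list comprehension stands in for `pool.map` so the sketch runs without a worker pool or a connection.

```python
from itertools import chain

import pandas as pd


def fake_pull(parm):
    """Hypothetical stand-in for sql_pull: returns a one-element list of DataFrames."""
    # The real function would call pd.read_sql here; this only mimics the return shape.
    return [pd.DataFrame({"ID": [parm], "value": [parm * 10]})]


def combine(results):
    """Flatten a list of lists of DataFrames, then concatenate into one DataFrame."""
    frames = list(chain.from_iterable(results))
    return pd.concat(frames, ignore_index=True)


# Same shape as the question's emap: a list whose items are lists of DataFrames.
emap = [fake_pull(parm) for parm in [1, 2, 3]]
final = combine(emap)
print(final)
```

An alternative under the same assumptions would be to have the worker return the DataFrame itself rather than a list, in which case the result of `pool.map` could be passed to `pd.concat` directly.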

0 answers:

No answers