Question

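Goal: run the data pulls with multiprocessing/multithreading so that each pull produces a dataframe, then merge those dataframes into one final dataframe.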

Here is what I am trying:

import sys
from multiprocessing import Pool

import pandas as pd

# List of parameters to pull data for
itera = [1,2,3,4,5]

# Method to multithread/multiprocess
def sql_pull(i):

    # Create empty list (Intention is that data pulls will reside in this list as dataframes)
    df_list = []

    # Set the parameter to use in the SQL Query
    sparm = {'parm': itera[i]}

    try:
        # Execute the pull for parameter and append dataframes into the list 
        query= """
                  SELECT * from table where ID = :parm
               """
        df_list.append(pd.read_sql(query, con=connection, params=sparm))

        print(type(df_list))    ## <class 'list'> 
        print(type(df_list[0])) ## <class 'pandas.core.frame.DataFrame'>

        return df_list

    except Exception:
        sys.exit(1)


if __name__ == '__main__':

    # Set the number of worker processes
    pool = Pool(5)      

    # Start multiprocessing/multithreading                             
    emap = pool.map(sql_pull, range(len(itera))) 
    print(type(emap))    ## <class 'list'>
    print(type(emap[0])) ## <class 'list'>

    # Consolidate Dataframes
    final = pd.concat(emap)

I get the following error:

Traceback (most recent call last):
  File "code.py", line 213, in <module>
    final = pd.concat(emap)
  File "/pandas/core/reshape/concat.py", line 228, in concat
    copy=copy, sort=sort)
  File "/pandas/core/reshape/concat.py", line 289, in __init__
    raise TypeError(msg)
TypeError: cannot concatenate object of type "<class 'list'>"; only pd.Series, pd.DataFrame, and pd.Panel (deprecated) objs are valid

What am I doing wrong? Any ideas for solving this would be appreciated.

How do I create a merged DataFrame using multiprocessing data pulls?
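
Judging from the printed types and the traceback, the likely cause is that each worker returns a list holding one dataframe, so pool.map hands back a list of lists, while pd.concat expects a flat list of dataframes. Below is a minimal sketch of flattening before concatenating. The names fake_pull, pull_all, and workers are hypothetical; fake_pull stands in for the real pd.read_sql call, and ThreadPool is swapped in for Pool on the assumption that the pulls are I/O-bound, which also keeps the example runnable without a database.

```python
from itertools import chain
from multiprocessing.pool import ThreadPool

import pandas as pd


def fake_pull(parm):
    """Hypothetical stand-in for the SQL pull: returns a one-element list of dataframes,
    mirroring what sql_pull returns in the question."""
    return [pd.DataFrame({"ID": [parm], "value": [parm * 10]})]


def pull_all(parms, workers=5):
    """Run the pulls concurrently and merge the results into one dataframe."""
    with ThreadPool(workers) as pool:
        # map returns one entry per parameter; each entry is itself a list
        nested = pool.map(fake_pull, parms)
    # Flatten the list of lists into a flat list of dataframes
    frames = list(chain.from_iterable(nested))
    return pd.concat(frames, ignore_index=True)


final = pull_all([1, 2, 3, 4, 5])
print(final.shape)  # (5, 2)
```

An alternative under the same assumptions is to have the worker return the dataframe itself rather than a list wrapping it, in which case the result of pool.map can be passed to pd.concat directly.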

0 answers: