Goal: use multiprocessing/multithreading to pull data into dataframes, then combine them into one final dataframe.
Here is what I am trying:
# List of parms to pull data for
itera = [1,2,3,4,5]

# Method to multithread/multiprocess
def sql_pull(i):
    # Create empty list (Intention is that data pulls will reside in this list as dataframes)
    df_list = []
    # Set the parameter to use in the SQL Query
    sparm['parm'] = itera[i][0]
    try:
        # Execute the pull for parameter and append dataframes into the list
        query= """
            SELECT * from table where ID = :parm
            """
        df_list.append(pd.read_sql(query, con=connection, params=sparm))
        print(type(df_list)) ## <class 'list'>
        print(type(df_list[0])) ## <class 'pandas.core.frame.DataFrame'>
        return df_list
    except Exception:
        sys.exit(1)

if __name__ == '__main__':
    # Fix the number of cores
    pool = Pool(5)
    # Start multiprocessing/multithreading
    emap = pool.map(sql_pull, range(len(itera)))
    print(type(emap)) ## <class 'list'>
    print(type(emap[0])) ## <class 'list'>
    # Consolidate Dataframes
    final = pd.concat(emap)
I get the following error:
Traceback (most recent call last):
  File "code.py", line 213, in <module>
    final = pd.concat(emap)
  File "/pandas/core/reshape/concat.py", line 228, in concat
    copy=copy, sort=sort)
  File "/pandas/core/reshape/concat.py", line 289, in __init__
    raise TypeError(msg)
TypeError: cannot concatenate object of type "<class 'list'>"; only pd.Series, pd.DataFrame, and pd.Panel (deprecated) objs are valid
What am I doing wrong? Any ideas on how to fix this would be appreciated.
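Judging from the printed types, one likely cause is that each call to sql_pull returns a list holding one dataframe, so emap is a list of lists, while pd.concat expects a flat sequence of dataframes. The sketch below illustrates two possible ways around that. It is not the original code: fake_pull and fake_pull_df are hypothetical stand-ins for the SQL pull (no database is touched), and ThreadPool is used in place of Pool only to keep the example self-contained.

```python
from itertools import chain
from multiprocessing.pool import ThreadPool

import pandas as pd

itera = [1, 2, 3, 4, 5]


def fake_pull(parm):
    """Hypothetical stand-in for sql_pull: returns a LIST holding one DataFrame,
    mirroring the return shape in the question."""
    df = pd.DataFrame({"ID": [parm, parm], "value": [parm * 10, parm * 10 + 1]})
    return [df]


def fake_pull_df(parm):
    """Hypothetical variant that returns the DataFrame itself, not a list."""
    return fake_pull(parm)[0]


with ThreadPool(5) as pool:
    # Each worker returns a list, so emap is a list of lists of DataFrames.
    emap = pool.map(fake_pull, itera)
    # Here each worker returns a DataFrame, so the result is already flat.
    flat = pool.map(fake_pull_df, itera)

# Option 1: flatten the nested lists before concatenating.
final = pd.concat(list(chain.from_iterable(emap)), ignore_index=True)

# Option 2: concatenate the flat list of DataFrames directly.
final_alt = pd.concat(flat, ignore_index=True)
```

In the original code, Option 2 would correspond to returning the result of pd.read_sql directly from sql_pull instead of appending it to df_list. Separately, itera[i][0] indexes into an integer and would itself raise a TypeError that the except block hides behind sys.exit(1), so passing the values of itera straight to pool.map (as the sketch does) may be worth trying as well.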