I have some code that uses read_sql() to create a generator and loops over that generator to print each chunk:
execute.py
import pandas as pd
from sqlalchemy import event, create_engine

engine = create_engine('path-to-driver')

def getDistance(chunk):
    print(chunk)
    print(type(chunk))

df_chunks = pd.read_sql("select top 2 * from SCHEMA.table_name", engine, chunksize=1)
for chunk in df_chunks:
    result = getDistance(chunk)
It works, and each chunk is printed as a DataFrame. When I try to do the same thing with multiprocessing, like this...
outside_function.py
def getDistance(chunk):
    print(chunk)
    print(type(chunk))
    df = chunk
    return df
execute.py
import pandas as pd
from multiprocessing import Pool
from sqlalchemy import event, create_engine
from outside_function import getDistance

engine = create_engine('path-to-driver')
df_chunks = pd.read_sql("select top 2 * from SCHEMA.table_name", engine, chunksize=1)

if __name__ == '__main__':
    global result
    p = Pool(20)
    for chunk in df_chunks:
        print(chunk)
        result = p.map(getDistance, chunk)
    p.terminate()
    p.join()
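...the chunks are printed in the console as column names, with type 'str'. Printing result shows ['column_name'].
Why do the chunks turn into strings containing only the column names when multiprocessing is applied?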
Answer 0 (score: 1)
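This is because p.map expects a function and an iterable. Iterating over a DataFrame (in this case, your chunk) yields its column names, so each worker receives a column name instead of the DataFrame.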
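A quick way to see this, as a minimal sketch: the two-column DataFrame below is a made-up stand-in for one chunk (the column names `a` and `b` are only for illustration). Iterating over it yields the column labels as strings, which is what p.map hands to each worker when it is given a single chunk.

```python
import pandas as pd

# Hypothetical stand-in for one chunk returned by read_sql(..., chunksize=1)
chunk = pd.DataFrame({"a": [1], "b": [2]})

# Iterating over a DataFrame walks its column labels, not its rows
items = list(chunk)
print(items)
print([type(item) for item in items])
```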
You need to pass an iterable of DataFrames to the map method, i.e.:
global result
p = Pool(20)
result = p.map(getDistance, df_chunks)
p.terminate()
p.join()
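For anyone who wants to try the fix without a database server, here is a rough self-contained sketch. It assumes an in-memory SQLite table named `t` in place of SCHEMA.table_name, and the helper names `get_distance` and `load_chunks` are hypothetical, not from the question. The point it illustrates is the same: the pool is mapped over the chunks themselves, so each worker gets a whole DataFrame.

```python
import sqlite3
from multiprocessing import Pool

import pandas as pd


def get_distance(chunk):
    # Each worker receives one whole DataFrame chunk
    return type(chunk).__name__, len(chunk)


def load_chunks():
    # In-memory SQLite table standing in for SCHEMA.table_name (assumption)
    conn = sqlite3.connect(":memory:")
    pd.DataFrame({"a": [1, 2], "b": [3, 4]}).to_sql("t", conn, index=False)
    # Materialize the chunks so the connection can be closed before returning
    chunks = list(pd.read_sql("select * from t", conn, chunksize=1))
    conn.close()
    return chunks


if __name__ == "__main__":
    chunks = load_chunks()
    # Map over the collection of chunks, not over a single chunk
    with Pool(2) as p:
        result = p.map(get_distance, chunks)
    print(result)
```

One thing to keep in mind: Pool.map converts its iterable to a list before dispatching work, so every chunk is read from the database up front. If memory is a concern, Pool.imap may be worth looking at.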