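I am trying to wrap a progress indicator around my entire script. However, set_index(..., compute=False) still runs tasks on the scheduler, as I can observe in the web interface.
How can I report progress for the set_index step?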

    import dask.dataframe as dd
    from dask.distributed import Client, progress

    if __name__ == '__main__':
        with Client() as client:
            df = dd.read_csv('big.csv')
            # I can see on the web interface that something is happening.
            # This blocks 20-30s on this particular CSV.
            df = df.set_index('id', compute=False)
            # Progress reporting works from here
            out = client.compute(df)
            progress(out)
            # out.result()
            # ...