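How does one construct a custom dask graph using a function that requires keyword arguments, when those keyword arguments are the results of other dask tasks?
The somewhat vague documentation and a few Stack Overflow questions suggest using partial, toolz, or dask.compatibility.apply. All of these solutions work for static keyword arguments. My understanding, from "Including keyword arguments (kwargs) in custom Dask graphs" and from some reading of the source code and stepping through the debugger, is that dask.compatibility.apply might be able to handle keyword arguments that are the result of lazy computations. However, I can't seem to get the syntax right, and I can't find the answer elsewhere.
The example below shows a relatively simple application of dask.compatibility.apply with a simple computed keyword value. Dask successfully passes the computed values of the arguments 'a' and 'b', as well as the static keyword value 'other'. However, it passes the string 'c' to the function instead of replacing it with its computed value.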
import dask
from dask.compatibility import apply


def custom_func(a, b, other=None, c=None):
    print(a, b, other, c)
    return a * b / c / other


dsk = {
    'a': (sum, (1, 1)),
    'b': (sum, (2, 2)),
    'c': (sum, (3, 3)),
    'd': (apply, custom_func, ['a', 'b'], {'c': 'c', 'other': 2})
}

dask.visualize(dsk, filename='graph.png')
for key in sorted(dsk):
    print(key)
    print(dask.get(dsk, key))
    print('\n')
The output is as follows:
a
2
b
4
c
6
d
2 4 2 c
Traceback (most recent call last):
  File "dask_kwarg.py", line 20, in <module>
    print(dask.get(dsk, key))
  File "/Users/holmgren/miniconda3/envs/pvlib36/lib/python3.6/site-packages/dask/local.py", line 562, in get_sync
    return get_async(apply_sync, 1, dsk, keys, **kwargs)
  File "/Users/holmgren/miniconda3/envs/pvlib36/lib/python3.6/site-packages/dask/local.py", line 529, in get_async
    fire_task()
  File "/Users/holmgren/miniconda3/envs/pvlib36/lib/python3.6/site-packages/dask/local.py", line 504, in fire_task
    callback=queue.put)
  File "/Users/holmgren/miniconda3/envs/pvlib36/lib/python3.6/site-packages/dask/local.py", line 551, in apply_sync
    res = func(*args, **kwds)
  File "/Users/holmgren/miniconda3/envs/pvlib36/lib/python3.6/site-packages/dask/local.py", line 295, in execute_task
    result = pack_exception(e, dumps)
  File "/Users/holmgren/miniconda3/envs/pvlib36/lib/python3.6/site-packages/dask/local.py", line 290, in execute_task
    result = _execute_task(task, data)
  File "/Users/holmgren/miniconda3/envs/pvlib36/lib/python3.6/site-packages/dask/local.py", line 271, in _execute_task
    return func(*args2)
  File "/Users/holmgren/miniconda3/envs/pvlib36/lib/python3.6/site-packages/dask/compatibility.py", line 50, in apply
    return func(*args, **kwargs)
  File "dask_kwarg.py", line 7, in custom_func
    return a * b / c / other
TypeError: unsupported operand type(s) for /: 'int' and 'str'
Answer 0 (score: 2)
One way to approach this is to find out how dask.delayed does it :)
In [1]: import dask

In [2]: @dask.delayed
   ...: def f(*args, **kwargs):
   ...:     pass
   ...:

In [3]: dict(f(x=1).dask)
Out[3]:
{'f-d2cd50e7-25b1-49c5-b463-f05198b09dfb': (<function dask.compatibility.apply>,
  <function __main__.f>,
  [],
  (dict, [['x', 1]]))}
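Interestingly, this is also a case where the local scheduler and the distributed scheduler are inconsistent. The distributed scheduler handles this graph just fine.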
In [1]: from dask.distributed import Client

In [2]: client = Client()

In [3]: import dask
   ...: from dask.compatibility import apply
   ...:
   ...:
   ...: def custom_func(a, b, other=None, c=None):
   ...:     print(a, b, other, c)
   ...:     return a * b / c / other
   ...:
   ...:
   ...: dsk = {
   ...:     'a': (sum, (1, 1)),
   ...:     'b': (sum, (2, 2)),
   ...:     'c': (sum, (3, 3)),
   ...:     'd': (apply, custom_func, ['a', 'b'], {'c': 'c', 'other': 2})
   ...: }
   ...:

In [4]: for key in sorted(dsk):
   ...:     print(key, client.get(dsk, key))
   ...:
a 2
b 4
c 6
2 4 2 6
d 0.6666666666666666
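For the local scheduler, the dask.delayed graph shown earlier hints at a workaround: the keyword arguments are passed as a task, (dict, [['x', 1]]), rather than as a dict literal. The sketch below is a toy model of that behavior in plain Python, not dask's actual code. It assumes, consistent with the output in the question, that the local scheduler substitutes keys found inside lists and task tuples but leaves plain dicts untouched. The names execute and c_key are hypothetical, and apply here is a local stand-in for dask's helper. The graph key for the third value is renamed to c_key because, under this model, a key named 'c' would also replace the keyword name 'c' inside the nested list.

```python
def apply(func, args, kwargs=None):
    """Local stand-in for dask's apply helper (assumed signature)."""
    return func(*args, **(kwargs or {}))


def custom_func(a, b, other=None, c=None):
    return a * b / c / other


def execute(task, cache):
    """Toy model of key substitution; an assumption, not dask's real code."""
    if isinstance(task, list):
        # Lists are walked element by element.
        return [execute(item, cache) for item in task]
    if isinstance(task, tuple) and task and callable(task[0]):
        # A tuple headed by a callable is a task: resolve the arguments, then call.
        return task[0](*[execute(arg, cache) for arg in task[1:]])
    try:
        return cache[task]  # a hashable argument naming an already computed key
    except (KeyError, TypeError):
        return task  # literals, plus unhashable objects such as dicts


dsk = {
    'a': (sum, (1, 1)),
    'b': (sum, (2, 2)),
    'c_key': (sum, (3, 3)),  # hypothetical key name, chosen to avoid clashing with 'c'
}
cache = {key: execute(task, {}) for key, task in dsk.items()}

# Dict literal: 'c_key' stays a string, so the division fails as in the question.
literal_task = (apply, custom_func, ['a', 'b'], {'c': 'c_key', 'other': 2})
try:
    execute(literal_task, cache)
    literal_failed = False
except TypeError:
    literal_failed = True

# Dict built by a task: the nested list is walked, so 'c_key' is replaced by 6.
built_task = (apply, custom_func, ['a', 'b'], (dict, [['c', 'c_key'], ['other', 2]]))
result = execute(built_task, cache)
print(literal_failed, result)
```

If dask's local scheduler follows the same rules, writing the question's 'd' task with (dict, [['c', 'c_key'], ['other', 2]]) as its last element should produce the same 0.6666666666666666 that the distributed scheduler returns. Treat that as something to check against your installed dask version rather than a guarantee.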