Custom dask graph with a function that requires computed keyword arguments

Time: 2018-07-04 17:14:37

Tags: python dask

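How do you construct a custom dask graph that uses a function requiring keyword arguments, when those keyword arguments are themselves the result of another dask task?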

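Vague documentation and a few Stack Overflow questions suggest using partial, toolz, or dask.compatibility.apply. All of these solutions work for static keyword arguments. From "Including keyword arguments (kwargs) in custom Dask graphs", plus some reading of the source code and stepping through the debugger, my understanding is that dask.compatibility.apply might be able to handle keyword arguments that are the result of delayed computations. However, I can't seem to get the syntax right, and I can't find any other answers.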

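The example below shows a relatively simple use of dask.compatibility.apply with a simple computed keyword value. Dask correctly passes the computed values of the arguments 'a' and 'b', as well as the static keyword value 'other'. However, it passes the string 'c' to the function instead of replacing it with its computed value.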

import dask
from dask.compatibility import apply


def custom_func(a, b, other=None, c=None):
    print(a, b, other, c)
    return a * b / c / other


dsk = {
    'a': (sum, (1, 1)),
    'b': (sum, (2, 2)),
    'c': (sum, (3, 3)),
    'd': (apply, custom_func, ['a', 'b'], {'c': 'c', 'other': 2})
}

dask.visualize(dsk, filename='graph.png')
for key in sorted(dsk):
    print(key)
    print(dask.get(dsk, key))
    print('\n')

The output is:

a
2


b
4


c
6


d
2 4 2 c
Traceback (most recent call last):
  File "dask_kwarg.py", line 20, in <module>
    print(dask.get(dsk, key))
  File "/Users/holmgren/miniconda3/envs/pvlib36/lib/python3.6/site-packages/dask/local.py", line 562, in get_sync
    return get_async(apply_sync, 1, dsk, keys, **kwargs)
  File "/Users/holmgren/miniconda3/envs/pvlib36/lib/python3.6/site-packages/dask/local.py", line 529, in get_async
    fire_task()
  File "/Users/holmgren/miniconda3/envs/pvlib36/lib/python3.6/site-packages/dask/local.py", line 504, in fire_task
    callback=queue.put)
  File "/Users/holmgren/miniconda3/envs/pvlib36/lib/python3.6/site-packages/dask/local.py", line 551, in apply_sync
    res = func(*args, **kwds)
  File "/Users/holmgren/miniconda3/envs/pvlib36/lib/python3.6/site-packages/dask/local.py", line 295, in execute_task
    result = pack_exception(e, dumps)
  File "/Users/holmgren/miniconda3/envs/pvlib36/lib/python3.6/site-packages/dask/local.py", line 290, in execute_task
    result = _execute_task(task, data)
  File "/Users/holmgren/miniconda3/envs/pvlib36/lib/python3.6/site-packages/dask/local.py", line 271, in _execute_task
    return func(*args2)
  File "/Users/holmgren/miniconda3/envs/pvlib36/lib/python3.6/site-packages/dask/compatibility.py", line 50, in apply
    return func(*args, **kwargs)
  File "dask_kwarg.py", line 7, in custom_func
    return a * b / c / other
TypeError: unsupported operand type(s) for /: 'int' and 'str'

[Figure: graph.png, the task graph written by dask.visualize]

1 Answer:

Answer 0 (score: 2)

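One way to work this out is to look at how dask.delayed does it :) It does not put a literal dict into the graph; instead it wraps the keyword arguments in a (dict, [[name, value], ...]) task: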

In [1]: import dask

In [2]: @dask.delayed
   ...: def f(*args, **kwargs):
   ...:     pass
   ...: 

In [3]: dict(f(x=1).dask)
Out[3]: 
{'f-d2cd50e7-25b1-49c5-b463-f05198b09dfb': (<function dask.compatibility.apply>,
  <function __main__.f>,
  [],
  (dict, [['x', 1]]))}
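Applying the same encoding to the graph from the question gives the minimal sketch below. It assumes a dask version where apply can be imported from dask.utils, and falls back to dask.compatibility (where the question's 2018 release keeps it). One extra change is needed: the graph key 'c' is renamed to the hypothetical 'c-value'. If a graph key has the same name as the keyword, both strings in the ['c', 'c'] pair are substituted, and the keyword name itself turns into the computed value.

```python
import dask

try:
    from dask.utils import apply  # location assumed for recent dask releases
except ImportError:
    from dask.compatibility import apply  # location used in the question


def custom_func(a, b, other=None, c=None):
    print(a, b, other, c)
    return a * b / c / other


dsk = {
    'a': (sum, (1, 1)),
    'b': (sum, (2, 2)),
    # Hypothetical key name: it must differ from the kwarg name 'c', otherwise
    # both strings in the ['c', ...] pair below would be replaced by 6.
    'c-value': (sum, (3, 3)),
    # Build kwargs with a (dict, [[name, value], ...]) task, the same form
    # dask.delayed generates, so the scheduler resolves 'c-value' first.
    'd': (apply, custom_func, ['a', 'b'],
          (dict, [['c', 'c-value'], ['other', 2]])),
}

result = dask.get(dsk, 'd')  # synchronous local scheduler, as in the question
print(result)
```

With the local scheduler this should print 2 4 2 6 from inside custom_func, followed by 0.6666666666666666, which matches the distributed result in the session below.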
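
Interestingly, this is also a case where the local and distributed schedulers are inconsistent: the distributed scheduler handles the literal dict from the question without any problem.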

In [1]: from dask.distributed import Client

In [2]: client = Client()

In [3]: import dask
   ...: from dask.compatibility import apply
   ...: 
   ...: 
   ...: def custom_func(a, b, other=None, c=None):
   ...:     print(a, b, other, c)
   ...:     return a * b / c / other
   ...: 
   ...: 
   ...: dsk = {
   ...:     'a': (sum, (1, 1)),
   ...:     'b': (sum, (2, 2)),
   ...:     'c': (sum, (3, 3)),
   ...:     'd': (apply, custom_func, ['a', 'b'], {'c': 'c', 'other': 2})
   ...: }
   ...: 

In [4]: for key in sorted(dsk):
   ...:     print(key, client.get(dsk, key))
   ...:     
a 2
b 4
c 6
2 4 2 6
d 0.6666666666666666
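To see why the local scheduler passed the literal string 'c', here is a deliberately simplified, stdlib-only model of the substitution rule applied by the _execute_task function in the traceback above. Lists and task tuples are walked, unhashable values such as dicts are returned untouched, and any other value that names a computed key is replaced. The helpers execute, is_task, apply and collect are hypothetical. This is a sketch of the behaviour as I understand it, not dask's actual code, and it does not model the distributed scheduler.

```python
# Simplified model of a scheduler resolving task arguments against results
# that are already computed. Hypothetical helper names; NOT dask's real code.


def is_task(x):
    # A task is a tuple whose first element is callable, e.g. (sum, (1, 1)).
    return isinstance(x, tuple) and len(x) > 0 and callable(x[0])


def execute(arg, cache):
    if isinstance(arg, list):
        # Lists are walked element by element.
        return [execute(a, cache) for a in arg]
    if is_task(arg):
        # Resolve the task's arguments first, then call its function.
        func, args = arg[0], arg[1:]
        return func(*[execute(a, cache) for a in args])
    try:
        hash(arg)
    except TypeError:
        # Unhashable literals such as dicts are returned untouched,
        # so key names stored inside them are never replaced.
        return arg
    # Any other value that names a computed key is swapped for its result.
    return cache.get(arg, arg)


def apply(func, args, kwargs=None):
    # Mirrors the call made by dask's apply helper: func(*args, **kwargs).
    return func(*args, **(kwargs or {}))


def collect(*args, **kwargs):
    # Stand-in for custom_func that just reports what it received.
    return args, kwargs


cache = {'a': 2, 'b': 4, 'c': 6}

# 1. Literal dict (the question's graph): 'c' arrives as a plain string.
literal = execute((apply, collect, ['a', 'b'], {'c': 'c', 'other': 2}), cache)
print(literal)  # ((2, 4), {'c': 'c', 'other': 2})

# 2. (dict, ...) task whose kwarg name equals a graph key: both 'c' strings
#    in the pair are replaced, so the name itself becomes 6.
collided = execute((dict, [['c', 'c'], ['other', 2]]), cache)
print(collided)  # {6: 6, 'other': 2}

# 3. (dict, ...) task with a distinct key name: only the value is replaced.
cache_renamed = {'a': 2, 'b': 4, 'c-value': 6}
fixed = execute(
    (apply, collect, ['a', 'b'], (dict, [['c', 'c-value'], ['other', 2]])),
    cache_renamed,
)
print(fixed)  # ((2, 4), {'c': 6, 'other': 2})
```

Case 1 reproduces the question's "2 4 2 c" symptom. Case 3 is the (dict, ...) encoding that dask.delayed uses, with the keyword name kept distinct from every graph key.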