Turn functions with a callback into Python generators?

Posted: 2012-04-01 21:41:36

Tags: python generator coroutine

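The SciPy minimization function (used here just as an example) has the option of adding a callback function at each step. So I can do something like,

import scipy.optimize

def my_callback(x):
    print(x)
scipy.optimize.fmin(func, x0, callback=my_callback)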

Is there a way to use the callback to create a generator version of fmin, so that I could do,

for x in my_fmin(func,x0):
    print(x)

It seems like it might be possible with some combination of yield and send, but I can't quite think of anything.

6 Answers:

Answer 0 (score: 15)

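As pointed out in the comments, you could do it in a new thread, using Queue. The drawback is that you'd still need some way to access the final result (what fmin returns at the end). My example below uses an optional callback to do something with it (another option would be to just yield it too, though your calling code would have to differentiate between iteration results and final results):

from thread import start_new_thread
from Queue import Queue
import scipy.optimize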

def my_fmin(func, x0, end_callback=(lambda x:x), timeout=None):

    q = Queue() # fmin produces, the generator consumes
    job_done = object() # signals the processing is done

    # Producer
    def my_callback(x):
        q.put(x)
    def task():
        ret = scipy.optimize.fmin(func,x0,callback=my_callback)
        q.put(job_done)
        end_callback(ret) # "Returns" the result of the main call

    # Starts fmin in a new thread
    start_new_thread(task,())

    # Consumer
    while True:
        next_item = q.get(True,timeout) # Blocks until an input is available
        if next_item is job_done:
            break
        yield next_item
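
For Python 3, where the thread and Queue modules became _thread and queue, roughly the same producer/consumer idea can be sketched with threading.Thread. To keep it runnable without SciPy, the sketch wraps a hypothetical run_with_callback(callback) function standing in for fmin; the names generator_from and count_up are assumptions for illustration, not part of the answer.

```python
import queue
import threading

def generator_from(run_with_callback, end_callback=lambda ret: None, timeout=None):
    """Yield each value that run_with_callback passes to its callback.

    run_with_callback is a hypothetical stand-in for something like
    lambda cb: scipy.optimize.fmin(func, x0, callback=cb).
    """
    q = queue.Queue()      # the worker thread produces, the generator consumes
    job_done = object()    # sentinel: the wrapped call has finished

    def task():
        try:
            # every callback value goes straight into the queue
            end_callback(run_with_callback(q.put))
        finally:
            q.put(job_done)  # always unblock the consumer, even if the call raised

    threading.Thread(target=task, daemon=True).start()

    while True:
        item = q.get(True, timeout)  # blocks until a value (or the sentinel) arrives
        if item is job_done:
            break
        yield item

# Hypothetical demo function: calls callback(i) for i in 0..n-1, then returns "done".
def count_up(callback, n=3):
    for i in range(n):
        callback(i)
    return "done"
```

Unlike the code above, this sketch calls end_callback before putting the sentinel, so the final result is already available when the for loop over the generator ends.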

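Update: to block execution of the next iteration until the consumer has finished processing the last one, it's also necessary to use task_done and join: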

    # Producer
    def my_callback(x):
        q.put(x)
        q.join() # Blocks until task_done is called

    # Consumer
    while True:
        next_item = q.get(True,timeout) # Blocks until an input is available
        if next_item is job_done:
            break
        yield next_item
        q.task_done() # Unblocks the producer, so a new iteration can start

Note that maxsize=1 is not necessary, since no new item will be added to the queue until the last one is consumed.
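
To check the lockstep behaviour described in this update, here is a small self-contained sketch (Python 3, using threading instead of thread) that logs events from both sides; the lockstep_generator and demo names and the log strings are hypothetical. Because the callback calls q.join() and the consumer calls task_done() only when the next item is requested, each "produced" entry should be followed by its "consumed" entry before the next value is produced.

```python
import queue
import threading

def lockstep_generator(run_with_callback):
    """Yield callback values one at a time; the producer waits for each to be consumed."""
    q = queue.Queue()
    job_done = object()

    def callback(x):
        q.put(x)
        q.join()            # blocks until the consumer calls task_done()

    def task():
        try:
            run_with_callback(callback)
        finally:
            q.put(job_done)

    threading.Thread(target=task, daemon=True).start()

    while True:
        item = q.get()
        if item is job_done:
            break
        yield item
        q.task_done()       # runs when the next item is requested: unblocks the producer

log = []

def demo(callback):
    # Hypothetical stand-in for fmin: logs before handing each value over.
    for i in range(3):
        log.append("produced %d" % i)
        callback(i)

for x in lockstep_generator(demo):
    log.append("consumed %d" % x)
```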

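Update 2: also note that unless all items are eventually retrieved by this generator, the created thread will deadlock (it will block forever, and its resources will never be released). The producer is waiting on the queue, and since it holds a reference to that queue, the queue will never be reclaimed by the gc, even if the consumer is. The queue will then become unreachable, so nobody will be able to release the lock.

A clean solution for that is unknown, if it is possible at all (since it would depend on the particular function used in place of fmin). A workaround can be made using timeout, having the producer raise an exception if put blocks for too long: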

    q = Queue(maxsize=1)

    # Producer
    def my_callback(x):
        q.put(x)
        q.put("dummy",True,timeout) # Blocks until the first result is retrieved
        q.join() # Blocks again until task_done is called

    # Consumer
    while True:
        next_item = q.get(True,timeout) # Blocks until an input is available
        q.task_done()                   # (one "task_done" per "get")
        if next_item is job_done:
            break
        yield next_item
        q.get() # Retrieves the "dummy" object (must be after yield)
        q.task_done() # Unblocks the producer, so a new iteration can start
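
The timeout idea can be illustrated in isolation with a simpler, hypothetical variant: a maxsize=1 queue where the callback's put uses a timeout, so if the consumer abandons the generator, queue.Full is raised inside the wrapped call and the producer thread can exit instead of blocking forever. Unlike the code above, this sketch lets the producer run one item ahead of the consumer; the helper name generator_with_timeout and the timeout default are assumptions.

```python
import queue
import threading

def generator_with_timeout(run_with_callback, timeout=0.2):
    """Yield callback values; stop producing if nobody consumes for `timeout` seconds."""
    q = queue.Queue(maxsize=1)
    job_done = object()

    def callback(x):
        # Raises queue.Full after `timeout` seconds if the queue stays full,
        # which unwinds run_with_callback inside the producer thread.
        q.put(x, True, timeout)

    def task():
        try:
            run_with_callback(callback)
            q.put(job_done, True, timeout)
        except queue.Full:
            pass  # the consumer went away; let the thread finish

    threading.Thread(target=task, daemon=True).start()

    while True:
        item = q.get()
        if item is job_done:
            break
        yield item
```

The trade-off is that a consumer which is merely slow (pausing longer than timeout between items) is treated the same as one that has gone away.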

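Answer 1 (score: 6)

Concept: use a blocking queue with maxsize=1 and a producer/consumer model.

The callback produces, then the next call to the callback blocks on the full queue.

The consumer then yields the value from the queue, tries to get another value, and blocks on read.

The producer is allowed to push to the queue; rinse and repeat.

Usage: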

def dummy(func, arg, callback=None):
  for i in range(100):
    callback(func(arg+i))

# Dummy example:
for i in Iteratorize(dummy, lambda x: x+1, 0):
  print(i)

# example with scipy:
for i in Iteratorize(scipy.optimize.fmin, func, x0):
   print(i)

Can be used as expected for an iterator:

from itertools import islice

for i in islice(Iteratorize(dummy, lambda x: x+1, 0), 5):
  print(i)

The Iteratorize class:

from thread import start_new_thread
from Queue import Queue

class Iteratorize:
  """ 
  Transforms a function that takes a callback 
  into a lazy iterator (generator).
  """
  def __init__(self, func, ifunc, arg, callback=None):
    self.mfunc=func
    self.ifunc=ifunc
    self.c_callback=callback
    self.q = Queue(maxsize=1)
    self.stored_arg=arg
    self.sentinel = object()

    def _callback(val):
      self.q.put(val)

    def gentask():
      ret = self.mfunc(self.ifunc, self.stored_arg, callback=_callback)
      self.q.put(self.sentinel)
      if self.c_callback:
        self.c_callback(ret)

    start_new_thread(gentask, ())

  def __iter__(self):
    return self

  def next(self):
    obj = self.q.get(True,None)
    if obj is self.sentinel:
      raise StopIteration
    else:
      return obj

Some cleanup could be done to accept *args and **kwargs for the function being wrapped and/or the final-result callback.
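
As a sketch of that cleanup, here is a Python 3 variant (threading and queue instead of thread and Queue) that forwards arbitrary *args and **kwargs to the wrapped function and injects the callback under a keyword that defaults to callback. The callback_name and on_result parameters are hypothetical additions, not part of the answer above.

```python
import queue
import threading

class Iteratorize:
    """Turn func(*args, callback=cb, **kwargs) into a lazy iterator over cb's arguments."""

    def __init__(self, func, *args, callback_name="callback", on_result=None, **kwargs):
        self.q = queue.Queue(maxsize=1)   # at most one value waits for the consumer
        self.sentinel = object()
        self._done = False

        def gentask():
            kwargs[callback_name] = self.q.put  # each callback value lands in the queue
            try:
                ret = func(*args, **kwargs)
                if on_result is not None:
                    on_result(ret)              # final return value, before the sentinel
            finally:
                self.q.put(self.sentinel)

        threading.Thread(target=gentask, daemon=True).start()

    def __iter__(self):
        return self

    def __next__(self):
        if self._done:                    # keep raising StopIteration once exhausted
            raise StopIteration
        obj = self.q.get()
        if obj is self.sentinel:
            self._done = True
            raise StopIteration
        return obj
```

With this signature, Iteratorize(scipy.optimize.fmin, func, x0) should still work, since fmin accepts its callback as a callback keyword argument.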

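Answer 2 (score: 5)

Generator as coroutine (no threading)

Let's have a FakeFtp class with a retrbinary function that calls a callback on each successful read of a chunk of data: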

class FakeFtp(object):
    def __init__(self):
        self.data = iter(["aaa", "bbb", "ccc", "ddd"])

    def login(self, user, password):
        self.user = user
        self.password = password

    def retrbinary(self, cmd, cb):
        for chunk in self.data:
            cb(chunk)

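Using a simple callback function has the drawback that it is called repeatedly, and the callback function cannot easily keep context between calls.

The following code defines the process_chunks generator, which will be able to receive chunks of data one by one and process them. Compared with a simple callback, here we are able to keep all the processing within one function without losing context.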

from contextlib import closing
from itertools import count


def main():
    processed = []

    def process_chunks():
        for i in count():
            try:
                # (repeatedly) get the chunk to process
                chunk = yield
            except GeneratorExit:
                # finish_up
                print("Finishing up.")
                return
            else:
                # Here process the chunk as you like
                print("inside coroutine, processing chunk:", i, chunk)
                product = "processed({i}): {chunk}".format(i=i, chunk=chunk)
                processed.append(product)

    with closing(process_chunks()) as coroutine:
        # Get the coroutine to the first yield
        next(coroutine)
        ftp = FakeFtp()
        # next line repeatedly calls `coroutine.send(data)`
        ftp.retrbinary("RETR binary", cb=coroutine.send)
        # each callback "jumps" to `yield` line in `process_chunks`

    print("processed result", processed)
    print("DONE")

To see the code in action, put the FakeFtp class, the code shown above, and the following line:

main()

into one file and call it:

$ python headsandtails.py
('inside coroutine, processing chunk:', 0, 'aaa')
('inside coroutine, processing chunk:', 1, 'bbb')
('inside coroutine, processing chunk:', 2, 'ccc')
('inside coroutine, processing chunk:', 3, 'ddd')
Finishing up.
('processed result', ['processed(0): aaa', 'processed(1): bbb', 'processed(2): ccc', 'processed(3): ddd'])
DONE

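How it works

processed = [] is here to show that the generator process_chunks has no problem cooperating with its external context. Everything is wrapped in def main(): to show that there is no need to use global variables.

def process_chunks() is the core of the solution. It might take one-shot input parameters (not used here), but the main point is that it receives input at each yield line, which returns whatever anyone sends into this generator instance via .send(data). One could call coroutine.send(chunk) directly, but in this example it is done via a callback referring to the method coroutine.send.

Note that in a real solution there is no problem having multiple yields in the code; they are processed one by one. This can be used, for example, to read (and ignore) the header of a CSV file and then continue processing the records with data.

We could instantiate and use the generator as follows: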
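
The priming step and the value flow through send can be seen in a stripped-down coroutine; running_total and feed are hypothetical examples, not part of the answer. next() advances the coroutine to its first yield; after that, each send(value) resumes it with value as the result of that yield and returns whatever the next yield produces. Sending a non-None value before priming raises TypeError.

```python
def running_total():
    """Hypothetical coroutine: receives numbers via send() and yields the running sum."""
    total = 0
    while True:
        value = yield total   # hands out `total`, then waits for the next send()
        total += value

def feed(values, callback):
    # Stand-in for a callback-based API such as retrbinary: calls callback per item.
    for v in values:
        callback(v)

unprimed = running_total()
try:
    unprimed.send(5)            # not primed yet, so this raises TypeError
except TypeError as exc:
    print("TypeError:", exc)

coroutine = running_total()
print(next(coroutine))          # prime: runs up to the first yield; prints 0
feed([1, 2, 3], coroutine.send)
print(coroutine.send(0))        # prints 6
coroutine.close()
```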
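
For example, a coroutine can use one yield to take (and discard) a header line and a second yield inside a loop for the records. The parse_records and feed names and the comma-separated sample input are hypothetical; csv.reader is used only to split each line into fields.

```python
import csv

def parse_records(out):
    """Hypothetical coroutine: skips the first line it receives, parses the rest."""
    yield                        # the first send() delivers the header line, discarded here
    while True:
        line = yield             # every later send(): one data line
        out.append(next(csv.reader([line])))

def feed(lines, callback):
    # Stand-in for a callback-based reader that delivers one line per call.
    for line in lines:
        callback(line)

rows = []
coroutine = parse_records(rows)
next(coroutine)                  # advance to the first yield
feed(["name,score", "ann,3", "bob,5"], coroutine.send)
coroutine.close()
print(rows)                      # [['ann', '3'], ['bob', '5']]
```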

coroutine = process_chunks()
# Get the coroutine to the first yield
next(coroutine)

ftp = FakeFtp()
# next line repeatedly calls `coroutine.send(data)`
ftp.retrbinary("RETR binary", cb=coroutine.send)
# each callback "jumps" to `yield` line in `process_chunks`

# close the coroutine (will throw the `GeneratorExit` exception into the
# `process_chunks` coroutine).
coroutine.close()

The real code uses the contextlib closing context manager to ensure that coroutine.close() is always called.
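
The effect of closing is easy to check: the wrapped generator's close() runs when the with block exits, even if the block raises, so the GeneratorExit handler still gets its chance to finish up. The worker coroutine and the RuntimeError below are made up for the demonstration.

```python
from contextlib import closing

events = []

def worker():
    """Hypothetical coroutine that records what it receives and when it is closed."""
    try:
        while True:
            events.append((yield))
    except GeneratorExit:
        events.append("closed")

try:
    with closing(worker()) as coroutine:
        next(coroutine)
        coroutine.send("chunk 1")
        raise RuntimeError("simulated failure while feeding chunks")
except RuntimeError:
    pass

print(events)   # ['chunk 1', 'closed']
```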

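Conclusions

This solution does not provide the kind of iterator you would consume data from in the traditional style, "from outside". On the other hand, we are able to:

  • use the generator "from inside"
  • keep all iterative processing within one function, without being interrupted between callbacks
  • optionally use external context
  • provide usable results to the outside
  • do all this without using threading

Credits: the solution is heavily inspired by the SO answer Python FTP “chunk” iterator (without loading entire file into memory) written by user2357112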

Answer 3 (score: 1)

A more Pythonic way

The solution using threading and queue is good, but not Pythonic; here is a better way (at least to me ~) using subprocess:

import pickle
import subprocess

import scipy.optimize

def my_fmin(func, x0):
    # open a process to use as a pipeline
    proc = subprocess.Popen(['cat'], stdin=subprocess.PIPE, stdout=subprocess.PIPE)

    def my_callback(x):
        # x might be any object, not only str, so we use pickle to dump it;
        # pickle data is self-delimiting, so no newline separator is needed
        pickle.dump(x, proc.stdin)

    # fmin runs to completion here, so nothing is yielded until it returns;
    # a very long run can fill the OS pipe buffers and block
    scipy.optimize.fmin(func, x0, callback=my_callback)

    # the callback is synchronous, so we can simply close proc.stdin
    # (cat then sees EOF) and read the pickled objects back from proc.stdout
    proc.stdin.close()
    while True:
        try:
            yield pickle.load(proc.stdout)
        except EOFError:
            break

    # close the process
    proc.stdout.close()
    proc.wait()

Then you can use the function like this:

for x in my_fmin(func, x0):
    print(x)

Answer 4 (score: 0)

How about

data = []
scipy.optimize.fmin(func,x0,callback=data.append)
for line in data:
    print(line)

If not, what exactly do you want to do with the generator's data?

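Answer 5 (score: 0)

A variant of Frits's answer that:

  • supports send to choose the return value of the callback
  • supports throw to choose an exception raised by the callback
  • supports close to shut down gracefully
  • does not compute a queue item until it is requested

Full code, including tests, on github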

import queue
import threading
import collections.abc

class generator_from_callback(collections.abc.Generator):
    def __init__(self, expr):
        """
        expr: a function that takes a callback
        """ 
        self._expr = expr
        self._done = False
        self._ready_queue = queue.Queue(1)
        self._done_queue = queue.Queue(1)
        self._done_holder = [False]

        # local to avoid reference cycles
        ready_queue = self._ready_queue
        done_queue = self._done_queue
        done_holder = self._done_holder

        def callback(value):
            done_queue.put((False, value))
            cmd, *args = ready_queue.get()
            if cmd == 'close':
                raise GeneratorExit
            elif cmd == 'send':
                return args[0]
            elif cmd == 'throw':
                raise args[0]

        def thread_func():
            try:
                cmd, *args = ready_queue.get()
                if cmd == 'close':
                    raise GeneratorExit
                elif cmd == 'send':
                    if args[0] is not None:
                        raise TypeError("can't send non-None value to a just-started generator")
                elif cmd == 'throw':
                    raise args[0]
                ret = expr(callback)
                raise StopIteration(ret)
            except BaseException as e:
                done_holder[0] = True
                done_queue.put((True, e))
        self._thread = threading.Thread(target=thread_func)
        self._thread.start()

    def __next__(self):
        return self.send(None)

    def send(self, value):
        if self._done_holder[0]:
            raise StopIteration
        self._ready_queue.put(('send', value))
        is_exception, val = self._done_queue.get()
        if is_exception:
            raise val
        else:
            return val

    def throw(self, exc):
        if self._done_holder[0]:
            raise StopIteration
        self._ready_queue.put(('throw', exc))
        is_exception, val = self._done_queue.get()
        if is_exception:
            raise val
        else:
            return val

    def close(self):
        if not self._done_holder[0]:
            self._ready_queue.put(('close',))
        self._thread.join()

    def __del__(self):
        self.close()

In action:

In [3]: def callback(f):
   ...:     ret = f(1)
   ...:     print("gave 1, got {}".format(ret))
   ...:     f(2)
   ...:     print("gave 2")
   ...:     f(3)
   ...:

In [4]: i = generator_from_callback(callback)

In [5]: next(i)
Out[5]: 1

In [6]: i.send(4)
gave 1, got 4
Out[6]: 2

In [7]: next(i)
gave 2
Out[7]: 3

In [8]: next(i)
StopIteration

For scipy.optimize.fmin, you would use generator_from_callback(lambda c: scipy.optimize.fmin(func, x0, callback=c))