Question

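I'm running pool.map on a large data array and I want to print a report to the console every minute. Is that possible? As far as I understand, Python is a synchronous language and can't do this the way Node.js does.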

Perhaps it can be done with threading... or how else could it be done?

from time import sleep
from multiprocessing.pool import ThreadPool

finished = 0

# pool.map passes one item from data to each call
def make_job(item):
    sleep(1)
    global finished
    finished += 1

# I want to call this function every minute
def display_status():
    print('finished: ' + str(finished))

def main():
    data = [...]
    pool = ThreadPool(45)
    results = pool.map(make_job, data)
    pool.close()
    pool.join()

Answer 1

You can use a perpetual threaded timer, like the one from this question: Python threading.timer - repeat function every 'n' seconds

from threading import Timer, Event

class PerpetualTimer(object):

    # give it a cycle time (t) and a callback (hFunction)
    def __init__(self, t, hFunction):
        self.t = t
        self.stop = Event()
        self.hFunction = hFunction
        self.thread = Timer(self.t, self.handle_function)

    def handle_function(self):
        # run the callback, then schedule the next run unless cancelled
        self.hFunction()
        self.thread = Timer(self.t, self.handle_function)
        if not self.stop.is_set():
            self.thread.start()

    def start(self):
        self.stop.clear()
        self.thread.start()

    def cancel(self):
        self.stop.set()
        self.thread.cancel()

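Basically this is just a wrapper around a Timer object that creates a new Timer object every time your desired function is called. Don't expect millisecond accuracy (or even close), but for your purposes it should be ideal.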

Using this, your example becomes:

from time import sleep
from multiprocessing.pool import ThreadPool

finished = 0

# pool.map passes one item from data to each call
def make_job(item):
    sleep(1)
    global finished
    finished += 1

def display_status():
    print('finished: ' + str(finished))

def main():
    data = [...]
    pool = ThreadPool(45)

    # set up the monitor to run the function every minute
    monitor = PerpetualTimer(60, display_status)
    monitor.start()
    results = pool.map(make_job, data)
    pool.close()
    pool.join()
    monitor.cancel()

Edit:

A cleaner solution might be (thanks to the comments below):

from threading import Event, Thread

class RepeatTimer(Thread):
    def __init__(self, t, callback, event):
        Thread.__init__(self)
        self.stop = event
        self.wait_time = t
        self.callback = callback
        self.daemon = True

    def run(self):
        while not self.stop.wait(self.wait_time):
            self.callback()

Then in your code:

def main():
    data = [...]
    pool = ThreadPool(45)
    stop_flag = Event()
    RepeatTimer(60, display_status, stop_flag).start()
    results = pool.map(make_job, data)
    pool.close()
    pool.join()
    stop_flag.set()
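For reference, here is a self-contained sketch of the Event-based approach that can be run as is. The names run_with_reports, interval and workers are made up for this illustration, the sleep is a stand-in for real work, and the Lock around the counter is an addition that is not in the snippets above (a += on a shared variable is not guaranteed to be atomic across threads). The short default interval is only there so a quick run shows some output; use 60 for a report every minute.

```python
import threading
import time
from multiprocessing.pool import ThreadPool


class RepeatTimer(threading.Thread):
    """Call `callback` every `interval` seconds until `stop_event` is set."""

    def __init__(self, interval, callback, stop_event):
        super().__init__(daemon=True)
        self.interval = interval
        self.callback = callback
        self.stop_event = stop_event

    def run(self):
        # Event.wait returns False on timeout and True once the event is set.
        while not self.stop_event.wait(self.interval):
            self.callback()


def run_with_reports(data, interval=0.05, workers=4):
    """Hypothetical helper: map a dummy job over data while reporting progress."""
    finished = 0
    lock = threading.Lock()
    reports = []

    def make_job(item):
        nonlocal finished
        time.sleep(0.01)  # stand-in for real work
        with lock:        # protect the shared counter
            finished += 1
        return item * 2

    def display_status():
        with lock:
            count = finished
        reports.append(count)
        print("finished:", count)

    stop_flag = threading.Event()
    timer = RepeatTimer(interval, display_status, stop_flag)
    timer.start()
    with ThreadPool(workers) as pool:
        results = pool.map(make_job, data)  # blocks until all jobs are done
    stop_flag.set()  # tell the timer thread to exit
    timer.join()
    return results, finished, reports


if __name__ == "__main__":
    run_with_reports(list(range(100)))
```

Because the timer thread is a daemon and is stopped through the Event, it cannot keep the process alive if main() exits early.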

Answer 2

One way to do this is to use the main thread as the monitoring thread. Something like the following should work:

def main():
    data = [...]
    results = []
    step = 0
    pool = ThreadPool(16)
    # map_async returns immediately; the callback receives the full result list
    pool.map_async(make_job, data, callback=results.extend)
    pool.close()
    while True:
        if results:
            break
        step += 1
        sleep(1)
        if step % 60 == 0:
            print("status update" + ...)

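Note that I'm using .map_async() instead of .map(), because the latter is synchronous. Also, you will probably need to replace results.extend with something more efficient. Finally, because of the GIL, the speedup may be much smaller than expected.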

By the way, it's a bit funny that you write that Python is synchronous in a question about ThreadPool.
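As a variation on the polling loop in this answer, the AsyncResult object returned by map_async can be waited on directly with a timeout, which avoids the one-second sleep counter and does not rely on the result list being non-empty. This is only a sketch: map_with_status, slow_increment, and the report parameter are names invented here, not part of the answer.

```python
import time
from multiprocessing.pool import ThreadPool


def slow_increment(x):
    """Stand-in job for illustration only."""
    time.sleep(0.02)
    return x + 1


def map_with_status(func, data, interval=60.0, workers=16, report=print):
    """Run func over data in a thread pool, calling report() every `interval` seconds."""
    with ThreadPool(workers) as pool:
        async_result = pool.map_async(func, data)
        while True:
            # wait() blocks for at most `interval` seconds, then returns.
            async_result.wait(interval)
            if async_result.ready():
                break
            report("status update: still running")
        # get() returns the result list or re-raises an exception from a worker.
        return async_result.get()


if __name__ == "__main__":
    print(map_with_status(slow_increment, list(range(20)), interval=0.05, workers=2))
```

The main thread stays the monitoring thread, as in the answer; it just sleeps inside wait() instead of in its own loop.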

Answer 3

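Consider using the time module. The time.time() function returns the current UNIX time.

For example, calling time.time() right now returns 1410384038.967499. One second later, it would return 1410384039.967499.

The way I would do this is to use a while loop in place of results = pool(...), and run a check like this on every iteration: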

import time

last_time = time.time()
while (...):
    new_time = time.time()
    # has at least 60 seconds passed since the last report?
    if new_time > last_time + 60:
        print("status update" + ...)
        last_time = new_time
    # (your computation here)

This checks whether (at least) a minute has passed since the last status update. It should print a status update roughly every sixty seconds.

Sorry that this is an incomplete answer, but I hope it helps or gives you some useful ideas.
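One possible way to flesh out the loop above while still using a pool is to iterate over pool.imap_unordered, which yields each result as soon as a worker finishes it, and do the time check inside that loop. This is a sketch under assumptions: process_with_reports and its parameters are invented names, and time.monotonic() is swapped in for time.time() so the check is unaffected by system clock changes.

```python
import time
from multiprocessing.pool import ThreadPool


def process_with_reports(func, data, interval=60.0, workers=8, report=print):
    """Collect results from a thread pool, reporting at most once per `interval` seconds."""
    results = []
    last_time = time.monotonic()
    with ThreadPool(workers) as pool:
        # Results arrive in completion order, not input order.
        for result in pool.imap_unordered(func, data):
            results.append(result)
            now = time.monotonic()
            if now - last_time >= interval:
                report("finished: %d of %d" % (len(results), len(data)))
                last_time = now
    return results
```

Note that the check only runs when a result arrives, so if a single job takes longer than the interval, no report is printed until that job completes.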

Is it possible to execute a function every x seconds in Python while pool.map is running?
