Question

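I am building a Python script/application that launches multiple so-called Fetchers. They in turn do some work and return data into a queue.

I want to make sure the Fetchers don't run for more than 60 seconds (because the entire application runs multiple times in one hour).

Reading the Python docs, I noticed they say to be careful when using Process.terminate() because it can break the Queue.

My current code: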

from multiprocessing import Process, Queue

# Result Queue
resultQueue = Queue()

# Create Fetcher Instance
fetcher = fetcherClass()

# Create Fetcher Process List
fetcherProcesses = []

# Run Fetchers
for config in configList:
    # Create Process to encapsulate Fetcher
    log.debug("Creating Fetcher for Target: %s" % config['object_name'])
    fetcherProcess = Process(target=fetcher.Run, args=(config,resultQueue))

    log.debug("Starting Fetcher for Target: %s" % config['object_name'])
    fetcherProcess.start()
    fetcherProcesses.append((config, fetcherProcess))

# Wait for all Workers to complete
for config, fetcherProcess in fetcherProcesses:
    log.debug("Waiting for Thread to complete (%s)." % str(config['object_name']))
    fetcherProcess.join(DEFAULT_FETCHER_TIMEOUT)
    if fetcherProcess.is_alive():
        log.critical("Fetcher thread for object %s Timed Out! Terminating..." % config['object_name'])
        fetcherProcess.terminate()

# Loop thru results, and save them in RRD
while not resultQueue.empty():
    config, fetcherResult = resultQueue.get()
    result = storage.Save(config, fetcherResult)

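I want to make sure my queue doesn't get corrupted when one of my Fetchers times out.

What is the best way to do this?

Edit: Some clarifications in response to a chat with sebdelsol:

1) I want to start processing data as soon as possible, because otherwise I have to perform a lot of disk-intensive operations all at once. So having the main thread sleep for X_Timeout is not an option.

2) I need to wait for the timeout only once, but per process. So if the main thread launches 50 fetchers, and that takes anywhere from a few seconds to half a minute, I need to compensate for it.

3) I want to be sure that the data coming from Queue.get() was put there by a Fetcher that didn't time out (since it is theoretically possible that a fetcher was putting data into the queue when the timeout occurred and it got shot dead...). That data should be dumped.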
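Point 2 could be handled by recording a start time per fetcher and joining only for whatever is left of that fetcher's budget. A minimal sketch, assuming a hypothetical helper `remaining_timeout` and start times taken with `time.monotonic()` right after each `start()`:

```python
import time


def remaining_timeout(started_at, timeout, now=None):
    """Seconds left before a fetcher started at `started_at` exceeds `timeout`.

    `started_at` and `now` are time.monotonic() readings; the result never
    goes below zero, so it can be passed straight to Process.join().
    """
    if now is None:
        now = time.monotonic()
    return max(0.0, started_at + timeout - now)
```

In the join loop this would replace the fixed timeout, for example `fetcherProcess.join(remaining_timeout(startedAt, DEFAULT_FETCHER_TIMEOUT))`, where `startedAt` is a hypothetical extra value stored alongside each process when it is started.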

A timeout is not a very bad thing when it happens. It's not a desirable situation, but corrupt data is worse.
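To keep suspect data out of storage, one option is to remember which fetchers were terminated and drop anything they may have queued. A small sketch, assuming results are `(config, fetcherResult)` tuples as in the code above, and that `timed_out_names` (a hypothetical set) collects the `object_name` of every terminated fetcher:

```python
def drop_timed_out(results, timed_out_names):
    """Keep only results whose fetcher was not terminated."""
    return [
        (config, fetcher_result)
        for config, fetcher_result in results
        if config["object_name"] not in timed_out_names
    ]
```

This only filters by origin. It does not detect a message that was half-written when the fetcher was killed.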

Answer 1

You could pass a new multiprocessing.Lock() to every fetcher you start.

In the fetcher's process, be sure to wrap Queue.put() with this lock:

with self.lock:
    self.queue.put(result)

When you need to terminate a fetcher's process, take its lock first:

with fetcherLock:
    fetcherProcess.terminate()

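This way, your queue won't get corrupted by killing a fetcher while it is accessing the queue.

Some fetchers' locks could get corrupted. But that's not an issue, since every new fetcher you launch gets a brand-new lock.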
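Put together, the idea could look roughly like the sketch below. The names `fetch` and `run_fetchers` are made up for illustration, the fetcher just sleeps instead of doing real work, and the "fork" start method is an assumption so the snippet runs as a plain script on POSIX systems.

```python
import multiprocessing as mp
import queue as queue_module
import time

# Assumption: "fork" (POSIX only) keeps this runnable as a plain script.
# On Windows, use the default context and an `if __name__ == "__main__":` guard.
ctx = mp.get_context("fork")


def fetch(name, delay, result_queue, lock):
    """Hypothetical fetcher: sleeps instead of fetching, then reports."""
    time.sleep(delay)
    with lock:  # the parent takes this same lock before terminating us
        result_queue.put((name, "data from " + name))


def run_fetchers(jobs, timeout):
    """Start one process per (name, delay) job and kill the ones that hang."""
    result_queue = ctx.Queue()
    started = []
    for name, delay in jobs:
        lock = ctx.Lock()  # a brand-new lock for every fetcher
        proc = ctx.Process(target=fetch, args=(name, delay, result_queue, lock))
        proc.start()
        started.append((name, proc, lock))

    timed_out = []
    for name, proc, lock in started:
        proc.join(timeout)
        if proc.is_alive():
            with lock:  # wait until the fetcher is not inside put()
                proc.terminate()
            proc.join()  # reap the terminated process
            timed_out.append(name)

    # Drain whatever the finished fetchers delivered.
    results = []
    while True:
        try:
            results.append(result_queue.get_nowait())
        except queue_module.Empty:
            break
    return results, timed_out
```

Calling `run_fetchers([("fast", 0.05), ("slow", 30)], timeout=1.0)` should hand back the fast fetcher's result and report the slow one as timed out. One caveat: `multiprocessing.Queue.put()` passes the item to a background feeder thread, so the lock covers the call to `put()` but not necessarily the moment the bytes are written to the underlying pipe.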

Answer 2

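Why not:

create a new queue and start all the fetchers that will use this queue.
have your script actually sleep for as long as you want the fetcher processes to have to get a result.
get everything from the resultQueue. It won't be corrupted, since you didn't have to kill any process.
finally, terminate all the fetcher processes that are still alive.
loop!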
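One round of those steps might be sketched as follows. This is only an illustration: `fetch` and `run_round` are hypothetical names, the fetcher sleeps instead of doing real work, and the "fork" start method is assumed so the snippet runs as a plain script on POSIX systems.

```python
import multiprocessing as mp
import queue as queue_module
import time

# Assumption: "fork" (POSIX only) keeps this runnable as a plain script.
ctx = mp.get_context("fork")


def fetch(name, delay, result_queue):
    """Hypothetical fetcher: sleeps instead of fetching, then reports."""
    time.sleep(delay)
    result_queue.put((name, "data from " + name))


def run_round(jobs, wait_seconds):
    """Run one round: fresh queue, sleep, drain, then kill stragglers."""
    result_queue = ctx.Queue()  # a new queue for every round
    procs = [
        ctx.Process(target=fetch, args=(name, delay, result_queue))
        for name, delay in jobs
    ]
    for proc in procs:
        proc.start()

    time.sleep(wait_seconds)  # give the fetchers their time budget

    # Drain first: nothing has been killed yet, so the queue is intact.
    results = []
    while True:
        try:
            results.append(result_queue.get_nowait())
        except queue_module.Empty:
            break

    # Only now terminate whatever is still running.
    stragglers = 0
    for proc in procs:
        if proc.is_alive():
            proc.terminate()
            stragglers += 1
        proc.join()
    return results, stragglers
```

Because the queue is thrown away after each round, a fetcher killed in the middle of a `put()` can only damage a queue that nobody reads again.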

How to solve queue corruption when using Process.terminate()
