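Efficiently handling blocking operations in Python

Time: 2019-02-05 23:48:24

Tags: python multithreading asynchronous python-multiprocessing blocking

I'm using Python and OpenCV to get video from an RTSP stream. I grab single frames from the stream and save them to the file system.

I wrote a StreamingWorker that takes care of getting the frames and saving them. There is also a StreamPool that holds all the stream objects. Since a StreamingWorker would always be running, I thought there should be only one per core, so that each one can use as much of its core as possible. The StreamPool then hands the VideoCapture objects to whichever StreamingWorker is available.

The problem is that the script spends most of its running time blocked: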

import os
import time
import threading
import cv2 as cv

class StreamingWorker(object):

    def __init__(self, stream_pool):
        self.stream_pool = stream_pool
        self.start_loop()

    def start_loop(self):
        while True:
            try:
                # getting a stream from the read_strategy
                stream_object = self.stream_pool.next()

                # getting an image from the stream
                _, frame = stream_object['stream'].read()

                # saving image to file system
                cv.imwrite(os.path.join('result', stream_object['feed'], '{}.jpg'.format(time.time())), frame)

            except ValueError as e:
                print('[error] {}'.format(e))

class StreamPool(object):

    def __init__(self, streams):
        self.streams = [{'feed': stream, 'stream': cv.VideoCapture(stream)} for stream in streams]
        self.current_stream = 0
        self.lock = threading.RLock()

    def next(self):
        # Advance round-robin to the next stream; the lock is released even if an error occurs
        with self.lock:
            self.current_stream = (self.current_stream + 1) % len(self.streams)
            return self.streams[self.current_stream]

def get_cores():
    # This function returns the number of available cores
    import multiprocessing
    return multiprocessing.cpu_count()


def start(stream_pool):
    StreamingWorker(stream_pool)

def divide_list(input_list, amount):
    # This function divides the whole list into list of lists
    result = [[] for _ in range(amount)]
    for i in range(len(input_list)):
        result[i % len(result)].append(input_list[i])
    return result

if __name__ == '__main__':

    stream_list = ['rtsp://some/stream1', 'rtsp://some/stream2', 'rtsp://some/stream3']

    num_cores = get_cores()
    divided_streams = divide_list(stream_list, num_cores)
    for streams in divided_streams:
        stream_pool = StreamPool(streams)
        # args must be a tuple, hence the trailing comma
        thread = threading.Thread(target=start, args=(stream_pool,))
        thread.start()

When I came up with this design, I didn't take into account that most of the operations block, for example:

# Getting a frame blocks
_, frame = stream_object['stream'].read()

# Writing to the file system blocks
cv.imwrite(os.path.join('result', stream_object['feed'], '{}.jpg'.format(time.time())), frame)

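The problem with spending so much time blocked is that most of the processing power is wasted. I thought about using futures with a ThreadPoolExecutor, but I can't seem to reach my goal of using the maximum number of processing cores. Maybe I'm not setting enough threads.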

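Is there a standard way of handling blocking operations so as to make full use of the cores' processing power? A language-agnostic answer is fine.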
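As a small illustration of the thread-count question: a thread that is blocked is waiting rather than computing, so the size of a thread pool does not have to match the number of cores. The sketch below is only an assumption-laden stand-in: `blocking_call` and `run_batch` are hypothetical names, and `time.sleep` replaces the real `read()` and `cv.imwrite()` calls. It shows that overlapping blocking calls shortens wall-clock time; it does not by itself raise CPU usage.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def blocking_call(seconds):
    # Hypothetical stand-in for a blocking read or write.
    # time.sleep releases the GIL, so other threads can run meanwhile.
    time.sleep(seconds)
    return seconds


def run_batch(num_tasks, num_threads, seconds=0.1):
    # Run num_tasks blocking calls on num_threads threads and time the batch.
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        list(pool.map(blocking_call, [seconds] * num_tasks))
    return time.perf_counter() - start


serial = run_batch(num_tasks=8, num_threads=1)
threaded = run_batch(num_tasks=8, num_threads=8)
print('1 thread: {:.2f}s, 8 threads: {:.2f}s'.format(serial, threaded))
```

With real streams the best pool size would probably have to be found by measurement, since it depends on how long each call actually waits.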

1 answer:

Answer 0: (score: 0)

I ended up using a ThreadPoolExecutor and attaching a callback to each future with add_done_callback(fn):

from concurrent.futures import ThreadPoolExecutor

class StreamingWorker(object):

    def __init__(self, stream_pool):
        self.stream_pool = stream_pool
        self.thread_pool = ThreadPoolExecutor(10)
        self.start_loop()

    def start_loop(self):
        def done(fn):
            print('[info] future done')

        def save_image(stream):
            # getting an image from the stream
            _, frame = stream['stream'].read()

            # saving image to file system
            cv.imwrite(os.path.join('result', stream['feed'], '{}.jpg'.format(time.time())), frame)

        while True:
            try:
                # getting a stream from the read_strategy
                stream_object = self.stream_pool.next()

                # Scheduling the process to the thread pool
                self.thread_pool.submit(save_image, (stream_object)).add_done_callback(done)
            except ValueError as e:
                print('[error] {}'.format(e))

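I don't actually want to do anything once the future completes, but if I called result(), the while True loop would block on it, which would defeat the whole purpose of using a thread pool.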
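One thing to watch with a loop like this: it submits tasks as fast as it can, so the executor's internal queue can keep growing if frames are requested faster than they are saved. A possible use for the done callback is to cap the number of tasks in flight. The following is a minimal sketch, not a tested drop-in; `BoundedSubmitter`, `max_pending` and `fake_save` are hypothetical names introduced for illustration, and `fake_save` stands in for the real `save_image`.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class BoundedSubmitter:
    """Hypothetical helper that caps queued-or-running tasks at max_pending."""

    def __init__(self, max_workers, max_pending):
        self._pool = ThreadPoolExecutor(max_workers)
        self._slots = threading.BoundedSemaphore(max_pending)

    def submit(self, fn, *args):
        # Blocks the producer loop while max_pending tasks are in flight.
        self._slots.acquire()
        try:
            future = self._pool.submit(fn, *args)
        except Exception:
            self._slots.release()
            raise
        future.add_done_callback(self._on_done)
        return future

    def _on_done(self, future):
        # Free the slot, then report failures without calling result() in the loop.
        self._slots.release()
        if future.cancelled():
            return
        exc = future.exception()
        if exc is not None:
            print('[error] {}'.format(exc))

    def shutdown(self):
        self._pool.shutdown(wait=True)


def fake_save(i):
    # Assumed stand-in for reading a frame and writing it to disk.
    time.sleep(0.01)
    return i


submitter = BoundedSubmitter(max_workers=4, max_pending=8)
futures = [submitter.submit(fake_save, i) for i in range(20)]
submitter.shutdown()
results = [f.result() for f in futures]
```

The producer only waits when the cap is reached, so the worker threads stay busy without the backlog growing without limit.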

Side note: I had to add a threading.RLock() around the call to self.stream_pool.next(), because apparently OpenCV can't handle calls from multiple threads.
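If the concern is two threads calling read() on the same capture object at once, a lock around next() alone may not be enough, because the pool threads still call read() after the lock is released. A minimal sketch of one lock per stream follows. It is an assumption, not a statement about OpenCV's thread-safety guarantees: `FakeCapture`, `make_entry` and `grab_frame` are hypothetical names, and `FakeCapture` stands in for `cv.VideoCapture` so the example runs without a camera.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class FakeCapture:
    """Hypothetical stand-in for cv.VideoCapture that counts overlapping reads."""

    def __init__(self):
        self.busy = False
        self.overlaps = 0
        self.frames = 0

    def read(self):
        if self.busy:
            self.overlaps += 1  # another thread is already inside read()
        self.busy = True
        time.sleep(0.001)
        self.frames += 1
        frame = self.frames
        self.busy = False
        return True, frame


def make_entry(feed, capture):
    # Same dict layout as in StreamPool, plus one lock per stream.
    return {'feed': feed, 'stream': capture, 'lock': threading.Lock()}


def grab_frame(entry):
    # Only the read is serialized per stream; saving could happen outside the lock.
    with entry['lock']:
        ok, frame = entry['stream'].read()
    return frame if ok else None


entries = [make_entry('feed{}'.format(i), FakeCapture()) for i in range(2)]
with ThreadPoolExecutor(max_workers=8) as pool:
    frames = list(pool.map(grab_frame, entries * 25))

overlap_count = sum(e['stream'].overlaps for e in entries)
frame_count = sum(e['stream'].frames for e in entries)
```

Different streams can still be read in parallel, since each one has its own lock.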