Question

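I have some code that takes a list of image URLs from a CSV file, runs face detection on those images, and then loads some models and makes predictions on them.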

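I did some load testing and found that the get_face function accounts for a large part of the time needed to produce results, and the pickle files created for the predictions add extra time on top of that.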

Question: Can running these steps in threads reduce the time, and how would I do this in a multithreaded way?

Here is a sample of the code:

from __future__ import division
import numpy as np

from multiprocessing import Process, Queue, Pool
import os
import pickle
import pandas as pd
import dlib
from skimage import io
from skimage.transform import resize

df = pd.read_csv('/home/instaurls.csv')
detector = dlib.get_frontal_face_detector()
img_width, img_height = 139, 139
confidence = 0.8

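def get_face():
    data1 = []
    for row in df.itertuples():
        output = None  # reset per row so an image without a face does not reuse the previous face
        img = io.imread(row[1])  # row[0] is the index, row[1] is the 'data' column
        dets = detector(img, 1)
        for i, d in enumerate(dets):
            # dlib rectangles can extend past the image edge, so clamp top/left at 0
            img = img[max(d.top(), 0):d.bottom(), max(d.left(), 0):d.right()]
            img = resize(img, (img_width, img_height))
            output = np.expand_dims(img, axis=0)
            break  # keep only the first detected face
        if output is not None:
            data1.append(output)
    data1 = np.concatenate(data1)
    return data1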

get_face()
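The loop in get_face does two different kinds of work per row: downloading the image (io.imread on a URL, which is network I/O) and running the detector (CPU work). Before choosing threads or processes, it may help to time the two stages separately. This is a minimal standard-library sketch; fake_download and fake_detect are hypothetical stand-ins (a sleep and a busy loop), not the real skimage/dlib calls.

```python
import time
from collections import defaultdict


def fake_download(url):
    # Hypothetical stand-in for io.imread(url): simulates network latency.
    time.sleep(0.02)
    return url.encode()


def fake_detect(img):
    # Hypothetical stand-in for detector(img, 1): simulates CPU-bound work.
    total = 0
    for i in range(20000):
        total += i * i
    return [total]


def profile_rows(urls):
    """Return the cumulative seconds spent in each stage across all rows."""
    timings = defaultdict(float)
    for url in urls:
        t0 = time.perf_counter()
        img = fake_download(url)
        t1 = time.perf_counter()
        fake_detect(img)
        t2 = time.perf_counter()
        timings["download"] += t1 - t0
        timings["detect"] += t2 - t1
    return dict(timings)


if __name__ == "__main__":
    stats = profile_rows(["https://example.com/a.jpg"] * 5)
    for stage, secs in stats.items():
        print(f"{stage}: {secs:.3f}s")
```

If download time dominates, threads are usually enough because the work is waiting on the network. If detection dominates, processes (as in the answer below) are the safer choice for CPU-bound Python work.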

CSV sample

data
https://scontent-frt3-2.cdninstagram.com/t51.2885-19/s320x320/23101834_1502115223199537_1230866541029883904_n.jpg
https://scontent-frt3-2.cdninstagram.com/t51.2885-19/s320x320/17883193_940000882769400_8455736118338387968_a.jpg
https://scontent-frt3-2.cdninstagram.com/t51.2885-19/s320x320/22427207_1737576603205281_7879421442167668736_n.jpg
https://scontent-frt3-2.cdninstagram.com/t51.2885-19/s320x320/12976287_1720757518213286_1180118177_a.jpg
https://scontent-frt3-2.cdninstagram.com/t51.2885-19/s320x320/23101834_1502115223199537_1230866541029883904_n.jpg
https://scontent-frx5-1.cdninstagram.com/t51.2885-19/s320x320/16788491_748497378632253_566270225134125056_a.jpg
https://scontent-frx5-1.cdninstagram.com/t51.2885-19/s320x320/21819738_128551217878233_9151523109507956736_n.jpg
https://scontent-frx5-1.cdninstagram.com/t51.2885-19/s320x320/14295447_318848895135407_524281974_a.jpg
https://scontent-frx5-1.cdninstagram.com/t51.2885-19/s320x320/18160229_445050155844926_2783054824017494016_a.jpg
https://scontent-frt3-2.cdninstagram.com/t51.2885-19/s320x320/23101834_1502115223199537_1230866541029883904_n.jpg
https://scontent-frt3-2.cdninstagram.com/t51.2885-19/s320x320/17883193_940000882769400_8455736118338387968_a.jpg
https://scontent-frt3-2.cdninstagram.com/t51.2885-19/s320x320/22427207_1737576603205281_7879421442167668736_n.jpg
https://scontent-frt3-2.cdninstagram.com/t51.2885-19/s320x320/12976287_1720757518213286_1180118177_a.jpg
https://scontent-frt3-2.cdninstagram.com/t51.2885-19/s320x320/23101834_1502115223199537_1230866541029883904_n.jpg
https://scontent-frx5-1.cdninstagram.com/t51.2885-19/s320x320/16788491_748497378632253_566270225134125056_a.jpg
https://scontent-frx5-1.cdninstagram.com/t51.2885-19/s320x320/21819738_128551217878233_9151523109507956736_n.jpg
https://scontent-frx5-1.cdninstagram.com/t51.2885-19/s320x320/14295447_318848895135407_524281974_a.jpg
https://scontent-frx5-1.cdninstagram.com/t51.2885-19/s320x320/18160229_445050155844926_2783054824017494016_a.jpg
https://scontent-frt3-2.cdninstagram.com/t51.2885-19/s320x320/23101834_1502115223199537_1230866541029883904_n.jpg

Answer 1

Here is how you could try to do this in parallel:

from __future__ import division
import numpy as np

from multiprocessing import Process, Queue, Pool
import os
import pickle
import pandas as pd
import dlib
from skimage import io
from skimage.transform import resize
from csv import DictReader

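detector = dlib.get_frontal_face_detector()
img_width, img_height = 139, 139
confidence = 0.8

def get_face(row):
    """
    Here row is a dictionary where keys are CSV header names
    and values are the values from the current CSV row.
    Returns a (1, 139, 139, channels) array for the first detected
    face, or None if no face was found.
    """
    output = None

    img = io.imread(row['data'])  # DictReader rows are keyed by header name
    dets = detector(img, 1)
    for i, d in enumerate(dets):
        # dlib rectangles can extend past the image edge, so clamp top/left at 0
        img = img[max(d.top(), 0):d.bottom(), max(d.left(), 0):d.right()]
        img = resize(img, (img_width, img_height))
        output = np.expand_dims(img, axis=0)
        break  # keep only the first detected face

    return output

if __name__ == '__main__':
    with open('/home/instaurls.csv') as f:
        df = DictReader(f)  # DictReader is iterable, one dict per CSV row
        with Pool() as pool:  # defaults to the number of CPU cores
            data = list(pool.imap(get_face, df))
    # Drop rows where no face was detected before concatenating
    data = [d for d in data if d is not None]
    print(np.concatenate(data))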

Note get_face, the argument it takes, and what it returns. This is what I meant when I said smaller jobs: get_face now processes a single CSV row.
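Because a per-row worker depends only on its argument, the serial and parallel versions can share the same function; only the map implementation changes. A small sketch, where process_row is a hypothetical stand-in for get_face and multiprocessing.pool.ThreadPool provides the same imap interface as Pool but runs tasks in threads:

```python
from multiprocessing.pool import ThreadPool


def process_row(row):
    # Hypothetical per-row worker standing in for get_face(row):
    # takes one CSV row (a dict) and returns a result, or None if nothing was found.
    url = row["data"]
    return len(url) if url.endswith(".jpg") else None


rows = [{"data": "a.jpg"}, {"data": "b.png"}, {"data": "cc.jpg"}]

# Serial version: plain map over the rows.
serial = list(map(process_row, rows))

# Parallel version: identical worker, only the map implementation changes.
with ThreadPool(4) as pool:
    parallel = list(pool.imap(process_row, rows))

print(serial)  # [5, None, 6]
```

With multiprocessing.Pool the worker must also be defined at module level so it can be pickled and sent to the worker processes.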

When you run this script, pool is an instance of Pool, and it calls get_face once for each row produced by df (the DictReader).
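pool.imap hands rows to the workers and yields results lazily, in the same order as the input, even when later rows finish first; imap_unordered yields results as they complete instead. A small sketch using multiprocessing.pool.ThreadPool (same imap/imap_unordered interface as Pool) with a hypothetical slow_square task whose sleep is shorter for later inputs:

```python
import time
from multiprocessing.pool import ThreadPool


def slow_square(x):
    # Hypothetical task: later inputs sleep less, so they finish first.
    time.sleep(0.05 * (5 - x))
    return x * x


with ThreadPool(5) as pool:
    # imap preserves input order regardless of completion order.
    ordered = list(pool.imap(slow_square, range(5)))
    # imap_unordered yields each result as soon as it is ready.
    unordered = list(pool.imap_unordered(slow_square, range(5)))

print(ordered)  # [0, 1, 4, 9, 16]
```

For long inputs with multiprocessing.Pool, passing a chunksize larger than the default of 1 to imap can reduce per-task overhead.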

Once all rows have been processed, data holds the results, which you then pass to np.concatenate.
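One caveat: if get_face raises for a single row (for example, a URL that cannot be fetched), pool.imap re-raises that exception in the parent when that result is reached, and the list(...) call fails for the whole batch. A minimal sketch of isolating per-row failures; flaky_worker is a hypothetical stand-in for get_face, and ThreadPool is used so the example runs without pickling concerns:

```python
from multiprocessing.pool import ThreadPool


def flaky_worker(url):
    # Hypothetical worker standing in for get_face: fails on some inputs.
    if "bad" in url:
        raise OSError("could not fetch " + url)
    return url.upper()


def safe_worker(url):
    # Catch per-row errors so one bad URL does not abort the whole batch;
    # return (url, result, error) so failures can be reported afterwards.
    try:
        return url, flaky_worker(url), None
    except Exception as exc:
        return url, None, str(exc)


urls = ["a.jpg", "bad.jpg", "c.jpg"]
with ThreadPool(3) as pool:
    results = list(pool.imap(safe_worker, urls))

ok = [res for _, res, err in results if err is None]
failed = [u for u, _, err in results if err is not None]
print(ok)      # ['A.JPG', 'C.JPG']
print(failed)  # ['bad.jpg']
```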
