Question

I am trying to write a simple multithreaded Python script:

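import glob
import itertools
import os
from multiprocessing.dummy import Pool as ThreadPool
from shutil import copyfile

# io and transform are assumed to come from scikit-image
from skimage import io, transform

# Module-level settings used below; the post does not show them, so these values are assumed
img_file_extension = 'jpg'
copyLabels = True

def resize_img_folder_multithreaded(img_fldr_src, img_fldr_dst, max_num_of_thread):

    images = glob.glob(img_fldr_src + '/*.' + img_file_extension)
    pool = ThreadPool(max_num_of_thread)

    # call resize_img(image_path, img_fldr_dst) once per image
    pool.starmap(resize_img, zip(images, itertools.repeat(img_fldr_dst)))
    # close the pool and wait for the work to finish
    pool.close()
    pool.join()


def resize_img(img_path_src, img_fldr_dest):
    #print("about to resize image=",img_path_src)
    image = io.imread(img_path_src)
    image = transform.resize(image, [300, 300])
    io.imsave(os.path.join(img_fldr_dest, os.path.basename(img_path_src)), image)
    # copy the matching .xml label file, if there is one (assumes a 3-character extension)
    label = img_path_src[:-4] + '.xml'
    if copyLabels and os.path.exists(label):
        copyfile(label, os.path.join(img_fldr_dest, os.path.basename(label)))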

Setting the parameter max_num_of_thread to any number in [1 ... 10] doesn't improve my run time at all (for 60 images it stays around 30 sec), and with max_num_of_thread = 10 my PC gets stuck.

My question is: what is the bottleneck in my code, and why don't I see any improvement?

Some data about my PC:

python -V
Python 3.6.4 :: Anaconda, Inc.


cat /proc/cpuinfo | grep 'processor' | wc -l
4

cat /proc/meminfo 
MemTotal:        8075960 kB
MemFree:         3943796 kB
MemAvailable:    4560308 kB

cat /etc/*release
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=17.10

Answer 1

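You should only use multiprocessing with as many workers as you have CPU cores available. You are also not using a Queue, so the workers in the pool are doing the same work. You need to add a queue to your code.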

Filling a queue and managing multiprocessing in python
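
As a rough sketch of the first suggestion, assuming the work is CPU-bound: the snippet below swaps the thread pool for a process pool with one worker per core. cpu_bound_task and run_in_process_pool are hypothetical names, and cpu_bound_task is only a stand-in for resize_img so that the example runs without any image files. multiprocessing.Pool feeds inputs to its workers through its own internal task queue, so this sketch does not build one by hand.

```python
import multiprocessing
import os


def cpu_bound_task(n):
    # Hypothetical stand-in for resize_img: pure-Python, CPU-heavy work.
    return sum(i * i for i in range(n))


def run_in_process_pool(inputs, num_workers=None):
    # Default to one worker process per available CPU core.
    if num_workers is None:
        num_workers = os.cpu_count() or 1
    # The pool distributes the inputs across the worker processes
    # and returns the results in input order.
    with multiprocessing.Pool(processes=num_workers) as pool:
        return pool.map(cpu_bound_task, inputs)


if __name__ == "__main__":
    results = run_in_process_pool([10, 100, 1000])
    print(results)
```

Worker processes do not share the interpreter lock that threads from multiprocessing.dummy share, which is why this variant can use all 4 cores for pure-Python work. Whether it actually helps with the image resizing in the question would need to be measured.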
