Python pool.map finishes but leaves zombie processes behind

Asked: 2019-11-15 15:48:20

Tags: python python-3.x multiprocessing

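I keep running into a problem where pool.map leaves processes behind even after I call pool.terminate. I have looked for solutions, but the ones I found all seem to involve some other issue, such as calling the map function recursively or another process interfering with multiprocessing.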

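My code imports two NetCDF files and processes the data in them with several different calculations. These take a long time (several 6400x6400 arrays), so I tried to parallelize the code with multiprocessing. The multiprocessing itself works fine: the first run takes 2.5 minutes (down from 8 minutes). However, every time the code finishes, Spyder's memory usage does not go down, and extra python processes are left behind in the Windows Task Manager. My code is below: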

import numpy as np
import netCDF4
import math
from math import sin, cos
import logging
from multiprocessing.pool import Pool
import time

start=time.time()
format = "%(asctime)s: %(message)s"
logging.basicConfig(format=format, level=logging.INFO, datefmt="%H:%M:%S")
logging.info("Here we go!")
path = "DATAPATH"
geopath = "DATAPATH"
f = netCDF4.Dataset(path)
f.set_auto_maskandscale(False)

f2 = netCDF4.Dataset(geopath)
i5lut=f.groups['observation_data'].variables['I05_brightness_temperature_lut'][:]
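# Assumption: the I04 lookup table follows the same naming as the I05 one;
# the original line loaded the I05 table for both bands.
i4lut=f.groups['observation_data'].variables['I04_brightness_temperature_lut'][:]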
I5= f.groups['observation_data'].variables['I05'][:]
I4= f.groups['observation_data'].variables['I04'][:]
I5=i5lut[I5]
I4=i4lut[I4]
I4Quality= f.groups['observation_data'].variables['I04_quality_flags'][:]
I5Quality= f.groups['observation_data'].variables['I05_quality_flags'][:]
I3= f.groups['observation_data'].variables['I03']
I2= f.groups['observation_data'].variables['I02']
I1= f.groups['observation_data'].variables['I01']
I1.set_auto_scale(True)
I2.set_auto_scale(True)
I3.set_auto_scale(True)
I1=I1[:]
I2=I2[:]
I3=I3[:]

lats = f2.groups['geolocation_data'].variables['latitude'][:]
lons = f2.groups['geolocation_data'].variables['longitude'][:]
solarZen = f2.groups['geolocation_data'].variables['solar_zenith'][:]
# Assign only the sensor angles here; the chained "= solarZen =" overwrote solarZen
sensorZen = f2.groups['geolocation_data'].variables['sensor_zenith'][:]
solarAz = f2.groups['geolocation_data'].variables['solar_azimuth'][:]
sensorAz = f2.groups['geolocation_data'].variables['sensor_azimuth'][:]
def kernMe(i, j, band):
    if i<250 or j<250:
        return -1
    else:
        return np.mean(band[i-250:i+250:1,j-250:j+250:1])
def thread_me(arr):
    start1=arr[0]
    end1=arr[1]
    start2=arr[2]
    end2=arr[3]
    logging.info("Im starting at: %d to %d, %d to %d" %(start1, end1, start2, end2))
    points = []
    avg = np.mean(I4)
    for i in range(start1,end1):
        for j in range (start2,end2):

            if solarZen[i,j]>=90:
                if not (I5[i,j]<265 and I4[i,j]<295):#
                    if I4[i,j]>320 and I4Quality[i,j]==0:
                        points.append([lons[i,j],lats[i,j], 1])
                    elif I4[i,j]>300 and I5[i,j]-I4[i,j]>10:
                        points.append([lons[i,j],lats[i,j], 2])
                    elif I4[i,j] == 367 and I4Quality[i,j]==9:  # compare the pixel, not the whole array
                        points.append([lons[i,j],lats[i,j], 3])
            else:

                if not ((I1[i,j]>I2[i,j]>I3[i,j]) or (I5[i,j]<265 or (I1[i,j]+I2[i,j]>0.9 and I5[i,j]<295) or 
                         (I1[i,j]+I2[i,j]>0.7 and I5[i,j]<285))):
                    if not (I1[i,j]+I2[i,j] > 0.6 and I5[i,j]<285 and I3[i,j]>0.3 and I3[i,j]>I2[i,j] and I2[i,j]>0.25 and I4[i,j]<=335):
                         thetaG= (cos(sensorZen[i,j]*(math.pi/180))*cos(solarZen[i,j]*(math.pi/180)))-(sin(sensorZen[i,j]*(math.pi/180))*sin(solarZen[i,j]*(math.pi/180))*cos(sensorAz[i,j]*(math.pi/180)))
                         thetaG= math.acos(thetaG)*(180/math.pi)
                         if not ((thetaG<15 and I1[i,j]+I2[i,j]>0.35) or (thetaG<25 and I1[i,j]+I2[i,j]>0.4)):
                                if math.floor(I4[i,j])==367 and I4Quality[i,j]==9 and I5[i,j]>290 and I5Quality[i,j]==0 and (I1[i,j]+I2[i,j])>0.7:
                                    points.append([lons[i,j],lats[i,j], 4])

                                elif I4[i,j]-I5[i,j]>25 or True:
                                    kern = kernMe(i, j, I4)
                                    if kern!=-1 or True:
                                        BT4M = max(325, kern)
                                        kern = min(330, BT4M)
                                        if I4[i,j]> kern and I4[i,j]>avg:
                                            points.append([lons[i,j],lats[i,j], 5])
    return points

if __name__ == '__main__':


    #Separate the arrays into 1616*1600 chunks for multi processing
    #TODO: make this automatic, not hardcoded
    arg=[[0,1616,0,1600],[0,1616,1600,3200],[0,1616,3200,4800],[0,1616,4800,6400],
    [1616,3232,0,1600],[1616,3232,1600,3200],[1616,3232,3200,4800],[1616,3232,4800,6400],
    [3232,4848,0,1600],[3232,4848,1600,3200],[3232,4848,3200,4800],[3232,4848,4800,6400],
    [4848,6464,0,1600],[4848,6464,1600,3200],[4848,6464,3200,4800],[4848,6464,4800,6400]]

    print(arg)
    p=Pool(processes = 4)
    output= p.map(thread_me, arg)
    p.close()
    p.join()

    print(output)

    f.close()
    f2.close()

    logging.info("Aaaand we're here!")
    print(str((time.time()-start)/60))
    p.terminate()
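
Regarding the TODO in the listing above, the hardcoded `arg` list could probably be generated instead. The following is only a minimal sketch: `make_chunks` is a hypothetical helper that is not part of the script, and the 6464 x 6400 extent is taken from the hardcoded list (in the real script it would presumably come from something like `I4.shape`).

```python
def make_chunks(n_rows, n_cols, row_step, col_step):
    """Return [row_start, row_end, col_start, col_end] blocks covering the grid."""
    chunks = []
    for r0 in range(0, n_rows, row_step):
        r1 = min(r0 + row_step, n_rows)  # clip the last block to the array edge
        for c0 in range(0, n_cols, col_step):
            c1 = min(c0 + col_step, n_cols)
            chunks.append([r0, r1, c0, c1])
    return chunks


# Same 4 x 4 layout as the hardcoded list, in the same row-major order
arg = make_chunks(6464, 6400, 1616, 1600)
```

Each entry has the same `[start1, end1, start2, end2]` layout that `thread_me` unpacks, so the list can be passed to `p.map` unchanged.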

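I call both p.close and p.terminate because I thought it would help (it didn't). All of my code runs and produces the expected output, but I have to end the lingering processes manually with Task Manager. Any ideas about what is causing this?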
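
For reference, the shutdown sequence described here can be reduced to a small standalone script, which may help to check whether workers also linger outside of Spyder. This is a sketch under assumptions, not the actual script: `count_hits` and `run_pool` are hypothetical stand-ins for `thread_me` and the main block, and the sketch does not by itself explain the leftover processes.

```python
import multiprocessing as mp


def count_hits(bounds):
    """Hypothetical stand-in for thread_me: count even numbers in [start, end)."""
    start, end = bounds
    return sum(1 for value in range(start, end) if value % 2 == 0)


def run_pool(tasks, processes=2):
    # Leaving the with-block calls pool.terminate(); map() has already
    # returned every result by that point.
    with mp.Pool(processes=processes) as pool:
        results = pool.map(count_hits, tasks)
    # join() is allowed after terminate() and waits for the workers to exit.
    pool.join()
    return results


if __name__ == "__main__":
    print(run_pool([(0, 10), (10, 20)]))
    # Lists child processes that are still alive; ideally empty here.
    print(mp.active_children())
```

On Windows, worker processes are started with the spawn method, which imports the main module again in each worker. Everything outside the `if __name__ == '__main__':` guard, including opening the NetCDF files and loading the arrays, therefore runs once per worker as well as in the parent.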

I think I have included all the relevant information here. If you need more, I will edit the question on request.

Thanks.

0 answers:

No answers yet