Question

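This is an MWE of a larger piece of code I'm using. For each of a number of points in a list, it runs a Monte Carlo integration over a KDE (kernel density estimate) (see "Integrate 2D kernel density estimate"; the integration method was suggested in the question "Speed up sampling of kernel estimate") and returns a list made up of the results.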

import numpy as np
from scipy import stats
from multiprocessing import Pool
import threading

# Define KDE integration function.
def kde_integration(m_list):

    # Put some of the values from the m_list into two new lists.
    m1, m2 = [], []
    for item in m_list:
        # x data.
        m1.append(item[0])
        # y data.
        m2.append(item[1])

    # Define limits.
    xmin, xmax = min(m1), max(m1)
    ymin, ymax = min(m2), max(m2)

    # Perform a kernel density estimate on the data:
    x, y = np.mgrid[xmin:xmax:100j, ymin:ymax:100j]
    values = np.vstack([m1, m2])
    kernel = stats.gaussian_kde(values)

    # This list will be returned at the end of this function.
    out_list = []

    # Iterate through all points in the list and calculate for each the integral
    # of the KDE for the domain of points located below the value of that point
    # in the KDE.
    for point in m_list:

        # Compute the point below which to integrate.
        iso = kernel((point[0], point[1]))

        # Sample KDE distribution
        sample = kernel.resample(size=1000)

        #Choose number of cores and split input array.
        cores = 4
        torun = np.array_split(sample, cores, axis=1)

        # Print number of active threads.
        print threading.active_count()

        #Calculate
        pool = Pool(processes=cores)
        results = pool.map(kernel, torun)

        #Reintegrate and calculate results
        insample_mp = np.concatenate(results) < iso

        # Integrate for all values below iso.
        integral = insample_mp.sum() / float(insample_mp.shape[0])

        # Append integral value for this point to list that will return.
        out_list.append(integral)

    return out_list


# Generate some random two-dimensional data:
def measure(n):
    "Measurement model, return two coupled measurements."
    m1 = np.random.normal(size=n)
    m2 = np.random.normal(scale=0.5, size=n)
    return m1+m2, m1-m2

# Create list to pass to KDE integral function.
m_list = []
for i in range(100):
    m1, m2 = measure(5)
    m_list.append(m1.tolist())
    m_list.append(m2.tolist())

# Call KDE integration function.
print 'Integral result: ', kde_integration(m_list)

The multiprocessing part of the code was suggested in a related question as a way to speed the code up (by up to 3.4x).

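The code works fine until I try to pass a list with more than 62-63 elements to the KDE function (i.e., I set a value above 63 in the line for i in range(100)). If I do that, I get the following error: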

Traceback (most recent call last):
  File "~/gauss_kde_temp.py", line 78, in <module>
    print 'Integral result: ', kde_integration(m_list)
  File "~/gauss_kde_temp.py", line 48, in kde_integration
    pool = Pool(processes=cores)
  File "/usr/lib/python2.7/multiprocessing/__init__.py", line 232, in Pool
    return Pool(processes, initializer, initargs, maxtasksperchild)
  File "/usr/lib/python2.7/multiprocessing/pool.py", line 144, in __init__
    self._worker_handler.start()
  File "/usr/lib/python2.7/threading.py", line 494, in start
    _start_new_thread(self.__bootstrap, ())
thread.error: can't start new thread

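This usually happens (9 times out of 10) at around 374 active threads. I'm way out of my league when it comes to Python coding, and I have no idea how to fix this. Any help would be much appreciated.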
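
One possible reading of the traceback, offered as a sketch rather than a confirmed diagnosis: Pool(processes=cores) is called once per point inside the loop and is never closed, and each Pool starts helper threads of its own (the traceback fails while starting _worker_handler). The example below creates the pool once, reuses it for every point, and shuts it down at the end. It is written for Python 3 on a Unix-like system (the question uses Python 2.7); run_all is a hypothetical name, and math.sqrt only stands in for the kernel evaluation.

```python
import math
import multiprocessing as mp

# Assumption: a Unix-like system, where the "fork" start method is available.
CTX = mp.get_context("fork")


def run_all(points, cores=2):
    """Process every item in `points` with one shared pool (hypothetical helper)."""
    out_list = []
    # Create the pool once, before the loop, instead of once per point.
    pool = CTX.Pool(processes=cores)
    try:
        for point in points:
            # Stand-in for `pool.map(kernel, torun)` in the question.
            results = pool.map(math.sqrt, point)
            out_list.append(sum(results))
    finally:
        # Stop accepting work, then wait for workers and helper threads to exit.
        pool.close()
        pool.join()
    return out_list


if __name__ == "__main__":
    print(run_all([[1.0, 4.0], [9.0, 16.0]]))
```

With this layout the number of pools no longer grows with the length of the input list, which is the quantity that appears to trigger the error in the question.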

Add

I tried adding a while loop to keep the code from using too many threads. What I did was replace the print threading.active_count() line with this bit of code:

    # Print number of active threads.
    exit_loop = True
    while exit_loop:
        if threading.active_count() < 300:
            exit_loop = False
        else:
            # Pause for 10 seconds.
            time.sleep(10.)
            print 'waiting: ', threading.active_count()

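Once the code reaches 302 active threads it pauses (i.e., it gets stuck inside the loop). I waited more than 10 minutes; the code never exited the loop and the number of active threads never dropped from 302. Shouldn't the number of active threads decrease after a while?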
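
The count may stay flat because the helper threads a Pool starts live until that pool is shut down, so waiting alone would not release them. A minimal sketch for checking this, assuming Python 3 on a Unix-like system; count_threads_around_pool is a hypothetical helper, not part of the code above.

```python
import multiprocessing as mp
import threading

# Assumption: a Unix-like system, where the "fork" start method is available.
CTX = mp.get_context("fork")


def count_threads_around_pool(cores=2):
    """Return the active thread count before, during and after a pool's life."""
    before = threading.active_count()
    pool = CTX.Pool(processes=cores)
    # The pool's helper threads are running now and are included in the count.
    during = threading.active_count()
    # close() stops new work; join() waits for workers and helper threads.
    pool.close()
    pool.join()
    after = threading.active_count()
    return before, during, after


if __name__ == "__main__":
    before, during, after = count_threads_around_pool()
    print("before:", before, "pool open:", during, "after join:", after)
```

If the count only falls back after close() and join(), pools left open in a loop would keep adding threads that sleeping cannot reclaim.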

Thread error: can't start new thread


0 answers: