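This is an MWE of a much larger code I'm using. It performs a Monte Carlo integration (Integrate 2D kernel density estimate) over a KDE (kernel density estimate) for all values located below a given threshold, iteratively for a number of points in a list, using the integration method proposed in this question: Speed up sampling of kernel estimate. It returns a list made up of these results.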
```python
import numpy as np
from scipy import stats
from multiprocessing import Pool
import threading

# Define KDE integration function.
def kde_integration(m_list):

    # Put some of the values from the m_list into two new lists.
    m1, m2 = [], []
    for item in m_list:
        # x data.
        m1.append(item[0])
        # y data.
        m2.append(item[1])

    # Define limits.
    xmin, xmax = min(m1), max(m1)
    ymin, ymax = min(m2), max(m2)

    # Perform a kernel density estimate on the data:
    x, y = np.mgrid[xmin:xmax:100j, ymin:ymax:100j]
    values = np.vstack([m1, m2])
    kernel = stats.gaussian_kde(values)

    # This list will be returned at the end of this function.
    out_list = []

    # Iterate through all points in the list and calculate for each the integral
    # of the KDE for the domain of points located below the value of that point
    # in the KDE.
    for point in m_list:

        # Compute the point below which to integrate.
        iso = kernel((point[0], point[1]))

        # Sample KDE distribution
        sample = kernel.resample(size=1000)

        # Choose number of cores and split input array.
        cores = 4
        torun = np.array_split(sample, cores, axis=1)

        # Print number of active threads.
        print threading.active_count()

        # Calculate
        pool = Pool(processes=cores)
        results = pool.map(kernel, torun)

        # Reintegrate and calculate results
        insample_mp = np.concatenate(results) < iso

        # Integrate for all values below iso.
        integral = insample_mp.sum() / float(insample_mp.shape[0])
        # Append integral value for this point to list that will return.
        out_list.append(integral)

    return out_list


# Generate some random two-dimensional data:
def measure(n):
    "Measurement model, return two coupled measurements."
    m1 = np.random.normal(size=n)
    m2 = np.random.normal(scale=0.5, size=n)
    return m1+m2, m1-m2

# Create list to pass to KDE integral function.
m_list = []
for i in range(100):
    m1, m2 = measure(5)
    m_list.append(m1.tolist())
    m_list.append(m2.tolist())

# Call KDE integration function.
print 'Integral result: ', kde_integration(m_list)
```
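The integration idea behind the MWE (take the KDE value at a point as a threshold, sample from the KDE, and count the fraction of samples whose density falls below that threshold) can be sketched on its own, without multiprocessing. This is a minimal sketch assuming Python 3 and a SciPy version whose `gaussian_kde.resample` accepts a `seed` argument; `kde_mass_below` is a hypothetical helper name, not part of SciPy.

```python
import numpy as np
from scipy import stats


def kde_mass_below(kernel, point, n_samples=1000, seed=None):
    """Hypothetical helper: Monte Carlo estimate of the KDE's probability mass
    in the region where the density is lower than the density at `point`."""
    # Density at the query point: this plays the role of `iso` in the MWE.
    iso = kernel(np.asarray(point, dtype=float).reshape(-1, 1))[0]
    # Draw samples from the KDE itself; the array has shape (n_dims, n_samples).
    sample = kernel.resample(size=n_samples, seed=seed)
    # Fraction of samples whose density falls below the threshold.
    return float(np.mean(kernel(sample) < iso))


rng = np.random.default_rng(0)
kernel = stats.gaussian_kde(rng.normal(size=(2, 500)))
near_centre = kde_mass_below(kernel, (0.0, 0.0), seed=1)
far_tail = kde_mass_below(kernel, (10.0, 10.0), seed=1)
print(near_centre, far_tail)
```

A point near the centre of the data should give a fraction close to 1, and a point far out in the tail a fraction close to 0.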
The `multiprocessing` part of the code was suggested in this question {{3}} to speed the code up (by up to ~3.4x).
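The split/map/re-join step that the MWE parallelizes (`np.array_split`, `pool.map`, `np.concatenate`) can be sketched as below. Assumptions: Python 3; `multiprocessing.pool.ThreadPool` is used as a stand-in for `multiprocessing.Pool` so the sketch runs without pickling or an `if __name__ == "__main__":` guard (the `map` call is the same for both); `evaluate_in_chunks` is a hypothetical helper name. Unlike the MWE, this sketch creates the pool once and reuses it for every point, and the `with` block shuts it down at the end.

```python
import numpy as np
from multiprocessing.pool import ThreadPool
from scipy import stats


def evaluate_in_chunks(kernel, sample, pool, n_chunks):
    """Hypothetical helper: evaluate `kernel` on `sample` (shape (d, n)) chunk by chunk."""
    # Split the columns (the individual sample points) into n_chunks pieces.
    chunks = np.array_split(sample, n_chunks, axis=1)
    # Each chunk is evaluated by a pool worker; map returns results in chunk order.
    results = pool.map(kernel, chunks)
    # Re-join the per-chunk densities into one array of length n.
    return np.concatenate(results)


rng = np.random.default_rng(0)
kernel = stats.gaussian_kde(rng.normal(size=(2, 200)))
cores = 4

# One pool for all points; leaving the `with` block shuts the pool down.
with ThreadPool(processes=cores) as pool:
    per_point = []
    for _ in range(10):
        sample = kernel.resample(size=1000, seed=rng)
        per_point.append(evaluate_in_chunks(kernel, sample, pool, cores))
```

Because `array_split` keeps the column order and `map` returns results in input order, the concatenated array matches a single `kernel(sample)` call.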
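The code works fine until I try to pass a list with more than 62-63 elements to the KDE function (i.e., I set a value above 63 in the line `for i in range(100)`). If I do that, I get the following error: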
```
Traceback (most recent call last):
  File "~/gauss_kde_temp.py", line 78, in <module>
    print 'Integral result: ', kde_integration(m_list)
  File "~/gauss_kde_temp.py", line 48, in kde_integration
    pool = Pool(processes=cores)
  File "/usr/lib/python2.7/multiprocessing/__init__.py", line 232, in Pool
    return Pool(processes, initializer, initargs, maxtasksperchild)
  File "/usr/lib/python2.7/multiprocessing/pool.py", line 144, in __init__
    self._worker_handler.start()
  File "/usr/lib/python2.7/threading.py", line 494, in start
    _start_new_thread(self.__bootstrap, ())
thread.error: can't start new thread
```
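This usually happens (9 out of 10 times) at around 374 active threads. I'm way out of my league here in terms of Python coding and I have no idea how to solve this issue. Any help will be much appreciated.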
I tried adding a `while` loop to prevent the code from using too many threads. What I did was replace the `print threading.active_count()` line with this code:
```python
# Wait until fewer than 300 threads are active.
# Requires `import time` at module level.
exit_loop = True
while exit_loop:
    if threading.active_count() < 300:
        exit_loop = False
    else:
        # Pause for 10 seconds.
        time.sleep(10.)
        print 'waiting: ', threading.active_count()
```
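When the code reaches 302 active threads it stalls (i.e., it gets stuck inside the loop). I waited for over 10 minutes and the code never exited the loop; the number of active threads never went down from 302. Shouldn't the number of active threads decrease after a while?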
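One way to see how pools affect `threading.active_count()` is the minimal sketch below, assuming Python 3. It uses `multiprocessing.pool.ThreadPool` as a lightweight stand-in for `multiprocessing.Pool`: both start helper threads in the parent process (the traceback shows `self._worker_handler.start()` for one of them), and `ThreadPool` additionally runs its workers as threads, so its counts are larger. Note that the MWE creates a new `Pool` for every point and never calls `close()`/`join()` on it. The helper name `count_threads` is hypothetical.

```python
import threading
from multiprocessing.pool import ThreadPool


def count_threads(n_pools=3, workers=2):
    """Hypothetical helper: thread counts before, while and after pools are open."""
    baseline = threading.active_count()

    # Open several pools, similar to the MWE's one-pool-per-point pattern.
    pools = [ThreadPool(processes=workers) for _ in range(n_pools)]
    while_open = threading.active_count()

    # close() stops accepting new tasks; join() waits for the pool's threads to exit.
    for pool in pools:
        pool.close()
        pool.join()
    after_close = threading.active_count()

    return baseline, while_open, after_close


baseline, while_open, after_close = count_threads()
print(baseline, while_open, after_close)
```

While the pools are open the count stays above the baseline; it only drops back once each pool has been closed and joined.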