Question

I am writing code that performs baseline correction on multiple signals. The code is structured like this.

# for each file in a directory
    # read the file and populate the x vector
    temp = baseline_als(x, 1000, 0.00001)
    plt.plot(x - temp)
    plt.savefig("newbaseline.png")
    plt.close()

The baseline_als function is as follows.

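import numpy as np
from scipy import sparse
from scipy.sparse.linalg import spsolve

def baseline_als(y, lam, p, niter=20):
    L = len(y)
    # Second-order difference matrix, shape (L, L-2)
    D = sparse.csc_matrix(np.diff(np.eye(L), 2))
    w = np.ones(L)
    for i in range(niter):
        # Diagonal weight matrix for the current iteration
        W = sparse.spdiags(w, 0, L, L)
        Z = W + lam * D.dot(D.transpose())
        # Solve the weighted, smoothness-penalized least-squares system
        z = spsolve(Z, w*y)
        # Asymmetric weights: p where y is above the fit, 1-p where it is below
        w = p * (y > z) + (1-p) * (y < z)
    return z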

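When I put around 100 files in the directory, the code works fine, although it takes time because the complexity is quite high. But when the directory holds around 10000 files and I run this script, the system freezes after a few minutes. I don't mind the execution being slow, but is there any way to make sure the script finishes executing?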

Answer 1

Your script is consuming too much RAM when you run it on too many files, see Why does a simple python script crash my system

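The process in which your program runs stores the arrays and variables used for the calculations in process memory, which is RAM, and they accumulate there.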
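To give a feel for how large those arrays can get: in the question's code, np.diff(np.eye(L), 2) first allocates a dense L-by-L array before it is converted to a sparse matrix. The following is only a sketch, assuming it is acceptable to build the same second-difference matrix directly with scipy.sparse.diags; the helper name second_difference_matrix is made up for illustration and is not part of the original code.

```python
from scipy import sparse

def second_difference_matrix(L):
    """Sparse (L, L-2) matrix with the same entries as np.diff(np.eye(L), 2).

    Hypothetical helper: it stores only the three non-zero diagonals,
    so no dense L-by-L intermediate array is ever allocated.
    """
    # Column j holds 1, -2, 1 in rows j, j+1, j+2.
    return sparse.diags([1, -2, 1], [0, -1, -2], shape=(L, L - 2), format="csc")
```

If this fits your case, the line that builds D inside baseline_als could become D = second_difference_matrix(L). Whether that alone is enough to stop the freeze is something you would have to measure.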

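A possible workaround is to run the baseline_als() function in a child process. When the child returns, its memory is freed automatically, see Releasing memory in Python

Executing a function in a child process: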

from multiprocessing import Process, Queue

def my_function(q, x):
    q.put(x + 100)

if __name__ == '__main__':
    queue = Queue()
    p = Process(target=my_function, args=(queue, 1))
    p.start()
    result = queue.get()  # read the result first; blocks until the child has put it
    p.join()  # then wait for the process to terminate
    print(result)

Copied from: Is it possible to run function in a subprocess without threading or writing a separate file/script

This prevents RAM from being consumed by old, unreferenced variables that your process (program) generates.
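Applied to a per-file loop, the child-process idea could look roughly like the sketch below. The names run_in_child and _call_and_report are hypothetical helpers, not anything from multiprocessing itself, and the sketch assumes a POSIX system where the "fork" start method is available (on Windows you would use the default context and keep the if __name__ == '__main__' guard).

```python
import multiprocessing as mp

def _call_and_report(queue, func, args):
    # Runs inside the child: send back either the result or the exception.
    try:
        queue.put((True, func(*args)))
    except Exception as exc:
        queue.put((False, exc))

def run_in_child(func, *args):
    """Run func(*args) in a short-lived child process and return its result."""
    # Assumption: the "fork" start method exists on this platform (POSIX).
    ctx = mp.get_context("fork")
    queue = ctx.Queue()
    child = ctx.Process(target=_call_and_report, args=(queue, func, args))
    child.start()
    # Read before joining, so a large result cannot leave the child blocked.
    ok, payload = queue.get()
    child.join()  # the child's memory goes back to the OS when it exits
    if not ok:
        raise payload
    return payload
```

In the question's loop this would be used as temp = run_in_child(baseline_als, x, 1000, 0.00001). Starting one process per file adds overhead, so expect it to be slower than calling the function directly.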

Another possibility is to invoke the garbage collector with gc.collect(), but this is not recommended (it does not work in some cases)
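For completeness, this is roughly what the gc.collect() variant looks like. It is a minimal sketch: process_all, load and work are placeholder names standing in for your own loop, file reader and computation. Keep in mind that CPython already frees most objects as soon as their last reference goes away; gc.collect() only adds a sweep for unreachable reference cycles, which is one reason it may not help here.

```python
import gc

def process_all(names, load, work):
    """Process items one at a time, releasing each temporary before the next."""
    results = []
    for name in names:
        data = load(name)           # large temporary for this item
        results.append(work(data))  # keep only the (small) result
        del data                    # drop our reference to the temporary
        gc.collect()                # also sweep unreachable reference cycles
    return results
```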

More useful links:

memory usage, how to free memory

Python large variable RAM usage

I need to free up RAM by storing a Python dictionary on the hard drive, not in RAM. Is it possible?

Answer 2

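I was able to stop my CPU from hitting 100% and then freezing by using time.sleep(0.02). It takes a long time, but it does finish executing.

Note that you need to import time before using this.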
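A minimal sketch of where the sleep could go, assuming it is placed once per processed file. The function name process_with_pause and its parameters are made up for illustration; 0.02 is the value mentioned above.

```python
import time

def process_with_pause(items, work, pause=0.02):
    """Apply work to each item, sleeping briefly after each one."""
    results = []
    for item in items:
        results.append(work(item))
        time.sleep(pause)  # yield the CPU for a moment before the next item
    return results
```

With 10000 files, a pause of 0.02 seconds adds about 200 seconds of sleeping in total on top of the actual computation.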

System freezes while running a Python script
