Question

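In my Python code there is a for loop that reads files from a large list of filenames, reads information from those files, and writes that information into a numpy.ndarray. I realized that this for loop takes a long time to finish, and that I could save time by parallelizing it with multiprocessing.Pool().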

This is the for loop I want to parallelize. It looks like this (the actual code is different):

Matrix = [[0,0,0],[0,0,0],......[0,0,0]]
# a 2D numpy array of zeros; this is where we want to write the information

FileList = [file1,file2,file3....,fileN]
# a list containing the file names

for index, filename in enumerate(FileList):
    Data = ReadDataFromFile(filename)
    # read some information from the file into the variable Data

    Matrix[M][N] = Data
    # the value of Data is written to element (M, N) of the matrix

I am trying to parallelize it: instead of the system reading one file at a time, I want it to read as many files in parallel as possible.

I couldn't find a way to parallelize a for loop directly, so, following some examples on Stack Overflow, I rewrote the loop body as a function and then used multiprocessing.Pool.map().
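For reference, here is the part of the Pool.map contract that matters for this question, in a minimal sketch (square and main are illustrative names, not from the question): the function runs in separate worker processes, and what comes back to the caller is the list of its return values, in the same order as the input. The sketch avoids Python-3-only syntax, so it should also run on Python 2.7.

```python
import multiprocessing


def square(x):
    # Runs inside a worker process; only the return value is sent back.
    return x * x


def main():
    pool = multiprocessing.Pool(processes=2)
    try:
        # map() blocks until every item is processed and preserves input order.
        results = pool.map(square, [1, 2, 3, 4])
    finally:
        pool.close()
        pool.join()
    return results


# The guard keeps worker processes that re-import this module (e.g. on
# Windows) from starting pools of their own.
if __name__ == "__main__":
    print(main())  # [1, 4, 9, 16]
```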

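The function takes a filename as input, reads the information from the file as described above, and writes it into the already-defined numpy.ndarray. Inside the function I declare the array as global, so that modifications made inside the function are visible outside it:

def GetDataFromFile(filename):
    global Matrix  # refer to the module-level Matrix
    Data = ReadData(filename)
    Matrix[M][N] = Data  # write the information to the global Matrix

When called directly, the function works fine and writes the information from the file into the array.

But when I try to parallelize the process using multiprocessing.Pool.map(), it does not work as expected: the modifications made by the function GetDataFromFile do not change the values of the ndarray globally.

import multiprocessing

Matrix = [[0,0,0],[0,0,0],......[0,0,0]]
# a 2D numpy array of zeros; this is where we want to write the information

FileList = [file1,file2,file3....,fileN]
# a list containing the file names

def GetDataFromFile(filename):
    global Matrix  # refer to the module-level Matrix
    Data = ReadData(filename)
    Matrix[M][N] = Data  # write the information to the global Matrix

# create the pool after the function is defined so the workers can find it
p = multiprocessing.Pool()
p.map(GetDataFromFile, FileList)
print(Matrix)

The output of the above code is all zeros, i.e. when used with multiprocessing.Pool.map(), the function does not write any information into the ndarray.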
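The behaviour can be reproduced in a small self-contained sketch (plain lists instead of numpy; fill_row and main are hypothetical names). Each pool worker is a separate process with its own copy of the module's globals, so a write to Matrix inside a worker changes only that worker's copy, and declaring it global does not change this. Only the function's return value travels back to the parent.

```python
import multiprocessing

# Module-level matrix, like Matrix in the question (2 rows x 3 columns).
Matrix = [[0, 0, 0], [0, 0, 0]]


def fill_row(row_index):
    global Matrix  # as in the question; it only affects this process
    # This write lands in the worker process's private copy of Matrix.
    Matrix[row_index] = [row_index + 1] * 3
    return Matrix[row_index]


def main():
    pool = multiprocessing.Pool(processes=2)
    try:
        returned = pool.map(fill_row, [0, 1])
    finally:
        pool.close()
        pool.join()
    # Matrix here is the parent's own copy, which no worker ever touched.
    return returned, Matrix


if __name__ == "__main__":
    returned, parent_matrix = main()
    print(returned)       # [[1, 1, 1], [2, 2, 2]]: the workers did the work
    print(parent_matrix)  # [[0, 0, 0], [0, 0, 0]]: the parent is unchanged
```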

What is the problem here, and how can it be fixed? Is there another way to achieve the same result?
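A common fix, sketched under stated assumptions: let the workers only read and parse the files and return the data, and let the parent process write the returned values into the matrix. Here read_data_from_file is a hypothetical stand-in for ReadDataFromFile that fakes a row of three values from the file name instead of opening a file, and each file is assumed to fill one row of the matrix.

```python
import multiprocessing


def read_data_from_file(filename):
    # Hypothetical stand-in for ReadDataFromFile: a real version would open
    # and parse `filename`. Here it derives one row of 3 values from the name.
    n = len(filename)
    return [n, n * 2, n * 3]


def build_matrix(file_list, processes=None):
    pool = multiprocessing.Pool(processes=processes)
    try:
        # Workers only read and parse; each returns its row to the parent.
        rows = pool.map(read_data_from_file, file_list)
    finally:
        pool.close()
        pool.join()
    # The parent owns this matrix, so these writes are the ones that persist.
    matrix = [[0, 0, 0] for _ in file_list]
    for index, row in enumerate(rows):
        matrix[index] = row
    return matrix


if __name__ == "__main__":
    print(build_matrix(["a.txt", "bb.txt"]))  # [[5, 10, 15], [6, 12, 18]]
```

Because Pool.map returns results in the order of file_list, the index of each result identifies its row. If Matrix is a numpy.ndarray, the same loop works with Matrix[index] = row, or numpy.array(rows) builds the 2D array in one step.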

Thanks in advance. I am using Python 2.7 on Ubuntu 16.04 LTS.

Multiprocessing pool issue in Python

0 answers