List does not append values when using threads

Date: 2018-04-09 19:43:38

Tags: python multithreading list oop

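I am trying to write a class that searches the whole computer for files with a particular extension. To make this faster I use threads, so that all hard disks are searched at the same time.

I know it finds all the paths, because they show up when I call print(file_path).

But the values are not appended to self.allfiles, and I don't know why.

Here is the code: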

from concurrent import futures
import time
import win32api
import os


class SearchThreader():
    def __init__(self):
        self.allfiles = []
        self.harddisks = win32api.GetLogicalDriveStrings().split('\000')[:-1]
        #skip the folders that shouldn't have files with this extension
        self.exlude = {
            "$SysReset", "AMD", "inetpub", "NVIDIA", "PerfLogs",
            "Windows.old", "Windows", "ProgrammData",
            "Programm Files (x86)", "Programm Files",
            "Doc", "Fotos", "Lib", "lib", "libs",
            "Scripts", "Tools", "bin", "Config", "Logs", "log",
            "mods", "win"
            }

        self.fullThreadSearch()

    def SearchHarddisk(self, hd):
        for root, dirs, files in os.walk(hd, topdown=True):
            dirs[:] = [d for d in dirs if d not in self.exlude]
            for f_name in files:
                file_path = os.path.join(root, f_name)
                if file_path.endswith(".mp3"):
                    self.allfiles.append(file_path)
                    print(file_path)

    def fullThreadSearch(self):
        with futures.ProcessPoolExecutor(max_workers=len(self.harddisks)) as thr:
            for harddisk in self.harddisks:
                thr.submit(self.SearchHarddisk, harddisk)

if __name__ == "__main__":
    starttime = time.time()
    ST = SearchThreader()
    print(ST.allfiles)
    print(time.time() - starttime)

2 Answers:

Answer 0 (score: 1)

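As mentioned in @Trap's answer, you need to return the results from the SearchHarddisk() method instead of trying to append them to self.allfiles there, and then add them to self.allfiles in fullThreadSearch(). This is because each call to SearchHarddisk() runs in a separate process with its own address space, so each of them has a different self.allfiles list object.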
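
The effect can be reproduced without starting any processes. ProcessPoolExecutor hands work to its workers by pickling the callable and its arguments, and pickling a bound method pickles the object it belongs to. The sketch below only simulates that hand-off with a pickle round trip; Collector, record and run_in_simulated_worker are hypothetical names used for illustration, not part of the question's code.

```python
import pickle


class Collector:
    """Hypothetical stand-in for SearchThreader, reduced to the list."""

    def __init__(self):
        self.allfiles = []

    def record(self, path):
        self.allfiles.append(path)


def run_in_simulated_worker(bound_method, arg):
    """Mimic the hand-off to a worker: pickle the callable, then unpickle it."""
    worker_side_method = pickle.loads(pickle.dumps(bound_method))
    worker_side_method(arg)
    # __self__ is the object the unpickled bound method is attached to.
    return worker_side_method.__self__


parent = Collector()
worker_copy = run_in_simulated_worker(parent.record, "C:\\music\\song.mp3")

print(parent.allfiles)       # the parent's list is untouched
print(worker_copy.allfiles)  # only the worker-side copy received the path
```

The append succeeds, but on a copy that the parent never sees, which matches the symptom in the question: the paths print, yet self.allfiles stays empty.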

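Something along the lines of the following changes seems to work on my Windows machine. Note that I based it on the sample code shown in the ProcessPoolExecutor Example section of the documentation, which uses the ProcessPoolExecutor.map() method instead of calling ProcessPoolExecutor.submit() repeatedly.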

import concurrent.futures as futures
import os
import time
import win32api

class SearchThreader():
    def __init__(self):
        self.allfiles = []
        self.harddisks = win32api.GetLogicalDriveStrings().split('\000')[:-1]
        #skip the folders that shouldn't have files with this extension
        self.exlude = {
            "$SysReset", "AMD", "inetpub", "NVIDIA", "PerfLogs",
            "Windows.old", "Windows", "ProgrammData",
            "Programm Files (x86)", "Programm Files",
            "Doc", "Fotos", "Lib", "lib", "libs",
            "Scripts", "Tools", "bin", "Config", "Logs", "log",
            "mods", "win"
            }

        self.fullThreadSearch()

    def SearchHarddisk(self, hd):
        allfiles = []  # Local variable.
        for root, dirs, files in os.walk(hd, topdown=True):
            dirs[:] = [d for d in dirs if d not in self.exlude]
            for f_name in files:
                file_path = os.path.join(root, f_name)
                if file_path.endswith(".mp3"):
                    allfiles.append(file_path)  # Append to local list.
                    print(file_path)
        return allfiles  # Return all found on this harddisk.

    def fullThreadSearch(self):
        with futures.ProcessPoolExecutor() as executor:
            for harddisk, matching_files in zip(
                    self.harddisks, executor.map(self.SearchHarddisk, self.harddisks)):
                print('harddisk: {}, matching_files: {}'.format(harddisk, matching_files))
                self.allfiles.extend(matching_files)

if __name__ == "__main__":
    starttime = time.time()
    ST = SearchThreader()
    print(ST.allfiles)
    print(time.time() - starttime)
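
For comparison, the same collection step can also be written with submit(), as in the question, provided the returned futures are kept and their results merged in the parent. This is a minimal sketch rather than a drop-in replacement: find_files, search_roots and the executor_cls parameter are hypothetical names, and the folder exclusion from the code above is left out for brevity.

```python
import os
from concurrent import futures


def find_files(root, extension=".mp3"):
    """Return every path under root whose file name ends with extension."""
    matches = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(extension):
                matches.append(os.path.join(dirpath, name))
    return matches


def search_roots(roots, executor_cls=futures.ProcessPoolExecutor):
    """Submit one search per root and merge the returned lists in the caller."""
    found = []
    with executor_cls() as executor:
        pending = [executor.submit(find_files, root) for root in roots]
        # as_completed() yields each future as soon as its search finishes.
        for future in futures.as_completed(pending):
            found.extend(future.result())  # result() re-raises worker errors
    return found
```

With the default ProcessPoolExecutor, the call to search_roots() should sit under the if __name__ == "__main__": guard, as in the code above, because on Windows each worker process re-imports the main module. Results arrive in completion order here, whereas map() returns them in the order of its inputs.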

Answer 1 (score: 0)

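I have never used the ProcessPoolExecutor class, but I think your problem is that self.allfiles is not shared among the processes that get created. Your SearchHarddisk method should return a value, and once each process has finished you have to collect the results and append them to self.allfiles. That is what I would do, but since I am not running Windows I cannot test it, so I am not sure it will work.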
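
Since the question speaks of threads, it may help to contrast the two cases: threads run in the same process as the caller, so a list on self really is shared between them. The following is only a sketch under that assumption. ThreadedSearch, search_root and run are hypothetical names, and the lock is a cautious choice rather than something the original code contains.

```python
import os
import threading
from concurrent import futures


class ThreadedSearch:
    """Hypothetical thread-based variant; all names here are illustrative."""

    def __init__(self, roots, extension=".mp3"):
        self.roots = list(roots)
        self.extension = extension
        self.allfiles = []
        self._lock = threading.Lock()

    def search_root(self, root):
        """Walk one root and append matches to the list shared by all threads."""
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                if name.endswith(self.extension):
                    with self._lock:
                        self.allfiles.append(os.path.join(dirpath, name))

    def run(self):
        workers = max(1, len(self.roots))
        with futures.ThreadPoolExecutor(max_workers=workers) as pool:
            # list() drains the iterator so worker exceptions surface here.
            list(pool.map(self.search_root, self.roots))
        return self.allfiles
```

Whether threads or processes end up faster for a disk search depends on the machine, so it is worth timing both before settling on one.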
