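I am trying to write a class that searches the whole computer for files with a specific extension. To speed this up I use threads, so that all hard disks are searched at the same time.
I know the search itself works, because print(file_path) shows the matching files, but the values are not appended to self.allfiles and I don't know why.
Here is the code: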
from concurrent import futures
import time
import win32api
import os
class SearchThreader():
    def __init__(self):
        self.allfiles = []
        self.harddisks = win32api.GetLogicalDriveStrings().split('\000')[:-1]
        # Skip the folders that shouldn't have files with this extension.
        self.exlude = {
            "$SysReset", "AMD", "inetpub", "NVIDIA", "PerfLogs",
            "Windows.old", "Windows", "ProgrammData",
            "Programm Files (x86)", "Programm Files",
            "Doc", "Fotos", "Lib", "lib", "libs",
            "Scripts", "Tools", "bin", "Config", "Logs", "log",
            "mods", "win"
        }
        self.fullThreadSearch()

    def SearchHarddisk(self, hd):
        for root, dirs, files in os.walk(hd, topdown=True):
            dirs[:] = [d for d in dirs if d not in self.exlude]
            for f_name in files:
                file_path = os.path.join(root, f_name)
                if file_path.endswith(".mp3"):
                    self.allfiles.append(file_path)
                    print(file_path)

    def fullThreadSearch(self):
        with futures.ProcessPoolExecutor(max_workers=len(self.harddisks)) as thr:
            for harddisk in self.harddisks:
                thr.submit(self.SearchHarddisk, harddisk)

if __name__ == "__main__":
    starttime = time.time()
    ST = SearchThreader()
    print(ST.allfiles)
    print(time.time() - starttime)
Answer 0 (score: 1)
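As mentioned in @Trap's answer, you need to return the results from the SearchHarddisk() method and add them to self.allfiles in fullThreadSearch(), rather than trying to append to self.allfiles inside the worker. This is because each call to SearchHarddisk() runs in a separate process with its own address space, so each of them has a different self.allfiles list object.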
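A minimal sketch of why the appended values vanish, assuming the worker only ever sees a pickled copy of the object (ProcessPoolExecutor requires callables and arguments to be picklable for this reason). The pickle round trip below merely simulates the process boundary inside a single process; `simulated_worker_call` and the dict layout are hypothetical names used for illustration only.

```python
import pickle


def simulated_worker_call(state, found_path):
    """Mimic a worker process, which operates on a pickled copy of `state`."""
    # Assumption: this round trip stands in for sending `state` to a worker.
    worker_state = pickle.loads(pickle.dumps(state))
    # The append changes only the worker's copy, not the caller's object.
    worker_state["allfiles"].append(found_path)
    # Returning the data is the only way it gets back to the caller.
    return worker_state["allfiles"]


parent_state = {"allfiles": []}
returned = simulated_worker_call(parent_state, "C:\\music\\song.mp3")

print(parent_state["allfiles"])  # prints [] because the parent's list is untouched
print(returned)                  # the returned list contains the appended path
```

The same thing happens to `self` when a bound method is submitted to a process pool: the worker appends to its own copy of the list, so only a return value reaches the parent.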
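The code below, with those changes, seems to work on my Windows machine. Note that I based it on the example shown in the ProcessPoolExecutor Example section of the documentation, which uses the ProcessPoolExecutor.map() method instead of calling ProcessPoolExecutor.submit() repeatedly.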
import concurrent.futures as futures
import os
import time
import win32api
class SearchThreader():
    def __init__(self):
        self.allfiles = []
        self.harddisks = win32api.GetLogicalDriveStrings().split('\000')[:-1]
        # Skip the folders that shouldn't have files with this extension.
        self.exlude = {
            "$SysReset", "AMD", "inetpub", "NVIDIA", "PerfLogs",
            "Windows.old", "Windows", "ProgrammData",
            "Programm Files (x86)", "Programm Files",
            "Doc", "Fotos", "Lib", "lib", "libs",
            "Scripts", "Tools", "bin", "Config", "Logs", "log",
            "mods", "win"
        }
        self.fullThreadSearch()

    def SearchHarddisk(self, hd):
        allfiles = []  # Local variable.
        for root, dirs, files in os.walk(hd, topdown=True):
            dirs[:] = [d for d in dirs if d not in self.exlude]
            for f_name in files:
                file_path = os.path.join(root, f_name)
                if file_path.endswith(".mp3"):
                    allfiles.append(file_path)  # Append to local list.
                    print(file_path)
        return allfiles  # Return all found on this harddisk.

    def fullThreadSearch(self):
        with futures.ProcessPoolExecutor() as executor:
            for harddisk, matching_files in zip(
                    self.harddisks, executor.map(self.SearchHarddisk, self.harddisks)):
                print('harddisk: {}, matching_files: {}'.format(harddisk, matching_files))
                self.allfiles.extend(matching_files)

if __name__ == "__main__":
    starttime = time.time()
    ST = SearchThreader()
    print(ST.allfiles)
    print(time.time() - starttime)
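Since the question mentions threads, here is a hedged sketch of a thread-based variant, in which appending to a shared list does work because all threads live in one process. This is not the answer's code: `ThreadedSearcher`, `search_root` and `run` are hypothetical names, the roots are passed in rather than taken from win32api, and whether threads are actually faster than processes for this workload is an open assumption that was not measured here.

```python
import os
from concurrent.futures import ThreadPoolExecutor


class ThreadedSearcher:
    """Hypothetical thread-based variant that keeps one shared result list."""

    def __init__(self, roots, extension=".mp3", exclude=()):
        self.roots = list(roots)
        self.extension = extension
        self.exclude = set(exclude)
        self.allfiles = []  # Shared by every thread in this process.

    def search_root(self, root):
        for current, dirs, files in os.walk(root, topdown=True):
            # Prune excluded directory names in place so os.walk skips them.
            dirs[:] = [d for d in dirs if d not in self.exclude]
            for name in files:
                if name.endswith(self.extension):
                    # Threads share memory, so this reaches the one real list.
                    self.allfiles.append(os.path.join(current, name))

    def run(self):
        if not self.roots:
            return self.allfiles
        with ThreadPoolExecutor(max_workers=len(self.roots)) as pool:
            # list() drains the iterator so worker exceptions surface here.
            list(pool.map(self.search_root, self.roots))
        return self.allfiles
```

On Windows the roots could be the drive strings from the code above, for example `ThreadedSearcher(harddisks, exclude=exlude).run()`.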
Answer 1 (score: 0)
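I have never used the ProcessPoolExecutor class, but I think your error occurs because self.allfiles is not shared between the created processes. Your SearchHarddisk method should return a value, and once the processes have finished you have to collect each result and append it to self.allfiles. That is what I would do, but since I am not running Windows I cannot test it, so I am not sure it will work.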
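The "return a value, then collect each result" idea can be sketched roughly as follows, using submit() together with as_completed(). This is an untested-on-Windows illustration rather than the poster's code: `find_matches`, `collect_matches` and the `executor_cls` parameter are hypothetical names, and the roots are supplied by the caller instead of being read from win32api.

```python
import concurrent.futures as futures
import os
import sys


def find_matches(root, extension=".mp3", exclude=frozenset()):
    """Walk one root and return the matching paths (no shared state)."""
    matches = []
    for current, dirs, files in os.walk(root, topdown=True):
        # Prune excluded directory names in place so os.walk skips them.
        dirs[:] = [d for d in dirs if d not in exclude]
        for name in files:
            if name.endswith(extension):
                matches.append(os.path.join(current, name))
    return matches


def collect_matches(roots, extension=".mp3", exclude=frozenset(),
                    executor_cls=futures.ProcessPoolExecutor):
    """Submit one task per root and gather the returned lists."""
    all_matches = []
    if not roots:
        return all_matches
    with executor_cls(max_workers=len(roots)) as executor:
        pending = [executor.submit(find_matches, root, extension, exclude)
                   for root in roots]
        for future in futures.as_completed(pending):
            # result() re-raises any exception raised inside the worker.
            all_matches.extend(future.result())
    return all_matches


if __name__ == "__main__":
    # Roots to search are taken from the command line, e.g. C:\ D:\
    print(collect_matches(sys.argv[1:]))
```

Because results arrive in completion order, the combined list is not grouped by disk; sort it afterwards if the order matters.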