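I'm writing a script that processes every directory and subdirectory under a starting folder, but I'm running into a memory error (MemoryError). My guess is that my data_dicts list is getting too large, but I'm not sure. Any suggestions would be appreciated.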
import os

# example data dictionary
data_dict = {
    'filename': 'data.csv',
    'folder': 'R:/',
    'size': 300000
}

def get_file_sizes_folder(data_dicts, starting_folder):
    # Given a list of file information dictionaries and a folder, iterate over the files
    # in the folder to get their information and append it to the list.
    # Also recurse through subdirectories
    for entry in os.scandir(starting_folder):
        if not entry.name.startswith('.'):
            if entry.is_file():
                size = entry.stat().st_size
                filename = entry.name
                folder = os.path.dirname(entry.path)
                temp_dict = {'filename': filename, 'size': size, 'folder': folder}
                data_dicts.append(temp_dict.copy())
            else:
                print(entry.path)
                data_dicts.extend(get_file_sizes_folder(data_dicts, entry.path))
    return data_dicts

d = get_file_sizes_folder([], 'R:/')
Answer 0 (score: 3)
You shouldn't pass data_dicts as an argument to your function get_file_sizes_folder(). Doing so produces many, many duplicate entries: the recursive call appends to the list you passed in and then returns that same list, so data_dicts.extend(...) appends the list to itself and doubles it for every subdirectory visited. No wonder your computer ran out of memory so quickly!
Instead, take only starting_folder as a parameter and create a new list on the first line of the function, like so:
def get_file_sizes_folder(starting_folder):
    # Given a folder, iterate over the files in it, collect their information
    # in a new list, and recurse through subdirectories
    data_dicts = []
    for entry in os.scandir(starting_folder):
        if not entry.name.startswith('.'):
            if entry.is_file():
                size = entry.stat().st_size
                filename = entry.name
                folder = os.path.dirname(entry.path)
                temp_dict = {'filename': filename, 'size': size, 'folder': folder}
                data_dicts.append(temp_dict)
            else:
                print(entry.path)
                data_dicts.extend(get_file_sizes_folder(entry.path))
    return data_dicts
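To see how quickly the duplicates pile up, here is a small sketch that runs a stripped-down copy of both versions against a throwaway directory tree. The names buggy, fixed, and make_chain are made up for this illustration, and the tree (ten nested folders with a single file at the bottom) is an assumption chosen so the result doesn't depend on the order in which os.scandir returns entries.

```python
import os
import tempfile


def buggy(data_dicts, starting_folder):
    # Same shape as the question's function: the list is passed down,
    # mutated in place, returned, and then extended with itself.
    for entry in os.scandir(starting_folder):
        if entry.is_file():
            data_dicts.append({'filename': entry.name})
        else:
            data_dicts.extend(buggy(data_dicts, entry.path))
    return data_dicts


def fixed(starting_folder):
    # Each call builds its own list, so nothing is counted twice.
    data_dicts = []
    for entry in os.scandir(starting_folder):
        if entry.is_file():
            data_dicts.append({'filename': entry.name})
        else:
            data_dicts.extend(fixed(entry.path))
    return data_dicts


def make_chain(root, depth):
    # Hypothetical helper: creates root/d0/d1/.../d{depth-1} and puts
    # one file in the deepest folder only.
    path = root
    for i in range(depth):
        path = os.path.join(path, f'd{i}')
        os.mkdir(path)
    with open(os.path.join(path, 'f.txt'), 'w') as fh:
        fh.write('x')


with tempfile.TemporaryDirectory() as root:
    make_chain(root, 10)
    n_fixed = len(fixed(root))
    n_buggy = len(buggy([], root))

print(n_fixed, n_buggy)  # prints "1 1024": one real file, doubled ten times
```

One file under ten levels of folders already turns into 1024 entries, so a real drive with thousands of folders has no chance of fitting in memory.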
Answer 1 (score: 1)
You shouldn't be writing the recursion yourself at all. Use os.walk instead.
Example:
import os

def get_file_sizes_folder(starting_folder):
    data_dicts = list()
    # os.walk visits starting_folder and every subdirectory below it
    for root, _, files in os.walk(starting_folder):
        data_dicts.extend({
            'filename': f,
            'size': os.path.getsize(os.path.join(root, f)),
            'folder': root,
        } for f in files)
    return data_dicts

d = get_file_sizes_folder('R:/')
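Note that the os.walk version above no longer skips names starting with a dot, which the question's code did. If that matters, or if even the de-duplicated list is too big to hold at once, a generator is one option. The following is only a sketch: iter_file_sizes and total_size are names invented here, and silently skipping files that can't be stat'ed is an assumption, not something the question asked for.

```python
import os


def iter_file_sizes(starting_folder):
    # Yield one record at a time instead of building the whole list up front.
    for root, dirs, files in os.walk(starting_folder):
        # With the default top-down walk, editing dirs in place stops
        # os.walk from descending into the directories that were removed.
        dirs[:] = [d for d in dirs if not d.startswith('.')]
        for name in files:
            if name.startswith('.'):
                continue
            path = os.path.join(root, name)
            try:
                size = os.path.getsize(path)
            except OSError:
                # Assumption: files that vanish or can't be read are skipped.
                continue
            yield {'filename': name, 'size': size, 'folder': root}


def total_size(starting_folder):
    # Example consumer: sums sizes without ever materialising the records.
    return sum(rec['size'] for rec in iter_file_sizes(starting_folder))
```

If you do need everything in memory, list(iter_file_sizes('R:/')) gives you the same kind of list as before; otherwise loop over the generator and write each record out (to a CSV file, say) as you go.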