I'm using the following function to get the sizes of all files on the system, starting from a target directory.
import datetime
import os

def get_files(target):
    # Get file size and modified time for all files from the target directory and down.
    # Initialize files list
    filelist = []
    # Walk the directory structure
    for root, dirs, files in os.walk(target):
        # Do not walk into directories that are mount points
        dirs[:] = filter(lambda dir: not os.path.ismount(os.path.join(root, dir)), dirs)
        for name in files:
            # Construct absolute path for files
            filename = os.path.join(root, name)
            # Test the path to account for broken symlinks
            if os.path.exists(filename):
                # File size information in bytes
                size = float(os.path.getsize(filename))
                # Get the modified time of the file
                mtime = os.path.getmtime(filename)
                # Create a tuple of filename, size, and modified time
                construct = filename, size, str(datetime.datetime.fromtimestamp(mtime))
                # Add the tuple to the master filelist
                filelist.append(construct)
    return filelist
How can I modify this to also return a second list containing the directories and the total size of each directory? I'm trying to do this within a single function, hoping it will be more efficient than doing a second walk in a separate function just to collect the directory information and sizes.
The idea is to be able to report on a sorted list of the top 20 largest files and a second sorted list of the top 10 largest directories.
Thanks for any suggestions.
Answer 0 (score: 1)
I output the directories in a dictionary rather than a list, but see if you like it:
def get_files(target):
    # Get file size and modified time for all files from the target directory and down.
    # Initialize the files list and the per-directory size dictionary
    filelist = []
    dirdict = {}
    # Walk the directory structure
    for root, dirs, files in os.walk(target):
        # Do not walk into directories that are mount points
        dirs[:] = filter(lambda dir: not os.path.ismount(os.path.join(root, dir)), dirs)
        for name in files:
            # Construct absolute path for files
            filename = os.path.join(root, name)
            # Test the path to account for broken symlinks
            if os.path.exists(filename):
                # File size information in bytes
                size = float(os.path.getsize(filename))
                # Get the modified time of the file
                mtime = os.path.getmtime(filename)
                # Create a tuple of filename, size, and modified time
                construct = filename, size, str(datetime.datetime.fromtimestamp(mtime))
                # Add the tuple to the master filelist
                filelist.append(construct)
                # Accumulate the file's size against its containing directory
                if root in dirdict:
                    dirdict[root] += size
                else:
                    dirdict[root] = size
    return filelist, dirdict
If you would rather have dirdict as a list of tuples, use:
dirdict.items()
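To produce the report described in the question (the 20 largest files and the 10 largest directories), you can sort the two return values by size. A minimal sketch, assuming get_files returns (filelist, dirdict) as above; the target path here is just a placeholder:

files, dirsizes = get_files('/srv/data')  # hypothetical target, substitute your own

# filelist entries are (filename, size, mtime) tuples, so sort on the size field
top_files = sorted(files, key=lambda item: item[1], reverse=True)[:20]

# dirdict maps directory -> total size of the files directly inside it
top_dirs = sorted(dirsizes.items(), key=lambda item: item[1], reverse=True)[:10]

for name, size, mtime in top_files:
    print(name, size, mtime)
for dirname, size in top_dirs:
    print(dirname, size)

Note that with this answer's code each directory's size only counts the files directly inside it; files in subdirectories are credited to those subdirectories rather than rolled up to the parent.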
Answer 1 (score: 0)
I have more scripts around this kind of thing; I've just uploaded 'bigfiles.py' to GitHub: http://github.com/sente/sys-utils/blob/master/bigfiles.py
It doesn't calculate cumulative directory totals, but it could be modified to do so without much effort.
I have other code that totals directory sizes at a given depth, for example:
In [7]: t = build_tree_from_directory('/scratch/stu/')

In [8]: pprint.pprint(walk_tree(t,depth=0))
{'name': 'ROOT', 'size': 6539880514}

In [9]: pprint.pprint(walk_tree(t,depth=0))
{'name': 'ROOT', 'size': 6539880514}

In [10]: pprint.pprint(walk_tree(t,depth=1))
{'children': [{'name': 'apache2-gzip', 'size': 112112512},
              {'name': 'gitnotes', 'size': 897104422},
              {'name': 'finder', 'size': 3810736368},
              {'name': 'apache2', 'size': 1719919406}],
 'name': 'ROOT'}

In [12]: pprint.pprint(walk_tree(t,depth=2))
{'children': [{'children': [{'name': 'vhost', 'size': 103489662}],
               'name': 'apache2-gzip'},
              {'children': [{'name': '2', 'size': 533145458},
                            {'name': 'notes.git', 'size': 363958964}],
               'name': 'gitnotes'},
              {'children': [{'name': 'gzipped', 'size': 3810736368},
                            {'name': 'output.txt', 'size': 0}],
               'name': 'finder'},
              {'children': [{'name': 'sente_combined.log', 'size': 0},
                            {'name': 'lisp_ssl.log', 'size': 0},
                            {'name': 'vhost', 'size': 1378778576},
                            {'name': 'other_vhosts_access.log', 'size': 0},
                            {'name': 'ssl_error.log', 'size': 0},
                            {'name': 'ssl_access.log', 'size': 0},
                            {'name': 'sente_test.log', 'size': 0}],
               'name': 'apache2'}],
 'name': 'ROOT'}
The FS is only crawled once, but to get the full sizes the tree needs to be walked after it has been built; if you start at the leaf nodes and work your way up toward the root, you can calculate the total size of each directory most efficiently.
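For completeness, here is a rough sketch of that bottom-up idea using plain os.walk rather than the tree built by bigfiles.py. Walking with topdown=False visits leaf directories before their parents, so each directory's cumulative total (its own files plus everything beneath it) is already available when the parent is processed. The function name and the lack of symlink/mount-point handling are my own simplifications, not part of the linked script:

import os

def cumulative_dir_sizes(target):
    # directory -> total size of all files in it and in every subdirectory
    # (illustrative sketch; no special handling of symlinks or mount points)
    totals = {}
    # topdown=False yields leaf directories before their parents
    for root, dirs, files in os.walk(target, topdown=False):
        # Size of the files directly in this directory (skip broken symlinks)
        size = sum(os.path.getsize(os.path.join(root, name))
                   for name in files
                   if os.path.exists(os.path.join(root, name)))
        # Add the already-computed totals of the immediate subdirectories
        size += sum(totals.get(os.path.join(root, d), 0) for d in dirs)
        totals[root] = size
    return totals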