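I'm trying to write a simple Python program that walks a set of subdirectories, finds the Java files, and prints how many times certain keywords are used. I have it working for the most part. The problem I'm running into is printing aggregate information for the higher-level directories. For example, my current output is: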
testcases/part1/testcase2/root_dir:
0 bytes 0 public 0 private 0 try 0 catch
testcases/part1/testcase2/root_dir/folder1:
12586 bytes 19 public 7 private 8 try 22 catch
testcases/part1/testcase2/root_dir/folder1/folder5:
7609 bytes 9 public 2 private 7 try 11 catch
testcases/part1/testcase2/root_dir/folder4:
0 bytes 0 public 0 private 0 try 0 catch
testcases/part1/testcase2/root_dir/folder4/folder2:
7211 bytes 9 public 2 private 4 try 9 catch
testcases/part1/testcase2/root_dir/folder4/folder3:
0 bytes 0 public 0 private 0 try 0 catch
I would like the output to be:
testcases/part1/testcase2/root_dir :
27406 bytes 37 public 11 private 19 try 42 catch
testcases/part1/testcase2/root_dir/folder1 :
20195 bytes 28 public 9 private 15 try 33 catch
testcases/part1/testcase2/root_dir/folder1/folder5 :
7609 bytes 9 public 2 private 7 try 11 catch
testcases/part1/testcase2/root_dir/folder4 :
7211 bytes 9 public 2 private 4 try 9 catch
testcases/part1/testcase2/root_dir/folder4/folder2 :
7211 bytes 9 public 2 private 4 try 9 catch
testcases/part1/testcase2/root_dir/folder4/folder3 :
0 bytes 0 public 0 private 0 try 0 catch
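As you can see, the lower subdirectories feed their totals directly into the higher ones. That is the part I'm stuck on: how can I implement this efficiently? I've considered storing each printed line as a string in a list and printing everything at the end, but I don't think that would work with multiple levels of subdirectories like the example above. Here is my code so far: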
import os

def lsJava(path):
    print()
    for dirname, dirnames, filenames in os.walk(path):
        size = 0
        public = 0
        private = 0
        tryCount = 0
        catch = 0
        # Get stats by current directory.
        tempStats = os.stat(dirname)
        # Print current directory information
        print(dirname + ":")
        # Print files of directory.
        for filename in filenames:
            if(filename.endswith(".java")):
                fileTempStats = os.stat(dirname + "/" + filename)
                size += fileTempStats[6]
                tempFile = open(dirname + "/" + filename)
                tempString = tempFile.read()
                tempString = removeComments(tempString)
                public += tempString.count("public", 0, len(tempString))
                private += tempString.count("private", 0, len(tempString))
                tryCount += tempString.count("try", 0, len(tempString))
                catch += tempString.count("catch", 0, len(tempString))
        print(" ", size, " bytes ", public, " public ",
              private, " private ", tryCount, " try ", catch,
              " catch")
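The removeComments function just uses a regular expression pattern to strip all comments from the Java file. Thanks in advance for any help.
Edit:
I added the following code at the start of the for loop: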
current_dirpath = dirname
if( dirname != current_dirpath):
    size = 0
    public = 0
    private = 0
    tryCount = 0
    catch = 0
The output is now:
testcases/part1/testcase2/root_dir/folder1/folder5:
7609 bytes 9 public 2 private 7 try 11 catch
testcases/part1/testcase2/root_dir/folder1:
20195 bytes 28 public 9 private 15 try 33 catch
testcases/part1/testcase2/root_dir/folder4/folder2:
27406 bytes 37 public 11 private 19 try 42 catch
testcases/part1/testcase2/root_dir/folder4/folder3:
27406 bytes 37 public 11 private 19 try 42 catch
testcases/part1/testcase2/root_dir/folder4:
27406 bytes 37 public 11 private 19 try 42 catch
testcases/part1/testcase2/root_dir:
27406 bytes 37 public 11 private 19 try 42 catch
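Answer 0 (score: 2)
os.walk() takes an optional topdown parameter. If you use os.walk(path, topdown=False), it will walk the directories from the bottom up.
When you first start the loop, save the first element of the tuple (dirpath) in a variable such as current_dirpath. As you continue looping, you can keep a running total of the file sizes in that directory. Then just add a check such as if dirpath != current_dirpath; at that point you know you have moved up a directory level and can reset the totals.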
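To make the effect of topdown=False concrete, here is a minimal sketch that records the order in which os.walk() yields directories. The helper name walk_order and the folder names in the demo are made up for illustration. The sketch relies only on the ordering property discussed above: with topdown=False, a directory is yielded after all of its subdirectories.

```python
import os
import tempfile

def walk_order(root, topdown):
    """Return directory paths (relative to root) in the order os.walk yields them."""
    return [os.path.relpath(dirpath, root)
            for dirpath, dirnames, filenames in os.walk(root, topdown=topdown)]

if __name__ == "__main__":
    # Build a throwaway tree: root/folder1/folder5 and root/folder4.
    with tempfile.TemporaryDirectory() as root:
        os.makedirs(os.path.join(root, "folder1", "folder5"))
        os.makedirs(os.path.join(root, "folder4"))
        # Bottom-up: children appear before their parents, the root comes last.
        print(walk_order(root, topdown=False))
        # Top-down (the default): the root comes first.
        print(walk_order(root, topdown=True))
```

The relative order of sibling directories is not something to rely on here, only the parent/child ordering.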
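Answer 1 (score: 1)
I don't believe you can do this with a single counter, even bottom-up: if directory A has subdirectories B and C, then when you finish B you need to zero the counter before descending into C; but when it's time to do A, you need to add the sizes of B and C (and B's count is long gone).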
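Instead of maintaining a single counter, build a dictionary that maps each directory (the key) to its associated counts (in a tuple or whatever). As you iterate (bottom-up), whenever you are ready to print the output for a directory, you can look up all of its subdirectories (from the dirnames list yielded by os.walk()) and add their counts together.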
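A rough sketch of that dictionary approach might look like the following. The names count_file and ls_java are hypothetical, and the comment-stripping step from the question (removeComments) is left out to keep the example self-contained, so the keyword counts here include comments. As in the question, str.count counts substrings, so "try" also matches inside longer identifiers.

```python
import os

def count_file(path):
    """Hypothetical per-file counter: (bytes, public, private, try, catch)."""
    with open(path, encoding="utf-8", errors="replace") as f:
        text = f.read()
    return (os.path.getsize(path), text.count("public"),
            text.count("private"), text.count("try"), text.count("catch"))

def ls_java(root):
    """Map every directory under root to its counts, subdirectories included."""
    totals = {}
    # Bottom-up, so each subdirectory is already in totals when its parent is visited.
    for dirpath, dirnames, filenames in os.walk(root, topdown=False):
        counts = [0, 0, 0, 0, 0]
        for name in filenames:
            if name.endswith(".java"):
                for i, value in enumerate(count_file(os.path.join(dirpath, name))):
                    counts[i] += value
        for sub in dirnames:
            # Default to zeros for entries os.walk did not descend into
            # (for example symlinked directories).
            child = totals.get(os.path.join(dirpath, sub), (0, 0, 0, 0, 0))
            for i, value in enumerate(child):
                counts[i] += value
        totals[dirpath] = tuple(counts)
    return totals

def print_report(totals):
    # Sorting the paths puts each parent before its children.
    for dirpath in sorted(totals):
        size, public, private, try_count, catch = totals[dirpath]
        print(dirpath + " :")
        print(" ", size, "bytes", public, "public", private, "private",
              try_count, "try", catch, "catch")
```

Calling print_report(ls_java(path)) prints parents before children even though the walk itself runs bottom-up, because printing is deferred until every total is known.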
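Since you are not discarding data, you can extend this approach to keep separate deep and shallow counts, so that at the end of the scan you can sort the directories by shallow count, report the 10 largest counts, and so on.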
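As an illustration of keeping shallow and deep counts side by side, here is a small sketch that tracks only the byte size of .java files to stay short. The function names java_sizes and largest_shallow are assumptions made for this example, not part of any library.

```python
import os

def java_sizes(root):
    """Map each directory to (shallow_bytes, deep_bytes) for its .java files.

    shallow: files directly inside the directory.
    deep: shallow plus the deep totals of all subdirectories.
    """
    sizes = {}
    for dirpath, dirnames, filenames in os.walk(root, topdown=False):
        shallow = sum(os.path.getsize(os.path.join(dirpath, name))
                      for name in filenames if name.endswith(".java"))
        # Subdirectories were visited earlier because the walk is bottom-up.
        deep = shallow + sum(sizes.get(os.path.join(dirpath, sub), (0, 0))[1]
                             for sub in dirnames)
        sizes[dirpath] = (shallow, deep)
    return sizes

def largest_shallow(sizes, n=10):
    """Return the n directories with the largest shallow size, biggest first."""
    return sorted(sizes.items(), key=lambda item: item[1][0], reverse=True)[:n]
```

For example, largest_shallow(java_sizes(path)) would list up to ten directories ranked by the Java source they contain directly, while the deep value for each is still available in the same tuple.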