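Python, os.walk(): passing information back up?

Date: 2012-12-07 16:54:11

Tags: python recursion os.walk

I am currently trying to write a simple Python program that loops through a bunch of subdirectories, finds Java files, and prints how many times certain keywords are used. I have managed to get this working for the most part. The problem I am having is printing overall information for the higher-level directories. For example, my current output is as follows: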

testcases/part1/testcase2/root_dir:
    0   bytes     0   public     0   private     0   try     0   catch
testcases/part1/testcase2/root_dir/folder1:
    12586   bytes     19   public     7   private     8   try     22   catch
testcases/part1/testcase2/root_dir/folder1/folder5:
    7609   bytes     9   public     2   private     7   try     11   catch
testcases/part1/testcase2/root_dir/folder4:
    0   bytes     0   public     0   private     0   try     0   catch
testcases/part1/testcase2/root_dir/folder4/folder2:
    7211   bytes     9   public     2   private     4   try     9   catch
testcases/part1/testcase2/root_dir/folder4/folder3:
    0   bytes     0   public     0   private     0   try     0   catch

I would like the output to be:

testcases/part1/testcase2/root_dir :
    27406  bytes    37  public    11  private    19  try    42  catch
testcases/part1/testcase2/root_dir/folder1 :
    20195  bytes    28  public     9  private    15  try     33  catch
testcases/part1/testcase2/root_dir/folder1/folder5 :
    7609  bytes     9  public     2  private     7  try      11  catch
testcases/part1/testcase2/root_dir/folder4 :
    7211  bytes     9  public     2  private     4  try     9  catch
testcases/part1/testcase2/root_dir/folder4/folder2 :
    7211  bytes     9  public     2  private     4  try     9  catch
testcases/part1/testcase2/root_dir/folder4/folder3 :
    0  bytes        0  public     0  private     0  try     0  catch

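As you can see, the lower subdirectories feed their totals into the directories above them. This is the part I am having trouble with: how can I implement this efficiently? I have considered storing each printout as a string in a list and then printing everything at the end, but I don't think that would work for multiple levels of subdirectories, as in the example provided. Here is my code so far: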

import os

def lsJava(path):
    print()

    for dirname, dirnames, filenames in os.walk(path):
        # The counters are reset for every directory, so each printed line
        # covers only the files directly inside that directory.
        size = 0
        public = 0
        private = 0
        tryCount = 0
        catch = 0

        # Print current directory information.
        print(dirname + ":")

        # Accumulate the stats of the .java files in this directory.
        for filename in filenames:
            if filename.endswith(".java"):
                filepath = os.path.join(dirname, filename)
                size += os.stat(filepath).st_size
                with open(filepath) as tempFile:
                    # removeComments is defined elsewhere (see below).
                    tempString = removeComments(tempFile.read())
                public += tempString.count("public")
                private += tempString.count("private")
                tryCount += tempString.count("try")
                catch += tempString.count("catch")

        print("       ", size, "  bytes    ", public, "  public    ",
              private, "  private    ", tryCount, "  try    ", catch,
              "  catch")

The removeComments function simply uses a regular expression pattern to remove all comments from the Java file. Thanks in advance for any help.
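The question does not show removeComments, so the following is only a guess at what such a helper might look like. The name remove_comments and the pattern are assumptions for illustration, and the regex is deliberately naive: it would also strip comment markers that appear inside string literals.

```python
import re

# Hypothetical stand-in for the asker's removeComments (not shown in the post).
# Matches "// ..." up to the end of the line, or "/* ... */" across lines.
_COMMENT_RE = re.compile(r"//[^\n]*|/\*.*?\*/", re.DOTALL)

def remove_comments(source):
    """Return the Java source text with line and block comments removed."""
    return _COMMENT_RE.sub("", source)
```

Because keywords such as "public" can appear in comments, stripping them first keeps the counts closer to what the code actually uses.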

Edit:

I added the following code at the beginning of the for loop:

    current_dirpath = dirname

    if( dirname != current_dirpath):
        size = 0
        public = 0
        private = 0
        tryCount = 0
        catch = 0

The output is now:

testcases/part1/testcase2/root_dir/folder1/folder5:
    7609   bytes     9   public     2   private     7   try     11   catch
testcases/part1/testcase2/root_dir/folder1:
    20195   bytes     28   public     9   private     15   try     33   catch
testcases/part1/testcase2/root_dir/folder4/folder2:
    27406   bytes     37   public     11   private     19   try     42   catch
testcases/part1/testcase2/root_dir/folder4/folder3:
    27406   bytes     37   public     11   private     19   try     42   catch
testcases/part1/testcase2/root_dir/folder4:
    27406   bytes     37   public     11   private     19   try     42   catch
testcases/part1/testcase2/root_dir:
    27406   bytes     37   public     11   private     19   try     42   catch

2 Answers:

Answer 0 (score: 2)

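os.walk() takes an optional topdown argument. If you use os.walk(path, topdown=False), it will automatically walk the directories from the bottom up, so each directory is yielded only after all of its subdirectories.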
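A minimal sketch of what the bottom-up order looks like in practice. The helper name walk_bottom_up is made up for this illustration; only os.walk and its topdown parameter come from the standard library.

```python
import os

def walk_bottom_up(root):
    """Return directory paths in the order os.walk yields them bottom-up."""
    # With topdown=False a directory is yielded after all of its subdirectories,
    # so the root passed in is always the last entry.
    return [dirpath for dirpath, dirnames, filenames
            in os.walk(root, topdown=False)]
```

For a tree like root_dir/folder1/folder5, this should list folder5 before folder1, and folder1 before root_dir.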

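When you first start the loop, save the first element of the tuple (dirpath) in a variable such as current_dirpath. As you continue through the loop, you can keep a running total of the file sizes in that directory. Then just add a check such as if dirpath != current_dirpath; at that point you know you have moved up a directory level and can reset the totals.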

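Answer 1 (score: 1)

I don't believe you can do this with a single counter, even bottom-up: if directory A has subdirectories B and C, then when you finish B you need to zero the counter before you descend into C; but when it is time to do A, you need to add the sizes of B and C (and B's count is long gone by then).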

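Instead of maintaining a single counter, build a dictionary that maps each directory (the key) to its associated counts (in a tuple or whatever). As you iterate (bottom-up), whenever you are ready to print the output for a directory, you can look up all of its subdirectories (from the dirnames list returned by os.walk()) and add their counts together.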
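The dictionary approach could be sketched roughly as follows. The names KEYWORDS, count_file and aggregate are invented for this example, and the comment-stripping step from the question is left out to keep the block self-contained.

```python
import os

KEYWORDS = ("public", "private", "try", "catch")

def count_file(path):
    """Return (size, public, private, try, catch) for a single file."""
    size = os.path.getsize(path)
    with open(path, encoding="utf-8", errors="replace") as f:
        text = f.read()
    # Assumption: the question strips comments here with removeComments().
    return (size,) + tuple(text.count(k) for k in KEYWORDS)

def aggregate(root):
    """Map every directory under root to its cumulative counts."""
    totals = {}
    empty = (0,) * (1 + len(KEYWORDS))
    # Bottom-up, so every subdirectory is already in totals
    # by the time its parent is processed.
    for dirpath, dirnames, filenames in os.walk(root, topdown=False):
        counts = list(empty)
        for name in filenames:
            if name.endswith(".java"):
                file_counts = count_file(os.path.join(dirpath, name))
                counts = [a + b for a, b in zip(counts, file_counts)]
        for sub in dirnames:
            # Symlinked directories are listed but not walked, hence .get().
            child = totals.get(os.path.join(dirpath, sub), empty)
            counts = [a + b for a, b in zip(counts, child)]
        totals[dirpath] = tuple(counts)
    return totals
```

Since nothing is printed during the walk, the results can be reported in any order afterwards, for example by iterating over sorted(totals) so that parents appear before their children.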

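Since you don't discard the data, you can extend this approach to keep separate deep and shallow counts, so that at the end of the scan you can sort the directories by shallow count, report the 10 largest counts, and so on.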
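One way this extension might look, restricted to byte sizes for brevity. Here "shallow" is taken to mean the files directly inside a directory and "deep" to include all subdirectories; scan_sizes and largest_by_shallow are hypothetical names.

```python
import heapq
import os

def scan_sizes(root):
    """Map each directory to (shallow_bytes, deep_bytes) of its .java files."""
    stats = {}
    for dirpath, dirnames, filenames in os.walk(root, topdown=False):
        # Shallow: only the files directly inside this directory.
        shallow = sum(os.path.getsize(os.path.join(dirpath, name))
                      for name in filenames if name.endswith(".java"))
        # Deep: shallow plus the deep totals of the already-visited children.
        deep = shallow + sum(stats.get(os.path.join(dirpath, sub), (0, 0))[1]
                             for sub in dirnames)
        stats[dirpath] = (shallow, deep)
    return stats

def largest_by_shallow(stats, n=10):
    """Return the n (directory, (shallow, deep)) pairs with the largest shallow size."""
    return heapq.nlargest(n, stats.items(), key=lambda item: item[1][0])
```

Keeping both numbers in the same dictionary means the report step is just a sort or a heapq.nlargest call over data that has already been collected.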