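Why does my recursive search for files in a folder not work correctly?

Time: 2017-05-26 17:32:44

Tags: python recursion

I need help understanding why this code does not work as expected.

My directory structure looks like this: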

|- tryWalkDir.py

   TryCPP/
   TryCPP/tryHashMap/
   TryCPP/tryHashMap/tryHashMap.cpp
   TryCPP/tryHashMap/tryHashMap.o*

The script, tryWalkDir.py, is meant to find all .cpp files. I don't know why TryCPP/tryHashMap/tryHashMap.cpp is collected twice. This is the output:

Enter into depth:0, folder:TryCPP
folder:TryCPP, cur:TryCPP, sub:['tryHashMap'], files:[], depth:0
recresively call - s:tryHashMap
Enter into depth:1, folder:TryCPP/tryHashMap
folder:TryCPP/tryHashMap, cur:TryCPP/tryHashMap, sub:[], files:['tryHashMap.cpp', 'tryHashMap.o'], depth:1
process tryHashMap.cpp
append tryHashMap.cpp
process tryHashMap.o
Exit on depth:1, folder:TryCPP/tryHashMap
folder:TryCPP, cur:TryCPP/tryHashMap, sub:[], files:['tryHashMap.cpp', 'tryHashMap.o'], depth:0
process tryHashMap.cpp
append tryHashMap.cpp
process tryHashMap.o
Exit on depth:0, folder:TryCPP
['TryCPP/tryHashMap/tryHashMap.cpp', 'TryCPP/tryHashMap/tryHashMap.cpp']

tryWalkDir.py

import os

class Cell(object):
    def __init__(self, fn, ext):
        self.fn = fn
        self.ext = ext
        self.fl = [] #list all the files

    def collect_files(self, folder, depth=0):
        ''' collect all the folders containing corresponding extension scripts '''
        print 'Enter into depth:%d, folder:%s' % (depth,folder)

        # level one folder name should start with 'Try' or 'try'
        if depth == 1:
            filename = os.path.basename(folder)[:3]
            if filename in ['Try','try']:
                pass
            else:
                print 'L1 Dir - {0} must start with [Try,try], depth:{1}'.format(filename,depth)
                return

        for cur, sub, files in os.walk(folder):
            print 'folder:{}, cur:{}, sub:{}, files:{}, depth:{}'.format(folder,cur,sub,files,depth)

            #filter out all the files
            #[ self.fl.append(cur+'/'+f) for f in files if os.path.splitext(f)[1][1:] == self.ext ]
            for f in files:
                print 'process %s' % f
                if os.path.splitext(f)[1][1:] == self.ext:
                    print 'append %s' % f
                    self.fl.append(cur+'/'+f)

            #if sub:
            for s in sub:
                print 'recresively call - s:{}'.format(s)
                self.collect_files(cur+'/'+s,depth+1)

        print 'Exit on depth:%d, folder:%s' % (depth,folder)


    def start(self):
        self.collect_files(self.fn,0)
        #print self.fl


def main():
    cell = Cell('TryCPP','cpp')
    cell.start()
    print cell.fl

if __name__ == '__main__': main()

1 Answer:

Answer 0 (score: 0)

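The error occurs because you are calling os.walk multiple times without realizing it. os.walk already recurses into subdirectories for you. However, you then also call self.collect_files(cur+'/'+s,depth+1) for every subdirectory, and each of those calls starts another os.walk over a subtree that has already been visited. As a result, files in nested directories are collected more than once: a file one level below the top folder is appended twice, as in your output, and the duplication grows with nesting depth.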
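
A minimal sketch of this behaviour, written for Python 3 (the question itself uses Python 2 print statements). It rebuilds the question's layout in a temporary folder; the helper name visited_dirs is made up for illustration. A single os.walk call already reaches the nested folder:

```python
import os
import tempfile


def visited_dirs(top):
    """Return every directory os.walk yields for top, relative to top."""
    return [os.path.relpath(cur, top) for cur, sub, files in os.walk(top)]


with tempfile.TemporaryDirectory() as tmp:
    # Recreate the layout from the question.
    nested = os.path.join(tmp, 'TryCPP', 'tryHashMap')
    os.makedirs(nested)
    open(os.path.join(nested, 'tryHashMap.cpp'), 'w').close()

    # One call, no manual recursion: both the top folder ('.')
    # and the nested 'tryHashMap' folder are visited.
    dirs = visited_dirs(os.path.join(tmp, 'TryCPP'))
    print(dirs)
```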

To fix the code, simply remove this loop:

for s in sub:
    print 'recresively call - s:{}'.format(s)
    self.collect_files(cur+'/'+s,depth+1)
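
With that loop gone, the collection step reduces to a single walk. The following is only a sketch under some assumptions: it targets Python 3, it is a standalone function rather than the Cell method from the question, and it does not reproduce the check that level-one folder names start with 'Try' or 'try'.

```python
import os


def collect_files(folder, ext):
    """Collect the paths of all files under folder whose extension is ext."""
    found = []
    # os.walk descends into every subdirectory by itself,
    # so each file is seen exactly once.
    for cur, sub, files in os.walk(folder):
        for f in files:
            # Same extension test as in the question: 'cpp' matches 'x.cpp'.
            if os.path.splitext(f)[1][1:] == ext:
                found.append(os.path.join(cur, f))
    return found
```

For the layout in the question, collect_files('TryCPP', 'cpp') should return a single entry for tryHashMap.cpp.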

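By the way, you should use os.path.join instead of concatenating slashes by hand throughout your code. For example, self.fl.append(cur+'/'+f) can be written as self.fl.append(os.path.join(cur, f)). This is the approach the os.walk documentation recommends: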

  

To get a full path (which begins with top) to a file or directory in dirpath, do os.path.join(dirpath, name).
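
As a small illustration of that advice (Python 3, using the path names from the question), os.path.join inserts the platform's separator and avoids doubling it when the directory already ends with one:

```python
import os

# Build the path piece by piece, as one would inside an os.walk loop.
cur = os.path.join('TryCPP', 'tryHashMap')
full = os.path.join(cur, 'tryHashMap.cpp')
print(full)

# A trailing separator on the directory does not produce a doubled separator.
with_trailing = os.path.join('TryCPP' + os.sep, 'tryHashMap.cpp')
print(with_trailing)
```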