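Python file traversal

Time: 2013-06-19 06:19:11

Tags: python filesystems file-traversal

I am trying to create a function that takes the name of a root directory, traverses that directory, and returns a list like this.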

[["folder1",[
    ["subfolder1",[
        "file1",
        "file2"
    ]],
    ["subfolder2",[
        "file3",
        "file4"
    ]]
]],"file5","file6"]

Here is my attempt at the function:

def traverse(rootdir):
    names = []
    for cdirname, dirnames, filenames in os.walk(rootdir):
        # record path to all subdirectories first.
        for subdirname in dirnames:
            names.append([subdirname,traverse(os.path.join(cdirname, subdirname))])

        # record path to all filenames.
        for filename in filenames:
            names.append(os.path.join(cdirname, filename))

    return names

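My problem is that the function always ends up recording the same files/folders more than once, and the entries are always shown as paths relative to "rootdir" rather than just the name of the file/folder. How can I get rid of the duplicates? Also, how can I make it record only the name instead of the full path?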

2 Answers:

Answer 0 (score: 1)

sorted is used to put directories first. If you don't mind the order, just return names.

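import os


def traverse(rootdir):
    names = []
    for filename in os.listdir(rootdir):
        filepath = os.path.join(rootdir, filename)
        if os.path.isdir(filepath):
            # Directories become [name, children]; recurse for the children.
            names.append([filename, traverse(filepath)])
        else:
            # Files are recorded by bare name, not full path.
            names.append(filename)
    # Directories (lists) sort before files (strings), each group by name.
    return sorted(names, key=lambda x: (0, x[0]) if isinstance(x, list) else (1, x))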
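A quick way to check the listdir-based function is to run it against a throwaway tree. This is only a sketch for Python 3: the helper build_sample_tree is hypothetical, and the folder and file names are simply copied from the layout in the question.

```python
import os
import tempfile


def traverse(rootdir):
    """Same listdir-based approach as above: directories first, then files."""
    names = []
    for filename in os.listdir(rootdir):
        filepath = os.path.join(rootdir, filename)
        if os.path.isdir(filepath):
            names.append([filename, traverse(filepath)])
        else:
            names.append(filename)
    return sorted(names, key=lambda x: (0, x[0]) if isinstance(x, list) else (1, x))


def build_sample_tree(root):
    """Hypothetical helper: create the layout described in the question."""
    for sub, files in [("subfolder1", ["file1", "file2"]),
                       ("subfolder2", ["file3", "file4"])]:
        subdir = os.path.join(root, "folder1", sub)
        os.makedirs(subdir)
        for name in files:
            open(os.path.join(subdir, name), "w").close()
    for name in ["file5", "file6"]:
        open(os.path.join(root, name), "w").close()


with tempfile.TemporaryDirectory() as root:
    build_sample_tree(root)
    result = traverse(root)

print(result)
```

Each entry is a bare name, and every file or folder appears exactly once, because the function lists one directory per call and leaves the descent to its own recursion.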

Another version using os.walk:

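import os


def traverse(rootdir):
    names = []
    # Maps each directory path to the list that holds its entries.
    dir_to_names = {rootdir: names}
    # os.walk is top-down by default, so a parent is seen before its children.
    for cdirname, dirnames, filenames in os.walk(rootdir):
        subnames = dir_to_names[cdirname]
        for subdirname in sorted(dirnames):
            # Register an empty list now; it is filled when os.walk gets there.
            subnames2 = dir_to_names[os.path.join(cdirname, subdirname)] = []
            subnames.append([subdirname, subnames2])
        for filename in sorted(filenames):
            subnames.append(filename)
    return names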
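The os.walk version has no recursion of its own. It seems to rely on the same list object being stored both in the result and in dir_to_names, so that a later iteration can fill it in. A minimal Python 3 sketch that exercises this on a made-up tree (the names pkg, data, mod.py, x.csv and readme.txt are assumptions for illustration):

```python
import os
import tempfile


def traverse(rootdir):
    """os.walk-based version from above, unchanged in behavior."""
    names = []
    dir_to_names = {rootdir: names}
    for cdirname, dirnames, filenames in os.walk(rootdir):
        subnames = dir_to_names[cdirname]
        for subdirname in sorted(dirnames):
            # The empty list is shared: it sits in the result and in the dict.
            subnames2 = dir_to_names[os.path.join(cdirname, subdirname)] = []
            subnames.append([subdirname, subnames2])
        for filename in sorted(filenames):
            subnames.append(filename)
    return names


with tempfile.TemporaryDirectory() as root:
    # Made-up layout: root/pkg/mod.py, root/pkg/data/x.csv, root/readme.txt
    os.makedirs(os.path.join(root, "pkg", "data"))
    for parts in [("pkg", "mod.py"), ("pkg", "data", "x.csv"), ("readme.txt",)]:
        open(os.path.join(root, *parts), "w").close()
    tree = traverse(root)

print(tree)
```

Since os.walk is iterated only once here, each directory is processed a single time, which avoids the duplicates described in the question.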

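Answer 1 (score: 0)

You can use os.walk() to get all subdirectories and files. It yields a 3-tuple ('current path', [subdirs], [subfiles]) for every directory it visits. That did not work for my needs, though, so I wrote the following script. Hope this helps.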
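As a side note that is separate from the script itself, here is a small Python 3 sketch of what os.walk() yields on a made-up tree (the names a, b, leaf.txt and top.txt are assumptions). The in-place sort of subdirs is only there to make the order of descent predictable.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as root:
    # Hypothetical layout: root/a/b/leaf.txt plus root/top.txt
    os.makedirs(os.path.join(root, "a", "b"))
    open(os.path.join(root, "a", "b", "leaf.txt"), "w").close()
    open(os.path.join(root, "top.txt"), "w").close()

    visited = []
    for current, subdirs, subfiles in os.walk(root):
        subdirs.sort()  # sorting in place controls the order os.walk descends in
        visited.append(
            (os.path.relpath(current, root), list(subdirs), sorted(subfiles))
        )

for entry in visited:
    print(entry)
```

Every directory in the tree gets its own triple, including nested ones. Calling a recursive function from inside an os.walk loop therefore visits subdirectories more than once, which would explain the duplicates in the question.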

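The script creates an object for each folder, holding that folder's files and directories, and sorts them alphabetically. I looked at os.walk and how it works, and this is a similar approach (using isdir()). The tab variable is only there to make the output easier to read.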

import os


class Folder():
    """ Generate a tree list from a given directory """
    # Folder and file names to skip at every level of the tree
    prohibited_dirs = set([])
    prohibited_files = set([])
    tab = 0
    def __init__(self, path, folder_name):
        """ path should be /home/example, folder_name: example """
        self.path = path
        self.folder_name = folder_name
        self.sub_dirs = []
        self.sub_files = []
        self.__class__.tab += 1
        # print self.tab

    def sorter(self):
        """ sorts listdir output for folders and files"""
        # Sort Folders and Files
        names = os.listdir(self.path)
        for name in names:
            if os.path.isdir(os.path.join(self.path, name)):
                self.sub_dirs.append(name)
            else:
                self.sub_files.append(name)

    def list_stuff(self):
        """ Sort the lists, then print and recurse over all subfolders and files. """
        # Sort alphabetically
        self.sub_dirs.sort(key=str.lower)
        self.sub_files.sort(key=str.lower)
        # Visit all subfolders; an empty list ends the recursion
        if self.sub_dirs:
            # Filter prohibited_dirs Folders
            for sub_dir in self.sub_dirs:
                if sub_dir in self.__class__.prohibited_dirs:
                    continue
                print("\t" * self.tab + sub_dir)
                # Go deeper
                deeper = Folder(os.path.join(self.path, sub_dir), sub_dir)
                deeper.sorter()
                deeper.list_stuff()
                # Free object
                del deeper
                self.__class__.tab -= 1
        # List all files; nothing is printed when the list is empty
        if self.sub_files:
            for sub_file in self.sub_files:
                if sub_file in self.__class__.prohibited_files:
                    continue
                print("\t" * self.tab + sub_file)

STARTDIRECTORY = "."
STARTFOLDER = "."

runner = Folder(STARTDIRECTORY, STARTFOLDER)
runner.sorter()
runner.list_stuff()
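The class above prints the tree instead of returning it. If the nested-list format from the question is wanted, the same isdir()-based split could be adapted roughly as follows. This is a sketch for Python 3, not part of the original script: build_tree and its prohibited parameter are hypothetical names, the two prohibited sets are merged into one for brevity, and the sample layout is made up.

```python
import os
import tempfile


def build_tree(path, prohibited=frozenset()):
    """Return [[dirname, children], ..., filename, ...] for `path`.

    `build_tree` and `prohibited` are hypothetical names for this sketch.
    """
    sub_dirs, sub_files = [], []
    for name in os.listdir(path):
        if name in prohibited:
            continue  # skip excluded names at every level
        if os.path.isdir(os.path.join(path, name)):
            sub_dirs.append(name)
        else:
            sub_files.append(name)
    # Case-insensitive alphabetical order, as in the class above
    sub_dirs.sort(key=str.lower)
    sub_files.sort(key=str.lower)
    tree = [[d, build_tree(os.path.join(path, d), prohibited)] for d in sub_dirs]
    tree.extend(sub_files)
    return tree


with tempfile.TemporaryDirectory() as root:
    # Made-up layout for illustration
    os.makedirs(os.path.join(root, "Docs"))
    os.makedirs(os.path.join(root, ".git"))
    for parts in [("Docs", "b.txt"), ("Docs", "A.txt"), ("setup.py",)]:
        open(os.path.join(root, *parts), "w").close()
    listing = build_tree(root, prohibited={".git"})

print(listing)
```

Returning the list rather than printing it also removes the need for the shared tab counter, since the nesting depth is carried by the structure itself.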