Question

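I am having a hard time building an XML structure that lists all the directories and subdirectories of a given directory. I got it working using the recursion from the given post, but my problem is slightly tougher than usual. Some of my directories may contain 10,000 files, so checking every entry to see whether it is a directory is costly, and building the XML already takes too long. I want to build the XML for directories only.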

I know Linux has commands such as find . -type d that list only the directories (not the files). How can I achieve this in Python?

Thanks in advance.

Answer 1

os.walk already distinguishes between files and directories:

import os

def find_all_dirs(root='.'):
    # os.walk yields (path, dirs, files); dirs holds only the subdirectory names
    for path, dirs, files in os.walk(root):
        for d in dirs:
            yield os.path.join(path, d)
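As a rough usage sketch of the generator above, the following builds a small throwaway tree and lists only its directories. The `demo` helper and the directory names `a`, `b`, and `c` are made up for illustration and are not part of the original answer.

```python
import os
import tempfile

def find_all_dirs(root='.'):
    # Same generator as in the answer: yield every directory below root.
    for path, dirs, files in os.walk(root):
        for d in dirs:
            yield os.path.join(path, d)

def demo():
    # Hypothetical helper: build a/, a/b/ and c/ plus one file that must be skipped.
    with tempfile.TemporaryDirectory() as root:
        os.makedirs(os.path.join(root, 'a', 'b'))
        os.makedirs(os.path.join(root, 'c'))
        open(os.path.join(root, 'a', 'file.txt'), 'w').close()
        # Report paths relative to the root so the result is easy to read.
        return sorted(os.path.relpath(p, root) for p in find_all_dirs(root))

if __name__ == '__main__':
    print(demo())
```

Since Python 3.5, os.walk is built on os.scandir, so on most platforms it can tell files from directories without an extra stat call per entry.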

Answer 2

For just a single directory...

import os

def get_dirs(p):
  p = os.path.abspath(p)
  return [n for n in os.listdir(p) if os.path.isdir(os.path.join(p, n))]

print("\n".join(get_dirs(".")))
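Because the question worries about the cost of testing every entry, a variant based on os.scandir (Python 3.5+, used as a context manager from 3.6) may be worth trying. This is only a sketch: `get_dirs_scandir` and `demo` are hypothetical names, not part of the original answer. On most platforms the directory entries already carry the file type, so `is_dir()` usually needs no additional system call.

```python
import os
import tempfile

def get_dirs_scandir(p):
    # Hypothetical variant of get_dirs: names of the immediate subdirectories of p.
    with os.scandir(p) as entries:
        return sorted(e.name for e in entries if e.is_dir())

def demo():
    # Hypothetical helper: two subdirectories and one file in a temporary root.
    with tempfile.TemporaryDirectory() as root:
        os.mkdir(os.path.join(root, 'alpha'))
        os.mkdir(os.path.join(root, 'beta'))
        open(os.path.join(root, 'notes.txt'), 'w').close()
        return get_dirs_scandir(root)

if __name__ == '__main__':
    print("\n".join(get_dirs_scandir(".")))
```

Like os.path.isdir, `is_dir()` follows symbolic links by default; pass `follow_symlinks=False` to skip links that point to directories.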

Answer 3

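This is the solution I arrived at after searching and trying different things. I am not claiming that it is faster than the approach of checking every entry in a directory, but in practice it produces results sooner (the difference is noticeable when a directory contains 1000 files).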

import os
import subprocess
from xml.sax.saxutils import quoteattr as xml_quoteattr

def DirAsLessXML(path):
    # Ask find for this directory plus its immediate subdirectories (no files).
    # universal_newlines=True makes the output str rather than bytes on Python 3.
    output = subprocess.Popen(['find', path, '-maxdepth', '1', '-type', 'd'],
                              stdout=subprocess.PIPE, shell=False,
                              universal_newlines=True).communicate()[0]
    output_list = output.splitlines()

    # find prints the directory itself first; if nothing else follows, it is a leaf.
    dir_type = 'leaf_dir' if len(output_list) == 1 else 'dir'
    result = '<dir type={0} name={1} path={2}>\n'.format(
        xml_quoteattr(dir_type), xml_quoteattr(os.path.basename(path)), xml_quoteattr(path))

    for item in output_list[1:]:
        # Recurse into each subdirectory and indent its XML by two spaces.
        result += ''.join('  ' + line + '\n' for line in DirAsLessXML(item).splitlines())
    result += '</dir>\n'
    return result
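For comparison, here is a minimal sketch of the same directories-only XML built without spawning find, using os.scandir instead. This is a swapped-in technique, not the answer's method: `dir_as_xml` and `demo` are hypothetical names, and whether it beats the subprocess approach on a given tree is untested here.

```python
import os
import tempfile
from xml.sax.saxutils import quoteattr

def dir_as_xml(path, indent=''):
    # Hypothetical pure-Python counterpart to DirAsLessXML.
    # Collect immediate subdirectories; symlinks are not followed, like find -type d.
    with os.scandir(path) as entries:
        subdirs = sorted(e.path for e in entries if e.is_dir(follow_symlinks=False))
    dir_type = 'dir' if subdirs else 'leaf_dir'
    lines = ['{0}<dir type={1} name={2} path={3}>'.format(
        indent, quoteattr(dir_type), quoteattr(os.path.basename(path)), quoteattr(path))]
    for sub in subdirs:
        # Each child is rendered one level deeper.
        lines.append(dir_as_xml(sub, indent + '  '))
    lines.append(indent + '</dir>')
    return '\n'.join(lines)

def demo():
    # Hypothetical helper: root/a/b plus a file that must not appear in the XML.
    with tempfile.TemporaryDirectory() as root:
        os.makedirs(os.path.join(root, 'a', 'b'))
        open(os.path.join(root, 'notes.txt'), 'w').close()
        return dir_as_xml(root)

if __name__ == '__main__':
    print(dir_as_xml('.'))
```

Passing the indent down as a parameter avoids re-splitting and re-indenting the child output at every level of the recursion.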

Getting a list of subdirectories in a given directory
