Question

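I'm writing a script that reads files from one location, manipulates the data, and then writes the output somewhere else. On the command line, the user specifies the top-level folder with -p, and the script recurses through it to find all the files. I'm currently doing this with glob, and reading the files works fine.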

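But I also want the user to be able to specify an output folder to write the files to, and I want to preserve the folder structure of the input path.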
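A minimal sketch of the command-line side, assuming argparse. `-p/--path` comes from the question; `-o/--output` is a hypothetical flag name, chosen to match the `args.output` used in the question's code. `parse_args` is likewise a hypothetical helper.

```python
import argparse


def parse_args(argv=None):
    """Parse the input and output folders from the command line.

    Hypothetical helper; pass argv=None to read sys.argv.
    """
    parser = argparse.ArgumentParser(
        description="Process JSON files and mirror the input folder structure.")
    # -p is the top-level input folder, as described in the question.
    parser.add_argument("-p", "--path", required=True,
                        help="top-level folder to search recursively")
    # -o/--output is a hypothetical flag name for the destination folder.
    parser.add_argument("-o", "--output", required=True,
                        help="folder to write the processed files to")
    return parser.parse_args(argv)
```

In the script itself you would call `args = parse_args()` and then read `args.path` and `args.output`.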

import os
from glob import glob

for eachFile in glob(args.path + "/*/*.json"):  # <- this seems dangerous. Better way?
    # do something to the json file

    # output the modified data to its new home
    # outfile = os.path.join(args.output, os.path.dirname(eachFile), eachFile)  # <- doesn't work
    outfile = os.path.join(args.output, os.path.dirname(eachFile)[1:], eachFile)

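The last line is the best I've come up with, but it has a problem: stripping the leading "/" from the directory assumes the script is running on a POSIX machine. Also, say I pass in ~/Documents/2014 as the input path and /tmp as the output. The files get written to /tmp/Users/myusername/Documents/2014/blah/whatever.json.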
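Part of the trouble is that `os.path.join` discards every earlier component as soon as a later one is absolute, which is why the commented-out line does not work. One way around both problems, sketched here with a hypothetical helper `mirror_path`, is to make each file's path relative to the input root with `os.path.relpath` and join only that relative part onto the output root. `os.path.expanduser` is included in case a literal `~` reaches the script unexpanded (the shell normally expands it before Python sees it).

```python
import os


def mirror_path(src_file, src_root, dst_root):
    """Map src_file (somewhere under src_root) to the same relative
    location under dst_root. Hypothetical helper for illustration."""
    src_root = os.path.abspath(os.path.expanduser(src_root))
    dst_root = os.path.abspath(os.path.expanduser(dst_root))
    src_file = os.path.abspath(os.path.expanduser(src_file))
    # e.g. "blah/whatever.json" when src_file is src_root/blah/whatever.json
    rel = os.path.relpath(src_file, src_root)
    return os.path.join(dst_root, rel)


if __name__ == "__main__":
    out = mirror_path("/Users/me/Documents/2014/blah/whatever.json",
                      "/Users/me/Documents/2014", "/tmp")
    print(out)  # /tmp/blah/whatever.json on a POSIX system
```

Before writing to the returned path, create its parent folders with `os.makedirs(os.path.dirname(out), exist_ok=True)`.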

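This seems like a fairly common use case, so I'm surprised I couldn't find anyone else who needed to do it, or a simple module that handles it. Any suggestions?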
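On the `"/*/*.json"` pattern flagged as dangerous in the question's code: it only matches files exactly one folder below `args.path`. A sketch, assuming Python 3.5 or later, where `glob.glob` accepts `recursive=True`: with that flag, `**` matches any number of nested folders, including none, so JSON files at every depth are found. `find_json_files` is a hypothetical name.

```python
import glob
import os


def find_json_files(root):
    """Return every *.json file under root, at any depth, in sorted order."""
    # With recursive=True, "**" matches zero or more directory levels,
    # so both root/a.json and root/x/y/c.json are found.
    pattern = os.path.join(root, "**", "*.json")
    return sorted(glob.glob(pattern, recursive=True))
```

Note that `**` skips folders whose names start with a dot, unlike `os.walk`.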

Answer 1

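Here is a script that does close to what you need. The key point is that instead of glob you want os.walk, because you need to descend through the whole directory structure. You'll want to add more sanity checks, but this is a good start.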

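# Recurse and process files.
import fnmatch
import os
import shutil
import sys


def process(src_dir, dst_dir, pattern='*'):
    """Iterate through src_dir, processing all files that match pattern and
    store them, including their parent directories, in dst_dir.
    """
    src_dir = os.path.abspath(src_dir)
    dst_dir = os.path.abspath(dst_dir)
    if src_dir == dst_dir:
        raise ValueError('Source and destination dir must differ.')
    for dirpath, dirnames, filenames in os.walk(src_dir):
        # Don't descend into dst_dir if it lives inside src_dir.
        dirnames[:] = [d for d in dirnames
                       if os.path.join(dirpath, d) != dst_dir]
        # Keep only the files that match pattern. fnmatch.filter returns a
        # list, so the emptiness test below works (filter() would not in Python 3).
        filenames = fnmatch.filter(filenames, pattern)

        if filenames:
            # Mirror dirpath relative to src_dir under dst_dir. Joining the
            # absolute dirpath directly would discard dst_dir entirely.
            rel_dir = os.path.relpath(dirpath, src_dir)
            dir_ = os.path.normpath(os.path.join(dst_dir, rel_dir))
            os.makedirs(dir_, exist_ok=True)
            for fname in filenames:
                in_fname = os.path.join(dirpath, fname)
                out_fname = os.path.join(dir_, fname)

                # At this point, the destination directory is created and you
                # have a valid input / output filename, so you'd call your
                # function to process these files.  I just copy them :D
                shutil.copyfile(in_fname, out_fname)

if __name__ == '__main__':
    process(sys.argv[1], sys.argv[2], '*.txt')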
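For comparison, a sketch of the same idea using pathlib (Python 3.5 or later): `Path.rglob` takes the place of os.walk plus fnmatch, and `Path.relative_to` takes the place of the relpath step. `process_tree` and its `transform` parameter are hypothetical; `transform` stands in for "do something to the json file" and by default just round-trips the JSON unchanged.

```python
import json
from pathlib import Path


def process_tree(src_dir, dst_dir, pattern="*.json", transform=lambda data: data):
    """Read every file matching pattern under src_dir, apply transform to the
    parsed JSON, and write the result to the same relative path under dst_dir.
    Returns the list of output paths written."""
    src = Path(src_dir).expanduser().resolve()
    dst = Path(dst_dir).expanduser().resolve()
    written = []
    # rglob searches src and all of its subfolders; sorted() collects the
    # matches up front, so files written below are never picked up again.
    for in_file in sorted(src.rglob(pattern)):
        if not in_file.is_file():
            continue
        # Same relative location under dst, e.g. dst/blah/whatever.json.
        out_file = dst / in_file.relative_to(src)
        out_file.parent.mkdir(parents=True, exist_ok=True)
        with in_file.open() as f:
            data = json.load(f)
        with out_file.open("w") as f:
            json.dump(transform(data), f)
        written.append(out_file)
    return written
```

Combined with the question's arguments, this would be called as `process_tree(args.path, args.output)`.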

Writing files while preserving the folder structure