Converting multiple gz files in subdirectories to CSV

Time: 2014-01-02 08:29:33

Tags: python csv dir

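My main directory contains many subdirectories, and I want to write a script that decompresses and converts all the files inside them. If possible, I would also like to combine all the CSVs within a single directory into one CSV. More importantly, though, I need help with my nested loops.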

import gzip
import csv
import os

subdirlist = os.listdir('/home/user/Desktop/testloop')
subtotal = len(subdirlist)
subcounter = 0
for dirlist in subdirlist:
    print "Working On " + dirlist
    total = len(dirlist)
    counter = 0
    for dir in dirlist:
        print "Working On " + dir
        f = gzip.open('/' + str(subdirlist) + '/' + dir, 'rb')
        file_content = f.read()
        f.close()       
        print "25% Complete"    
        filename = '/' + str(subdirlist) + '/temp.txt'
        target = open(filename, 'w')
        target.write(file_content)
        target.close()
        print "50% Complete!"
        csv_file = '/' + str(subdirlist) + '/' + str(dir) + '.csv'
        in_txt = csv.reader(open(filename, "rb"), delimiter = '\t')
        out_csv = csv.writer(open(csv_file, 'wb'))
        out_csv.writerows(in_txt)
        os.remove(filename)
        os.remove('/' + str(subdirlist) + '/' + dir)
        counter+=1
        print str(counter) + "/" + str(total) + " " + str(dir) + " Complete!"
    print "SubDirectory Converted!"
    print str(subcounter) + "/" + str(subtotal) + " " + str(subdirlist) + " Complete!"
    subcounter+=1
print "All Files Converted!"

Thanks in advance.

1 answer:

Answer 0 (score: 1)

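To get a list of files and subdirectories, you can use os.walk. Below is an implementation I wrote that collects all files in arbitrarily nested subdirectories (optionally restricted to certain extensions):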

from os import walk, sep
from functools import reduce  # required in Python 3; reduce is a builtin in Python 2

def get_filelist(root, extensions=None):
    """Return a list of files (path and name) within a supplied root directory.

    To filter by extension(s), provide a list of strings, e.g.

        get_filelist(root, ["zip", "csv"])

    """
    return reduce(lambda x, y: x+y,
                  [[sep.join([item[0], name]) for name in item[2]
                    if (extensions is None or
                        name.split(".")[-1] in extensions)]
                   for item in walk(root)],
                  [])  # initial value: no TypeError if walk() yields nothing
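
Applied to the question, the same os.walk idea could look roughly like the sketch below. It is written for Python 3 (the question's code is Python 2) and assumes the .gz files hold tab-delimited text, as the question's csv.reader call suggests. The names convert_gz_tree and combine_csvs, the output naming (name.gz becomes name.csv), and the combined.csv file name are illustrative choices, not part of the original answer. The combining step simply concatenates rows and assumes the files have no header rows that need de-duplicating.

```python
import csv
import gzip
import os


def convert_gz_tree(root, delimiter="\t", remove_source=False):
    """Convert every .gz file under root to a .csv file next to it.

    Returns the list of CSV paths that were written.
    """
    written = []
    # os.walk yields (dirpath, dirnames, filenames) for root and every
    # nested subdirectory, so no hand-written nested loop is needed.
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            if not name.endswith(".gz"):
                continue
            gz_path = os.path.join(dirpath, name)
            csv_path = gz_path[:-len(".gz")] + ".csv"
            # Text mode ("rt") lets csv.reader consume the decompressed
            # stream directly, so no temporary file is required.
            with gzip.open(gz_path, "rt", newline="") as src, \
                    open(csv_path, "w", newline="") as dst:
                csv.writer(dst).writerows(csv.reader(src, delimiter=delimiter))
            if remove_source:
                os.remove(gz_path)
            written.append(csv_path)
    return written


def combine_csvs(directory, out_name="combined.csv"):
    """Concatenate the CSV files directly inside directory into one file."""
    out_path = os.path.join(directory, out_name)
    sources = sorted(
        name for name in os.listdir(directory)
        if name.endswith(".csv") and name != out_name
    )
    with open(out_path, "w", newline="") as dst:
        writer = csv.writer(dst)
        for name in sources:
            with open(os.path.join(directory, name), newline="") as src:
                writer.writerows(csv.reader(src))
    return out_path
```

A call such as convert_gz_tree('/home/user/Desktop/testloop') would then process every subdirectory in one pass, and combine_csvs can be run on each subdirectory afterwards. Deleting the original .gz files is left off by default so that a failed run does not lose data.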