Python - Loop through subfolders and files in a directory without ignoring subfolders

Time: 2017-12-07 19:36:48

Tags: python subdirectory

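I have read all the Stack Exchange help on looping through subfolders, as well as the os documentation, but I am still stuck. I am trying to loop over the files in subfolders, open each file, extract the first number in the first line, copy the file to a different subfolder (with the same name, but in the output directory), and rename the copy with that number as a suffix.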

import os
import re
outputpath = "C:/Users/Heather/Dropbox/T_Files/Raw_FRUS_Data/Wisconsin_Copies_With_PageNumbers"
inputpath = "C:/Users/Heather/Dropbox/T_Files/Raw_FRUS_Data/FRUS_Wisconsin"
suffix=".txt" 
for root, dirs, files  in os.walk(inputpath):
    for file in files:
        file_path = os.path.join(root, file)
        foldername=os.path.split(os.path.dirname(file_path))[1]
        filebname=os.path.splitext(file)[0]
        filename=filebname + "_"
        f=open(os.path.join(root,file),'r')
        data=f.readlines()
        if data is None:
            f.close() 
        else:
            with open(os.path.join(root,file),'r') as f:
                for line in f:
                    s=re.search(r'\d+',line)
                    if s:
                        pagenum=(s.group())
                        break
        with open(os.path.join(outputpath, foldername,filename+pagenum+suffix), 'w') as f1:
            with open(os.path.join(root,file),'r') as f:
                for line in f:
                    f1.write(line)

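I want the result to be copies of the files from the input directory, placed in the corresponding subfolders of the output directory and renamed with a suffix, e.g. "005_2", where 005 is the original file name and 2 is the number the Python code extracted from the file.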
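
The naming rule described above can be sketched as a small helper. This is only an illustration under assumptions: `numbered_name` is a hypothetical function name that does not appear in the original code, the sample first line is made up, and the number is taken to be the first run of digits in the line passed in.

```python
import os
import re

def numbered_name(filename, first_line, suffix=".txt"):
    """Build a name such as '005_2.txt' from '005.txt' and a line containing 2.

    Hypothetical helper for illustration; returns None if the line has no digits.
    """
    match = re.search(r'\d+', first_line)
    if match is None:
        return None
    base = os.path.splitext(filename)[0]  # drop the original extension
    return base + "_" + match.group() + suffix

print(numbered_name("005.txt", "2 sample text"))  # prints 005_2.txt
```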

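The error I get seems to indicate that I am not looping through the files correctly. I know the code that extracts the first number and renames the file works, because I tested it on a single file. But looping through multiple subfolders with os.walk does not work, and I cannot figure out what I am doing wrong. Here is the error: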

File "<ipython-input-1-614e2851f16a>", line 23, in <module>
    with open(os.path.join(outputpath, foldername,filename+pagenum+suffix), 'w') as f1:
IOError: [Errno 2] No such file or directory: 'C:/Users/Heather/Dropbox/T_Files/Raw_FRUS_Data/Wisconsin_Copies_With_PageNumbers\\FRUS_Wisconsin\\.dropbox_1473986809.txt'
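
A note on reading this traceback: the folder component of the failing path is `FRUS_Wisconsin`, which is the input directory itself, and the file is `.dropbox`. That suggests `os.walk` is doing what it is documented to do, namely yielding the top-level directory first, so files sitting directly in `inputpath` are processed too. Since `open(..., 'w')` does not create missing folders, the write appears to fail because no `FRUS_Wisconsin` folder exists under the output directory. The sketch below reproduces the folder-name behaviour on a temporary directory; the tree it builds is invented for the demonstration.

```python
import os
import shutil
import tempfile

# Build a throwaway tree: top/.dropbox and top/sub/005.txt (names are invented)
base = tempfile.mkdtemp()
top = os.path.join(base, "FRUS_Wisconsin")
os.makedirs(os.path.join(top, "sub"))
for rel in (".dropbox", os.path.join("sub", "005.txt")):
    with open(os.path.join(top, rel), "w") as f:
        f.write("1 sample\n")

seen = []
for root, dirs, files in os.walk(top):
    for name in files:
        # Same folder-name logic as in the question's code
        foldername = os.path.split(os.path.dirname(os.path.join(root, name)))[1]
        seen.append((foldername, name))

shutil.rmtree(base)  # clean up the temporary tree
print(sorted(seen))
# prints [('FRUS_Wisconsin', '.dropbox'), ('sub', '005.txt')]
```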

1 answer:

Answer 0 (score: 0)

Well, it's not elegant, but it works:

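import os
import re
from glob import glob

suffix = ".txt"
# Paths are elided here, as in the original post
folderlist = glob("C:\\...FRUS_Wisconsin*\\")
outputpath = "C:\\..\\Wisconsin_Copies_With_PageNumbers"
for folder in folderlist:
    # The subfolder name is the 8th component (index 7) of the full, unelided path
    foldername = str(folder.split('\\')[7])
    for root, dirs, files in os.walk(folder):
        for file in files:
            filebname = os.path.splitext(file)[0]
            filename = filebname + "_"
            # Skip hidden files whose names start with "._"
            if not filename.startswith('._'):
                pagenum = None
                # Find the first number in the file
                with open(os.path.join(root, file), 'r') as f:
                    for line in f:
                        s = re.search(r'\d+', line)
                        if s:
                            pagenum = s.group()
                            break
                if pagenum is None:
                    continue  # no number found, so there is nothing to append
                # Write the copy under its new name in the matching output subfolder
                with open(os.path.join(outputpath, foldername, filename + pagenum + suffix), 'w') as f1:
                    with open(os.path.join(root, file), 'r') as f:
                        for line in f:
                            f1.write(line)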
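
As an alternative to globbing and indexing into the split path, a sketch along the following lines mirrors the input tree with `os.path.relpath` and creates any missing output folders with `os.makedirs`. The function name `copy_with_page_numbers`, and the choices to skip dot-files and files that contain no digits, are assumptions made for illustration and are not part of the original answer.

```python
import os
import re
import shutil

def copy_with_page_numbers(inputpath, outputpath, suffix=".txt"):
    """Copy each file under inputpath into the matching subfolder of outputpath,
    appending '_<first number found in the file>' to its base name.
    Returns the list of files written."""
    copied = []
    for root, dirs, files in os.walk(inputpath):
        # Folder path relative to the input root ('.' for the root itself)
        rel = os.path.relpath(root, inputpath)
        target_dir = os.path.normpath(os.path.join(outputpath, rel))
        for name in files:
            if name.startswith("."):
                continue  # assumption: hidden files such as .dropbox are not wanted
            source = os.path.join(root, name)
            pagenum = None
            with open(source, "r") as f:
                for line in f:
                    match = re.search(r"\d+", line)
                    if match:
                        pagenum = match.group()
                        break
            if pagenum is None:
                continue  # assumption: files with no number are skipped
            if not os.path.isdir(target_dir):
                os.makedirs(target_dir)  # open(..., 'w') would not create folders
            new_name = os.path.splitext(name)[0] + "_" + pagenum + suffix
            target = os.path.join(target_dir, new_name)
            shutil.copyfile(source, target)
            copied.append(target)
    return copied
```

With the variables from the question, it would be called as `copy_with_page_numbers(inputpath, outputpath)`. Because the target folder is derived from the relative path, nested subfolders keep their structure and nothing depends on a fixed position in the split path.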