Filter a directory with a regular expression and write the filtered files to another directory

Time: 2019-06-07 16:10:16

Tags: python regex python-3.x glob os.path

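I'm just trying to write a Python 3 program that runs through all the .sql files in a specific directory, applies my regex that adds a ; after a certain pattern, and writes the changed contents to a separate directory, keeping each file's original name.

So, if I have file1.sql and file2.sql in the "/home/files" directory, then after running the program both files should be written to "/home/new_files", without the original files being changed.

Here is my code: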

import glob
import re
folder_path = "/home/files/d_d"
file_pattern = "/*sql"
folder_contents = glob.glob(folder_path + file_pattern)


for file in folder_contents:
    print("Checking", file)
for file in folder_contents:
    read_file = open(file, 'rt',encoding='latin-1').read()
    #words=read_file.split()
    with open(read_file,"w") as output:
        output.write(re.sub(r'(TBLPROPERTIES \(.*?\))', r'\1;', f, flags=re.DOTALL))

The error I get is File name too long: "CREATE EXTERNAL TABLe", and I'm also not sure where to put my output path (/home/files/new_dd) in the code.

Any ideas or suggestions?

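1 Answer:

Answer 0: (score: 0)

With read_file = open(file, 'rt',encoding='latin-1').read(), read_file holds the entire contents of the file, and that string is then passed to open() as if it were a file name, which is what triggers the "File name too long" error. The code provided here iterates over the file names matched by the glob.glob pattern, opens each file for reading, processes its contents, and opens a file with the same name in the output folder for writing (this assumes the folder newfile_sqls already exists; if it does not, you get FileNotFoundError: [Errno 2] No such file or directory).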

import glob
import os
import re

folder_path = "original_sqls"
#original_sqls\file1.sql, original_sqls\file2.sql, original_sqls\file3.sql
file_pattern = "*sql"
# new/modified files folder
output_path = "newfile_sqls"

folder_contents = glob.glob(os.path.join(folder_path,file_pattern))

# iterate over file names
for file_ in [os.path.basename(f) for f in folder_contents]:

    # open to read
    with open(os.path.join(folder_path,file_), "r") as inputf:
        read_file = inputf.read()

    # use variable 'read_file' here
    tmp = re.sub(r'(TBLPROPERTIES \(.*?\))', r'\1;', read_file, flags=re.DOTALL)

    # open to write to the (previously created) new folder
    with open(os.path.join(output_path,file_), "w") as output:
        # tmp is a single string, so write() is the right call here
        output.write(tmp)
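
The same steps can also be wrapped in a function that creates the output folder when it is missing, which sidesteps the FileNotFoundError mentioned above. This is only a sketch under a few assumptions: the function name rewrite_sql_files and its parameters are made up for illustration, the glob pattern is tightened to "*.sql" (the answer uses "*sql"), and latin-1 is taken from the question's code as the file encoding.

```python
import glob
import os
import re

# Same pattern as in the answer: capture "TBLPROPERTIES ( ... )" up to the
# first closing parenthesis, across line breaks (re.DOTALL).
TBLPROPERTIES_RE = re.compile(r'(TBLPROPERTIES \(.*?\))', flags=re.DOTALL)


def rewrite_sql_files(src_dir, dst_dir, encoding="latin-1"):
    """Write a modified copy of every *.sql file in src_dir to dst_dir.

    Hypothetical helper: appends ';' after each TBLPROPERTIES (...) clause
    and returns the list of paths written. Files in src_dir are only read.
    """
    # Create the output folder if needed instead of assuming it exists.
    os.makedirs(dst_dir, exist_ok=True)

    written = []
    for src_path in sorted(glob.glob(os.path.join(src_dir, "*.sql"))):
        # Read the whole file into a string.
        with open(src_path, "r", encoding=encoding) as inputf:
            text = inputf.read()

        # Same file name, different folder.
        dst_path = os.path.join(dst_dir, os.path.basename(src_path))
        with open(dst_path, "w", encoding=encoding) as output:
            output.write(TBLPROPERTIES_RE.sub(r'\1;', text))
        written.append(dst_path)
    return written
```

With the paths from the question, the call would be rewrite_sql_files("/home/files/d_d", "/home/files/new_dd"). Note that the non-greedy .*? stops at the first ")" after "TBLPROPERTIES (", so a property value that itself contains ")" would get the ";" inserted too early.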