Question

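I'm just trying to create a Python 3 program that runs through all the .sql files in a specific directory, applies a regex that adds a ; after a certain instance, and writes the changes into a separate directory, with each file keeping its original file name.

So, if I have file1.sql and file2.sql in the "/home/files" directory, then after running the program the output should be written as those two files in "/home/new_files", without changing the original files.

Here is my code: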

import glob
import re
folder_path = "/home/files/d_d"
file_pattern = "/*sql"
folder_contents = glob.glob(folder_path + file_pattern)


for file in folder_contents:
    print("Checking", file)
for file in folder_contents:
    read_file = open(file, 'rt',encoding='latin-1').read()
    #words=read_file.split()
    with open(read_file,"w") as output:
        output.write(re.sub(r'(TBLPROPERTIES \(.*?\))', r'\1;', f, flags=re.DOTALL))

I get a "File name too long" error that mentions "CREATE EXTERNAL TABLe", and I'm also not sure where in the code to put the output path (/home/files/new_dd).

Any ideas or suggestions?

Answer 1

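With read_file = open(file, 'rt',encoding='latin-1').read(), read_file holds the entire contents of the file, and those contents are then passed to open() as if they were a file name. That is why the "File name too long" error quotes the start of the SQL text. The re.sub call also refers to f, which is never defined; it should be given read_file instead. The code below iterates over the file names matched by the glob.glob pattern, opens each one for reading, processes the data, and opens a file of the same name in the output folder for writing. It assumes the folder newfile_sqls already exists; if it does not, the write fails with FileNotFoundError: [Errno 2] No such file or directory.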

import glob
import os
import re

folder_path = "original_sqls"
# e.g. original_sqls/file1.sql, original_sqls/file2.sql, original_sqls/file3.sql
# "*.sql" matches only names ending in .sql ("*sql" would also match e.g. "notes_sql")
file_pattern = "*.sql"
# new/modified files folder
output_path = "newfile_sqls"

folder_contents = glob.glob(os.path.join(folder_path,file_pattern))

# iterate over file names
for file_ in [os.path.basename(f) for f in folder_contents]:

    # open to read (latin-1, as in the question's code)
    with open(os.path.join(folder_path, file_), "r", encoding="latin-1") as inputf:
        read_file = inputf.read()

    # use variable 'read_file' here
    tmp = re.sub(r'(TBLPROPERTIES \(.*?\))', r'\1;', read_file, flags=re.DOTALL)

    # open to write to the (previously created) new folder, same encoding as the input
    with open(os.path.join(output_path, file_), "w", encoding="latin-1") as output:
        # tmp is a single string, so write() is the appropriate call
        output.write(tmp)
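
If you would rather not depend on the output folder already existing, the same loop can be written with pathlib so that the folder is created on demand. This is only a minimal sketch of the approach above, not a definitive implementation: the function names add_semicolons and process_folder are hypothetical, and the latin-1 default is an assumption carried over from the question's code.

```python
import re
from pathlib import Path

# Same pattern as in the answer: "TBLPROPERTIES (" up to the first ")".
# DOTALL lets the match span several lines.
TBLPROPERTIES = re.compile(r'(TBLPROPERTIES \(.*?\))', flags=re.DOTALL)


def add_semicolons(sql_text):
    """Return sql_text with ';' appended after each TBLPROPERTIES (...) clause."""
    return TBLPROPERTIES.sub(r'\1;', sql_text)


def process_folder(src_dir, dst_dir, encoding="latin-1"):
    """Write a modified copy of every .sql file in src_dir into dst_dir.

    The originals are only read, never written. Hypothetical helper;
    the encoding default is an assumption taken from the question.
    """
    src, dst = Path(src_dir), Path(dst_dir)
    # Create the output folder if needed, which avoids the FileNotFoundError.
    dst.mkdir(parents=True, exist_ok=True)
    written = []
    for sql_file in sorted(src.glob("*.sql")):
        text = sql_file.read_text(encoding=encoding)
        target = dst / sql_file.name  # same file name, different folder
        target.write_text(add_semicolons(text), encoding=encoding)
        written.append(target)
    return written
```

With the paths from the question, the call would be process_folder("/home/files/d_d", "/home/files/new_dd"). Note that the non-greedy pattern stops at the first closing parenthesis, so a TBLPROPERTIES clause that itself contains parentheses would get its semicolon too early.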
