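I just want to create a Python 3 program that runs over all the .sql files in a specific directory and applies my regex, which adds a ";" after a certain pattern, then writes the modified contents to a separate directory under the same file names.
So if I have file1.sql and file2.sql in the "/home/files" directory, running the program should write both files to "/home/new_files" without changing the originals.
Here is my code: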
import glob
import re

folder_path = "/home/files/d_d"
file_pattern = "/*sql"
folder_contents = glob.glob(folder_path + file_pattern)

for file in folder_contents:
    print("Checking", file)

for file in folder_contents:
    read_file = open(file, 'rt',encoding='latin-1').read()
    #words=read_file.split()
    with open(read_file,"w") as output:
        output.write(re.sub(r'(TBLPROPERTIES \(.*?\))', r'\1;', f, flags=re.DOTALL))
I get a "File name too long" error: "CREATE EXTERNAL TABLe", and I'm also not sure where in the code to put the output path (/home/files/new_dd).
Any ideas or suggestions?
Answer 0 (score: 0)
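With read_file = open(file, 'rt',encoding='latin-1').read(), the variable read_file holds the entire contents of the file, and that string is then passed to open() as if it were a file name. That is why the error message quotes the start of your SQL ("CREATE EXTERNAL TABLe") and complains that the file name is too long. The code below iterates over the file names matched by glob.glob, opens each file for reading, processes the data, and opens a file of the same name in the output folder for writing. It assumes the folder newfile_sqls already exists; if it does not, you will get FileNotFoundError: [Errno 2] No such file or directory.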
import glob
import os
import re

folder_path = "original_sqls"
# original_sqls\file1.sql, original_sqls\file2.sql, original_sqls\file3.sql
file_pattern = "*sql"
# folder for the new/modified files
output_path = "newfile_sqls"
folder_contents = glob.glob(os.path.join(folder_path, file_pattern))

# iterate over file names
for file_ in [os.path.basename(f) for f in folder_contents]:
    # open to read
    with open(os.path.join(folder_path, file_), "r") as inputf:
        read_file = inputf.read()
    # use the variable 'read_file' here
    tmp = re.sub(r'(TBLPROPERTIES \(.*?\))', r'\1;', read_file, flags=re.DOTALL)
    # open to write to the (previously created) new folder
    with open(os.path.join(output_path, file_), "w") as output:
        output.write(tmp)
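
If you would rather not create the output folder by hand, the same idea can be sketched with pathlib so that the folder is created on demand. This is only a sketch under a few assumptions: the names add_semicolons and convert_folder are made up for illustration, the latin-1 default is taken from the question's code rather than from anything known about your files, and the pattern is narrowed to "*.sql".

```python
import re
from pathlib import Path

# Same substitution as above: append ';' after each TBLPROPERTIES (...) clause.
PATTERN = re.compile(r'(TBLPROPERTIES \(.*?\))', flags=re.DOTALL)


def add_semicolons(sql_text):
    """Return sql_text with ';' appended after every TBLPROPERTIES (...) clause."""
    return PATTERN.sub(r'\1;', sql_text)


def convert_folder(src_dir, dst_dir, encoding="latin-1"):
    """Write a modified copy of every .sql file in src_dir into dst_dir.

    Hypothetical helper: the originals are only read, never rewritten.
    """
    src, dst = Path(src_dir), Path(dst_dir)
    # Create the output folder if it is missing, so that opening the
    # target file does not fail with FileNotFoundError.
    dst.mkdir(parents=True, exist_ok=True)
    written = []
    for sql_file in sorted(src.glob("*.sql")):
        text = sql_file.read_text(encoding=encoding)
        target = dst / sql_file.name  # same file name, different folder
        target.write_text(add_semicolons(text), encoding=encoding)
        written.append(target)
    return written
```

With the paths from the question, the call would be convert_folder("/home/files", "/home/new_files"). One caveat: the non-greedy .*? stops at the first ")" it finds, so a TBLPROPERTIES clause whose values contain a closing parenthesis would get its ";" in the wrong place.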