I have to clean up a csv file with the following structure:
Schema Compare Sync Script 06/10/2016 11:05:03 Page 1
1 --------------------------------------------------------------------------
2 -- Play this script in ASIA@COG2 to make it look like ASIA@TSTCOG2
3 --
4 -- Please review the script before using it to make sure it won't
5 -- cause any unacceptable data loss.
---
---
14 Set define off;
15
16 ALTER TABLE ASIA_MART.FDM_INVOICE
17 MODIFY(I_STATUS VARCHAR2(32 CHAR));
--
--
Schema Compare Sync Script 06/10/2016 11:05:03 Page 2
76 ACCRUED_GP FLOAT(126) NOT NULL,
77 ACCRUALS_CREATE_BY VARCHAR2(64 CHAR) NOT NULL,
--
--
150 MINEXTENTS 1
Schema Compare Sync Script 06/10/2016 11:05:03 Page 3
151 MAXEXTENTS UNLIMITED
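So my goal is to keep only the SQL code, without any comments or line numbers, in another file.
So far I have been able to remove the first comment section, whose end is marked by the string "Set define off". I also need to catch the repeated page headers such as "Schema Compare Sync Script"; catching and removing those is the part I am stuck on. Right now my code builds a list starting from line 15, but the output also contains a strange repetition of the date.
First of all, I'm pretty sure this is not the best code, so suggestions are very welcome, and if anyone knows how to get rid of the line numbers I would appreciate it even more.
Here is my code: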
import re
from itertools import dropwhile
flag = 'Set define off'
found = False
buff = []
with open("delta.txt", "r") as infile, open('delta_fil2.txt', 'w') as outfile:
    searchlines = infile.readlines()
    for i, line in enumerate(searchlines):
        if flag in line :
            found = True
        if found:
            #iterate over the list after the flag and attach to the list buff
            for l in searchlines[i:i+1]:
                buff.append(searchlines[i+2:len(searchlines)])
        else:
            searchlines.remove(line)
    #generator to append a list of string to the list values = ','.join(str(v) for v in buff)
    for i, line in enumerate(searchlines):
        for line in dropwhile(lambda line: line.startswith(r'\d+'), searchlines):
            buff.append(searchlines[i])
    outfile.write(''.join(str(v) for v in buff))
Answer 0 (score: 0)
The number at the start of each line can be used for filtering:
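with open("delta.txt", "r") as infile, open('delta_fil2.txt', 'w') as outfile:
    for line in infile:
        sline = line.split(" ")
        # skip lines that have nothing after the first token (e.g. a bare line number)
        if len(sline) < 2 : continue
        # keep numbered lines whose content does not start with an SQL comment marker
        if sline[0].isdigit() and not sline[1].startswith("--"):
            # drop the line number and the single space that follows it
            outfile.write(line[len(sline[0])+1:])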
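The same idea can be sketched with a regular expression, which may be easier to extend. This is only a sketch under assumptions drawn from the sample in the question: every line worth keeping starts with a number followed by a single space, while page headers and the `--` separators have no leading number. The names `NUMBERED` and `clean_sql` are made up for illustration.

```python
import re

# Assumed layout: "<line number><space><content>". Page headers and
# separator lines carry no leading number, so they never match.
NUMBERED = re.compile(r"^(\d+)(?: (.*))?$")


def clean_sql(lines):
    """Yield the content of numbered lines that are neither blank nor comments."""
    for raw in lines:
        match = NUMBERED.match(raw.rstrip("\r\n"))
        if match is None:
            continue  # page header, "--" separator, or other unnumbered line
        content = match.group(2) or ""
        if not content.strip() or content.lstrip().startswith("--"):
            continue  # bare line number or SQL comment
        yield content


sample = [
    "Schema Compare Sync Script 06/10/2016 11:05:03 Page 1\n",
    "1 -----\n",
    "2 -- Play this script in ASIA@COG2 to make it look like ASIA@TSTCOG2\n",
    "14 Set define off;\n",
    "15\n",
    "16 ALTER TABLE ASIA_MART.FDM_INVOICE\n",
    "17 MODIFY(I_STATUS VARCHAR2(32 CHAR));\n",
    "--\n",
    "Schema Compare Sync Script 06/10/2016 11:05:03 Page 2\n",
    "76 ACCRUED_GP FLOAT(126) NOT NULL,\n",
]
cleaned = list(clean_sql(sample))
```

To process the real file, pass the open input file to `clean_sql` and write each yielded string followed by a newline to the output file. Note that "Set define off;" is kept here because it is a numbered, non-comment line; if it should go too, it needs its own check.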