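I've put together a Python script that strips RCS keywords from thousands of SQL files. Basically, it uses pyparsing's transformString to transform and strip the known RCS tags. This works, but I have no way of knowing whether transformString actually ran a parse action. As a result, my script blindly rewrites every SQL code file, even when the scanned file contains no RCS keywords.
Here is my sample code for stripping the RCS keywords. Before deciding to write the current file, I need to know whether the operation found tokens to replace and actually replaced them. If transformString made no replacements, I want to skip writing the file.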
from pyparsing import *
# simulate some SQL code
original_code = """
CREATE OR REPLACE FUNCTION oracle_function_name
(
p_company_code IN varchar2
)
--
RETURN number
IS
-- $Workfile: oracle_function_name.sql $
-- $Author: az $
-- $Date: 2018/11/20 $
-- $Revision: #1 $
l_rate := 0;
end if;
Close cur_rate;
--
return l_rate;
end;
/
"""
# Grammar definitions
Workfile_Grammar = ZeroOrMore('/*') + ZeroOrMore('*') + ZeroOrMore('--')+ CaselessKeyword('$Workfile:') + Word( alphas+"_"+alphas+".", alphanums+"_"+alphas+".") + CaselessKeyword('$') + LineStart()
Workfile_Grammar.setParseAction( replaceWith("") )
author_Grammar = ZeroOrMore('/*') + ZeroOrMore('*') + ZeroOrMore('--')+ CaselessKeyword('$Author:') + Word( alphas+"_"+alphas+".", alphanums+"_"+alphas+".") + CaselessKeyword('$') + LineStart()
author_Grammar.setParseAction(replaceWith(""))
date_Grammar = ZeroOrMore('/*') + ZeroOrMore('*') + ZeroOrMore('--')+ CaselessKeyword('$Date:') + Word( alphanums+"/"+alphanums+"/") + CaselessKeyword('$') + LineStart()
date_Grammar.setParseAction(replaceWith(""))
revision_Grammar = ZeroOrMore('/*') + ZeroOrMore('*') + ZeroOrMore('--')+ CaselessKeyword('$Revision:') + Word( '#'+alphanums) + CaselessKeyword('$') + LineStart()
revision_Grammar.setParseAction(replaceWith(""))
change_Grammar = ZeroOrMore('/*') + ZeroOrMore('*') + ZeroOrMore('--')+ CaselessKeyword('$Change:') + Word(alphanums) + CaselessKeyword('$') + LineStart()
change_Grammar.setParseAction(replaceWith(""))
dateTime_Grammar = ZeroOrMore('/*') + ZeroOrMore('*') + ZeroOrMore('--')+ CaselessKeyword('$Date:') + Word( alphanums+"/"+alphanums+"/") + Word(alphanums+":"+alphanums+":"+alphanums) + CaselessKeyword('$') + LineStart()
dateTime_Grammar.setParseAction(replaceWith(""))
header_Grammar = ZeroOrMore('/*') + ZeroOrMore('*') + ZeroOrMore('--')+ CaselessKeyword('$Header:') + Word( "//"+alphanums+"/"+alphas+"_"+alphas+".", alphanums+"_"+alphas+".") + CaselessKeyword('$') + LineStart()
header_Grammar.setParseAction( replaceWith("") )
postStripFile = author_Grammar.transformString(header_Grammar.transformString(dateTime_Grammar.transformString(change_Grammar.transformString(revision_Grammar.transformString(date_Grammar.transformString(Workfile_Grammar.transformString(original_code)))))))
# Is there a way to check whether the transformString calls found and removed any RCS keywords?
print(postStripFile)
# this is where we write postStripFile back to the original file name
# so that the files with RCS tags are stripped in place and the ones without are left in place without changes.
Answer:
The simplest way is to compare the string before and after calling transformString, and write out the file only if the two differ.
# combine all transformers into a single parser, so the transform can be done
# in one pass
parser = (Workfile_Grammar
          | date_Grammar
          | revision_Grammar
          | change_Grammar
          | dateTime_Grammar
          | header_Grammar
          | author_Grammar
          )
new_sql = parser.transformString(original_code)
if new_sql != original_code:
    # do whatever when detecting original has been transformed
    ...
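The following sketch shows the comparison approach as a self-contained example. It swaps the question's seven grammars for a single hypothetical Regex (rcs_keyword) that matches only one-line "-- $Keyword: value $" comments. It also wraps the check in a hypothetical helper, strip_rcs, that reports whether anything was stripped.

```python
from pyparsing import Regex, replaceWith

# Hypothetical, simplified stand-in for the question's grammars:
# matches one "-- $Keyword: value $" comment on a single line.
rcs_keyword = Regex(r"--[ \t]*\$(Workfile|Author|Date|Revision|Change|Header):[^$\n]*\$")
rcs_keyword.setParseAction(replaceWith(""))


def strip_rcs(text):
    """Return (new_text, changed); changed is True if anything was stripped."""
    new_text = rcs_keyword.transformString(text)
    # When nothing matches, transformString hands back an equal string,
    # so a plain comparison tells us whether a rewrite is needed.
    return new_text, new_text != text


with_tags = "IS\n-- $Author: az $\n-- $Revision: #1 $\nl_rate := 0;\n"
without_tags = "IS\nl_rate := 0;\n"

print(strip_rcs(with_tags))
print(strip_rcs(without_tags))
```

No extra bookkeeping is needed here. The cost is one string comparison per file, which is typically cheap compared with the parse itself.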
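A slightly more efficient approach might be to add another parse action to all of the expressions, one that sets a global variable to True: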
changed = False

def changes_made():
    global changed
    changed = True

Workfile_Grammar.setParseAction(changes_made, replaceWith(""))
...  # likewise for the other grammar expressions
changed = False
new_sql = parser.transformString(original_code)
if changed:
    # ... etc. ...
    ...
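setParseAction accepts multiple functions, which are called in order after a successful parse. Since changes_made makes no modifications to the parsed tokens, it is just a pass-through as far as pyparsing is concerned.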
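The sketch below shows the pass-through behaviour in isolation. A side-effect-only action records each match, and then replaceWith("") blanks it out. The simplified Regex grammar and the names record_match and matched are hypothetical stand-ins, not the question's code.

```python
from pyparsing import Regex, replaceWith

# Hypothetical, simplified RCS-keyword grammar (not the question's exact one).
rcs_keyword = Regex(r"--[ \t]*\$(Workfile|Author|Date|Revision|Change|Header):[^$\n]*\$")

matched = []  # records every keyword comment that was found


def record_match(tokens):
    # Side effect only: returning None leaves the tokens untouched,
    # so the next parse action still sees the original match.
    matched.append(tokens[0])


# Both actions run, in order, on every successful match.
rcs_keyword.setParseAction(record_match, replaceWith(""))

result = rcs_keyword.transformString("-- $Author: az $\nRETURN number\n-- $Date: 2018/11/20 $\n")
print(matched)
print(repr(result))
```

A non-empty matched list serves the same purpose as the global flag. It also tells you which keywords were removed, which may be useful for logging.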
If you call transformString more than once in the same run (for example, once per file), be sure to reset changed to False before each call.
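One way to make the reset impossible to forget is to keep the flag (here a counter) on an object whose strip method resets it on every call. This is only a sketch. RcsStripper, its simplified Regex grammar and the count attribute are hypothetical names, not part of pyparsing or the question.

```python
from pyparsing import Regex, replaceWith


class RcsStripper:
    """Hypothetical helper: strips RCS keyword comments and counts them."""

    def __init__(self):
        self.count = 0
        # Simplified stand-in for the question's grammars.
        self.grammar = Regex(r"--[ \t]*\$(Workfile|Author|Date|Revision|Change|Header):[^$\n]*\$")
        # Counter first (pass-through), then the actual replacement.
        self.grammar.setParseAction(self._count_match, replaceWith(""))

    def _count_match(self):
        # Zero-argument parse action: only bumps the counter.
        self.count += 1

    def strip(self, text):
        # Reset before every call so a match in a previous file
        # cannot leak into the result for this one.
        self.count = 0
        new_text = self.grammar.transformString(text)
        return new_text, self.count


stripper = RcsStripper()
for sql in ["-- $Author: az $\nIS\n", "IS\n"]:
    new_sql, n = stripper.strip(sql)
    print(n, "keyword(s) stripped")
```

Keeping the state on an instance instead of in a module-level global also lets several strippers run independently. An example would be one per worker thread, though pyparsing's own thread-safety is not addressed here.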
My personal preference is the simpler first approach.
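The comparison approach can be tied back to the question's goal, rewriting files in place only when something was stripped, with a batch loop like the sketch below. The function names are hypothetical. The transform argument would be something like the combined parser's transformString, and the UTF-8 default encoding is an assumption about the SQL files.

```python
from pathlib import Path


def rewrite_if_changed(path, transform, encoding="utf-8"):
    """Apply transform to one file's text; write it back only if it changed.

    Returns True when the file was rewritten.
    """
    # newline="" disables newline translation, so CRLF files keep their
    # line endings both in the comparison and in the rewritten file.
    with open(path, encoding=encoding, newline="") as f:
        original = f.read()
    new_text = transform(original)
    if new_text == original:
        return False  # nothing was stripped: leave the file untouched
    with open(path, "w", encoding=encoding, newline="") as f:
        f.write(new_text)
    return True


def strip_tree(root, transform, pattern="*.sql"):
    """Run rewrite_if_changed on every file under root matching pattern.

    Returns the list of paths that were actually rewritten.
    """
    return [
        path
        for path in sorted(Path(root).rglob(pattern))
        if rewrite_if_changed(path, transform)
    ]
```

With the answer's combined parser, a call might look like strip_tree("path/to/sql", parser.transformString), where the directory is a placeholder. Files without RCS tags are never opened for writing, so their contents and timestamps stay as they were.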