Question

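I have an IronPython script that executes a bunch of SQL statements against a SQL Server database. The statements are large strings that actually contain multiple statements, separated by the "GO" keyword. They work when run from SQL Server Management Studio and some other tools, but not through ADO. So I split up the strings using the 2.5 "re" module, like so: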

import re

splitter = re.compile(r'\bGO\b', re.IGNORECASE)
for script in splitter.split(scriptBlob):
    if script:
        pass  # [... execute the query ...]

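In rare cases, the word "go" appears in a comment or in a string. How can I work around that? That is, how do I correctly parse this string into two scripts: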

-- this is a great database script!  go team go!
INSERT INTO myTable(stringColumn) VALUES ('go away!')
/*
  here are some comments that go with this script.
*/
GO
INSERT INTO myTable(stringColumn) VALUES ('this is the next script')
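To make the problem concrete, the question's splitter can be run on the sample above. This is a minimal check written in Python 3 syntax (the question uses IronPython 2.5); `script_blob` and `pieces` are illustrative names:

```python
import re

# The sample script from the question: only one "GO" sits on its own line.
script_blob = """-- this is a great database script!  go team go!
INSERT INTO myTable(stringColumn) VALUES ('go away!')
/*
  here are some comments that go with this script.
*/
GO
INSERT INTO myTable(stringColumn) VALUES ('this is the next script')"""

# The question's splitter: any standalone word "go", in any case.
splitter = re.compile(r'\bGO\b', re.IGNORECASE)
pieces = [p for p in splitter.split(script_blob) if p]

# Every "go" in the comments and the string literal also matches, so the
# blob falls apart into fragments instead of the two intended batches.
print(len(pieces))
for piece in pieces:
    print(repr(piece))
```

Five words match ("go team go", "'go away!'", "that go with", and the real `GO`), so the list holds six fragments rather than two.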

Edit

I searched some more and found this SQL documentation: http://msdn.microsoft.com/en-us/library/ms188037(SQL.90).aspx

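It turns out that GO must be on a line of its own, as some answers suggested. However, it can be followed by a "count" integer, which actually executes the statement batch that many times (has anybody actually used that before?), and it can be followed by a single-line comment on the same line (but not a multi-line comment; I tested this). So the magic regex would look something like: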

"(?m)^\s*GO\s*\d*\s*$"

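Except that this does not account for:

a possible trailing single-line comment ("--" followed by any characters other than a newline);
the whole line being inside a larger multi-line comment.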
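Both gaps are easy to reproduce. The sketch below applies the regex from the edit, unchanged, to two made-up inputs (the names `GO_LINE`, `trailing_comment` and `inside_block_comment` are illustrative only):

```python
import re

# The "magic" pattern from the edit above, used exactly as written.
GO_LINE = re.compile(r"(?m)^\s*GO\s*\d*\s*$")

trailing_comment = "SELECT 1\nGO -- end of batch\nSELECT 2"
inside_block_comment = "SELECT 1\n/*\nGO\n*/\nSELECT 2"

# 1. The trailing "--" comment prevents a match, so nothing is split.
print(GO_LINE.split(trailing_comment))

# 2. A GO line inside /* ... */ is still treated as a batch separator,
#    cutting the comment in half.
print(GO_LINE.split(inside_block_comment))
```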

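I don't really care about capturing the "count" argument and using it. Now that I have some technical documentation, I'm very close to writing this "to spec" and never having to worry about it again.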

Answer 1

Is "GO" always on a line by itself? You could just split on "^GO$".
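A minimal sketch of that suggestion, assuming `re.MULTILINE` (written inline as `(?m)`) so that `^` and `$` match at line boundaries; `split_on_go_lines` is a made-up helper name:

```python
import re


def split_on_go_lines(script):
    # Only a line consisting of exactly "GO" (uppercase, no indentation,
    # no count, no trailing comment) acts as a separator.
    return re.split(r"(?m)^GO$", script)


script_blob = "INSERT INTO t VALUES ('go away!')\nGO\nSELECT 1"
print(split_on_go_lines(script_blob))
```

Being this strict has a cost: `GO 5`, an indented `  GO`, or a `GO` line ending in `\r\n` (since `$` matches before `\n`, not before `\r`) will not be split unless the text is normalized first.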

Answer 2

Since you can have comments inside comments, nested comments, comments inside queries, and so on, there is no sane way to do this with regexes.

Just imagine the following script:

INSERT INTO table (name) VALUES (
-- GO NOW GO
'GO to GO /* GO */ GO' +
/* some comment 'go go go'
-- */ 'GO GO' /*
GO */
)

Not to mention:

INSERT INTO table (go) values ('xxx') GO

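The only way is to build a stateful parser: one that reads one char at a time, sets a flag when it is inside a comment, a quote-delimited string, etc., and resets the flag when that construct ends, so the code can ignore "GO" instances while inside one.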
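A minimal sketch of such a stateful splitter follows. It assumes these T-SQL lexical rules: `--` starts a comment that runs to the end of the line; `/* */` comments can nest; `'...'` strings, `"..."` identifiers and `[...]` identifiers escape their closing character by doubling it; and GO counts only on a line of its own (optionally followed by a count and a `--` comment) that begins outside any string or comment. `split_batches` is a hypothetical name, and the line-by-line loop is only a convenience; the flags are still updated one character at a time.

```python
import re

# Assumed GO-line syntax: optional indentation, GO (any case), an optional
# repeat count, and an optional trailing "--" comment.
GO_LINE = re.compile(r"^\s*GO(?:\s+\d+)?\s*(?:--.*)?$", re.IGNORECASE)


def split_batches(script):
    """Split a T-SQL script on GO lines that lie outside strings/comments."""
    batches, current = [], []
    closing_quote = None   # "'", '"' or "]" while inside a string/identifier
    comment_depth = 0      # /* */ comments nest, so count the levels
    for line in script.splitlines(True):  # True keeps the line endings
        # A GO line only counts when it starts outside any string or comment.
        if closing_quote is None and comment_depth == 0 and GO_LINE.match(line):
            batches.append("".join(current))
            current = []
            continue
        i = 0
        while i < len(line):
            ch, nxt = line[i], line[i + 1:i + 2]
            if comment_depth:
                if ch == "/" and nxt == "*":
                    comment_depth += 1
                    i += 1  # skip the second character of the marker
                elif ch == "*" and nxt == "/":
                    comment_depth -= 1
                    i += 1
            elif closing_quote:
                if ch == closing_quote:
                    if nxt == closing_quote:  # doubled character is an escape
                        i += 1
                    else:
                        closing_quote = None
            elif ch == "-" and nxt == "-":
                break  # the rest of this line is a comment
            elif ch == "/" and nxt == "*":
                comment_depth = 1
                i += 1
            elif ch in "'\"":
                closing_quote = ch
            elif ch == "[":
                closing_quote = "]"
            i += 1
        current.append(line)
    batches.append("".join(current))
    return batches
```

On the tricky script above, this sketch returns a single batch, because every `GO` there is inside a comment, inside a string, or not on a line of its own.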

Answer 3

If GO is always on a line by itself, you can use split like this:

#!/usr/bin/python

import re

sql = """-- this is a great database script!  go team go!
INSERT INTO myTable(stringColumn) VALUES ('go away!')
/*
  here are some comments that go with this script.
*/
GO 5 --this is a test
INSERT INTO myTable(stringColumn) VALUES ('this is the next script')"""

# Raw string so that "\s" reaches the regex engine unchanged.
statements = re.split(r"(?m)^\s*GO\s*(?:[0-9]+)?\s*(?:--.*)?$", sql)

for statement in statements:
    print("the statement is\n%s\n" % statement)

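(?m) turns on multiline matching, i.e. ^ and $ will match at the start and end of a line (instead of the start and end of the string).
^ matches at the start of a line
\s* matches zero or more whitespace characters (space, tab, etc.)
GO matches a literal GO
\s* matches as before
(?:[0-9]+)? matches an optional integer (possibly with leading zeros)
\s* matches as before
(?:--.*)? matches an optional end-of-line comment
$ matches at the end of a line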
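The same pattern can be spelled out with `re.VERBOSE`, so that each piece carries its explanation from the list above. In verbose mode, whitespace and `#` comments inside the pattern are ignored; this sketch (the name `GO_LINE` is made up) should split exactly like the one-line version:

```python
import re

GO_LINE = re.compile(r"""
    ^            # start of a line (because of re.MULTILINE)
    \s*          # zero or more whitespace characters
    GO           # the literal GO
    \s*
    (?:[0-9]+)?  # optional repeat count
    \s*
    (?:--.*)?    # optional end-of-line comment
    $            # end of a line
    """, re.MULTILINE | re.VERBOSE)

sql = "SELECT 1\nGO 5 --this is a test\nSELECT 2"
print(GO_LINE.split(sql))
```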

The split consumes the GO line, so you don't have to worry about it. This leaves you with a list of statements.

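There is one problem with this modified split: it won't give you back the number after GO. If that is important, I would say it's time to move to some form of parser.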
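If the count does matter, one workaround (not part of the answer) is to turn the count into a capturing group: `re.split()` returns the text of each captured group between the pieces, or `None` when the group did not take part in the match. A sketch, with `split_with_counts` as a hypothetical helper:

```python
import re

# Same pattern as above, except that the count is now captured.
GO_SPLIT = re.compile(r"(?m)^\s*GO\s*([0-9]+)?\s*(?:--.*)?$")


def split_with_counts(sql):
    """Return (statement, repeat_count) pairs; a bare GO means count 1."""
    parts = GO_SPLIT.split(sql)  # [stmt, count, stmt, count, ..., stmt]
    statements = parts[0::2]
    counts = [int(c) if c else 1 for c in parts[1::2]]
    counts.append(1)  # the text after the last GO has no count of its own
    return list(zip(statements, counts))


for statement, count in split_with_counts("SELECT 1\nGO 3\nSELECT 2"):
    print(count, repr(statement))
```

Each count belongs to the statement before its GO, which matches the documented meaning of `GO count` (the preceding batch runs that many times). This only recovers the count; it still has the comment and string problems described elsewhere in this thread.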

Answer 4

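~~This won't detect whether GO was ever used as a variable name inside a statement, but it should take care of occurrences inside comments or strings.~~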

EDIT: This now works when GO is part of a statement, as long as it is not on its own line.

import re

line_comment = r'(?:--|#).*$'
block_comment = r'/\*[\S\s]*?\*/'
single_quote_string = r"'(?:\\.|[^'\\])*'"
double_quote_string = r'"(?:\\.|[^"\\])*"'
go_word = r'^[^\S\n]*(?P<GO>GO)[^\S\n]*\d*[^\S\n]*(?:(?:--|#).*)?$'

# Comments and strings are tried first, so a "GO" inside them is consumed
# by those alternatives and never reaches go_word.
full_pattern = re.compile(r'|'.join((
    line_comment,
    block_comment,
    single_quote_string,
    double_quote_string,
    go_word,
)), re.IGNORECASE | re.MULTILINE)

def split_sql_statements(statement_string):
    last_end = 0
    for match in full_pattern.finditer(statement_string):
        if match.group('GO'):
            yield statement_string[last_end:match.start()]
            last_end = match.end()
    yield statement_string[last_end:]

Example usage:

statement_string = r"""
-- this is a great database script!  go team go!
INSERT INTO go(go) VALUES ('go away!')
go 7 -- foo
INSERT INTO go(go) VALUES (
    'I have to GO " with a /* comment to GO inside a /* GO string /*'
)
/*
  here are some comments that go with this script.
  */
  GO
  INSERT INTO go(go) VALUES ('this is the next script')
"""

for statement in split_sql_statements(statement_string):
    print('=======')
    print(statement)

Output:

=======

-- this is a great database script!  go team go!
INSERT INTO go(go) VALUES ('go away!')

=======

INSERT INTO go(go) VALUES (
    'I have to GO " with a /* comment to GO inside a /* GO string /*'
)
/*
  here are some comments that go with this script.
  */

=======

  INSERT INTO go(go) VALUES ('this is the next script')
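Two of Answer 4's patterns follow MySQL-style conventions: `#` starts a line comment, and a backslash escapes a quote. Neither holds in T-SQL. There, `#` begins temporary-table names, so in `INSERT INTO #tmp VALUES ('a` the "comment" would swallow the opening quote of a multi-line string, and a quote inside a string is escaped by doubling it. The sketch below adapts the patterns under those T-SQL assumptions, and also skips `[bracketed]` identifiers. `split_tsql_statements` and the pattern names are made up:

```python
import re

# Assumed T-SQL rules: only "--" starts a line comment, and quotes and
# brackets are escaped by doubling them (no backslash escapes).
line_comment = r'--.*$'
block_comment = r'/\*[\S\s]*?\*/'
single_quote_string = r"'(?:''|[^'])*'"
double_quote_string = r'"(?:""|[^"])*"'
bracket_identifier = r'\[(?:\]\]|[^\]])*\]'
go_word = r'^[^\S\n]*(?P<GO>GO)[^\S\n]*\d*[^\S\n]*(?:--.*)?$'

tsql_pattern = re.compile(r'|'.join((
    line_comment,
    block_comment,
    single_quote_string,
    double_quote_string,
    bracket_identifier,
    go_word,
)), re.IGNORECASE | re.MULTILINE)


def split_tsql_statements(statement_string):
    # Same driver loop as Answer 4; only the GO alternative causes a split.
    last_end = 0
    for match in tsql_pattern.finditer(statement_string):
        if match.group('GO'):
            yield statement_string[last_end:match.start()]
            last_end = match.end()
    yield statement_string[last_end:]


script = "INSERT INTO #tmp VALUES ('a\nGO\nb')\nGO\nSELECT 1"
for statement in split_tsql_statements(script):
    print(repr(statement))
```

Like the original, this handles `/* */` comments non-greedily, so nested block comments are still out of reach. Handling them needs a stateful parser like the one Answer 2 describes.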

Regex for parsing SQL statements