Question

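I am trying to parse a bunch of SQL scripts to find all the tables they read from and write to.

So far I load each file, split it into queries, and have managed to parse a few things.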

createproc = re.compile(r"""\s*create procedure (?P<procedurename>[a-zA-Z_0-9.]+)\s*""")
droptable = re.compile(r"""\s*drop table (if exists)* (?P<tablename>[a-zA-Z_0-9.]+)\s*""")
createtable = re.compile(r"""\s*create table (if not exists)* (?P<tablename>[a-zA-Z_0-9.]+)\s*""")
createindex = re.compile(r"""\s*create index [a-zA-Z_0-9. ]*on (?P<tablename>[a-zA-Z_0-9.]+)\s*""")
altertable = re.compile(r"""\s*alter table (?P<tablename>[a-zA-Z_0-9.]+)\s*""")
inserttable = re.compile(r"""\s*insert into (?P<tablename>[a-zA-Z_0-9.]+)\s*""")
updatetable = re.compile(r"""\s*update (?P<tablename>.*?)\s* set""")
deletetable = re.compile(r"""delete.*from[\r\n\s]*(?P<tablename>.*?)[\r\n\s]+""")

I am sure none of these regexps is ideal, but the last one in particular is giving me a headache. I have a test string:

teststring = 'delete from my_db.my_table \r\n where\r\n(my_column >= 5/2 or my_column is null);'

and try to parse it:

match =deletetable.search(teststring,re.MULTILINE|re.DOTALL)
if match:
    print(match.group("tablename"))

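I get nothing back. I have tried several things, but nothing has helped so far.

Unfortunately, the SQL scripts are very inconsistent in their line breaks, whitespace and indentation, so I have to account for every possibility.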

Answer 1

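Don't pass the flags to search; you need to supply them in the compile call. The search method of a compiled regex object expects a start position as its second argument, not flags, so the flag value is silently treated as an integer offset into the string.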

import re

# Flags go to re.compile(), not to search().
deletetable = re.compile(r'delete(?:\s+\w+)*?\s+from\s+(?P<tablename>[\w.]+)',
                         re.MULTILINE | re.DOTALL)
teststring = 'delete from my_db.my_table \r\n where\r\n(my_column >= 5/2 or my_column is null);'
match = deletetable.search(teststring)
if match:
    print(match.group("tablename"))
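
To make the failure mode concrete, here is a minimal sketch using the question's original pattern and test string. It assumes the standard CPython flag values (re.MULTILINE is 8 and re.DOTALL is 16); the variable names flags_as_pos, skipped and found are only illustrative.

```python
import re

# Original pattern from the question, compiled without any flags.
deletetable = re.compile(r"""delete.*from[\r\n\s]*(?P<tablename>.*?)[\r\n\s]+""")
teststring = 'delete from my_db.my_table \r\n where\r\n(my_column >= 5/2 or my_column is null);'

# The combined flags are just an integer: 8 | 16 == 24.
flags_as_pos = int(re.MULTILINE | re.DOTALL)

# search(string, 24) starts scanning at index 24, past the word "delete",
# so nothing can match.
skipped = deletetable.search(teststring, flags_as_pos)

# Without the second argument the scan starts at index 0 and succeeds.
found = deletetable.search(teststring)

print(flags_as_pos)                # 24
print(skipped)                     # None
print(found.group("tablename"))    # my_db.my_table
```

Index 24 falls inside "my_table", which is why the original call returned no match instead of raising an error.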

Answer 2

I think the problem is the use of re.MULTILINE and re.DOTALL. I tried the test without them and it works.
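
As a quick check, the sketch below compiles the question's original pattern twice, once without flags and once with both flags passed to re.compile. For this particular test string both variants find the table, which suggests that what matters is where the flags are passed rather than the flags themselves. The names plain and flagged are only illustrative.

```python
import re

pattern = r"""delete.*from[\r\n\s]*(?P<tablename>.*?)[\r\n\s]+"""
teststring = 'delete from my_db.my_table \r\n where\r\n(my_column >= 5/2 or my_column is null);'

# No flags at all, as tried in this answer.
plain = re.compile(pattern).search(teststring)

# Both flags, but handed to compile() instead of search().
flagged = re.compile(pattern, re.MULTILINE | re.DOTALL).search(teststring)

print(plain.group("tablename"))    # my_db.my_table
print(flagged.group("tablename"))  # my_db.my_table
```

With re.DOTALL the greedy ".*" can run across line breaks, so on input containing a later "from" (for example in a subquery) the flagged variant may capture a different name.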

Answer 3

This works:

import re

deletetable2 = re.compile(r"""delete\s*from\s*(?P<tablename>\S*?)\s+.*""")
teststring = 'delete from my_db.my_table \r\n where\r\n(my_column >= 5/2 or my_column is null);'
match = deletetable2.search(teststring)
print(match.groups())  # ('my_db.my_table',)
print(match.group("tablename"))  # my_db.my_table

I think you need to simplify your expression and end it with a dot (.*) rather than specifying flags.
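
Along the same lines, here is a rough sketch of how simplified patterns could cope with the inconsistent whitespace mentioned in the question. It is not from the question or the other answers: WRITE_PATTERNS and written_tables are hypothetical names, and it assumes unquoted table names and no SQL comments or string literals containing keywords.

```python
import re

# Hypothetical patterns: \s+ matches spaces, tabs, \r and \n alike,
# and re.IGNORECASE accepts DELETE, Delete, delete, and so on.
WRITE_PATTERNS = {
    "insert": re.compile(r"insert\s+into\s+(?P<tablename>[\w.]+)", re.IGNORECASE),
    "update": re.compile(r"update\s+(?P<tablename>[\w.]+)\s+set\b", re.IGNORECASE),
    "delete": re.compile(r"delete\s+from\s+(?P<tablename>[\w.]+)", re.IGNORECASE),
}

def written_tables(query):
    """Return (operation, table) pairs found in one SQL statement."""
    return [(op, m.group("tablename"))
            for op, pattern in WRITE_PATTERNS.items()
            for m in pattern.finditer(query)]

print(written_tables("DELETE\r\n\tFROM my_db.my_table\r\n where my_column is null;"))
# [('delete', 'my_db.my_table')]
```

Regexes like these remain a heuristic; scripts with backtick-quoted names or nested statements would need more careful handling.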

Parsing MySQL code with regex in Python
