Suppose I have a SQL script:
select *
from (
select col1 from test.test_a join test.test_a1 on a.col1 = a1.col1) a
left join test.test_b b
on a.col1 = b.col2
left join
test.test_c c
on b.col2 = c.col3
left join
(select
col4
from
test.test_d) d
on c.col3 = d.col4
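I was reading this question and am trying to use Python to extract all the TABLE NAMES that follow "from" or "join" in the script above.
The difficulty is that I process the script line by line, but a table name and its keyword may NOT be on the same line.
So how can I extract such table names from the script? Any suggestions are appreciated.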
Answer 0 (score: 2)
If you want to use core Python:
txt = """
select *
from (
select col1 from test.test_a join test.test_a1 on a.col1 = a1.col1) a
left join test.test_b b
on a.col1 = b.col2
left join
test.test_c c
on b.col2 = c.col3
left join
(select
col4
from
test.test_d) d
on c.col3 = d.col4"""
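# Replace characters that can sit next to a table name with spaces
replace_list = ['\n', '(', ')', '*', '=']
for i in replace_list:
    txt = txt.replace(i, ' ')
txt = txt.split()
res = []
# Collect every token that directly follows "from" or "join", skipping subqueries
for i in range(1, len(txt)):
    if txt[i-1] in ['from', 'join'] and txt[i] != 'select':
        res.append(txt[i])
print(res)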
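Note that the loop above only matches lowercase `from` and `join`. Here is a minimal sketch of a case-insensitive variant, wrapped in a function so the result can be reused. The name `extract_tables` is hypothetical (it is not part of the original answer), and the sketch still assumes table names are separated from keywords by whitespace and that comma-separated table lists are not used.

```python
def extract_tables(sql):
    """Return the tokens that directly follow FROM or JOIN, ignoring keyword case."""
    # Replace characters that can sit next to a table name with spaces
    for ch in ['\n', '(', ')', '*', '=']:
        sql = sql.replace(ch, ' ')
    tokens = sql.split()
    tables = []
    # Walk over adjacent token pairs: (previous token, current token)
    for prev, cur in zip(tokens, tokens[1:]):
        if prev.lower() in ('from', 'join') and cur.lower() != 'select':
            tables.append(cur)
    return tables


print(extract_tables("SELECT * FROM test.test_a a LEFT JOIN\n test.test_b b ON a.col1 = b.col2"))
```

Returning the list instead of printing it makes it easier to post-process the names, for example to drop duplicates.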
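Answer 1 (score: 1)
Just collapse every run of whitespace (including line breaks) into a single space so that you have a single line, and then you can use a regular expression to look for your table names.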
import re
sql = """select *
from (
select col1 from test.test_a join test.test_a1 on a.col1 = a1.col1) a
left join test.test_b b
on a.col1 = b.col2
left join
test.test_c c
on b.col2 = c.col3
left join
(select
col4
from
test.test_d) d
on c.col3 = d.col4"""
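# Collapse every run of whitespace (including newlines) into one space
sql_line = re.sub(r'\s+', ' ', sql)
# Capture the (optionally schema-qualified) name after "from" or "join"
tbl_re = re.compile(r'\b(?:from|join)\s+([\w.]+)', re.IGNORECASE)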
tablenames = tbl_re.findall(sql_line)
print(tablenames)
Note that the table-name regex is simplified and only meant as an example (you would have to account for possible quoting, etc.).
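To illustrate the quoting caveat, here is a rough sketch of a pattern that also accepts double-quoted, backticked, or bracketed identifiers. The names `IDENT`, `TABLE_RE`, and `find_tables` are hypothetical, and the sketch assumes that the words "from" and "join" do not appear inside string literals or quoted names.

```python
import re

# One identifier part: a bare word, "double-quoted", `backticked`, or [bracketed]
IDENT = r'(?:\w+|"[^"]+"|`[^`]+`|\[[^\]]+\])'
# A table reference: identifier parts joined by dots, right after FROM or JOIN
TABLE_RE = re.compile(
    r'\b(?:from|join)\s+(' + IDENT + r'(?:\.' + IDENT + r')*)',
    re.IGNORECASE,
)


def find_tables(sql):
    # Collapse whitespace first so keyword and table name end up on one line
    sql_line = re.sub(r'\s+', ' ', sql)
    return TABLE_RE.findall(sql_line)


print(find_tables('select * from "my schema"."my table" t join `db`.`tbl` x on t.id = x.id'))
```

A subquery such as `from (select ...)` is skipped because `(` does not start an identifier in this pattern.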
Answer 2 (score: 0)
Here is a quick improvement on @r.user.05apr's answer, combining some bits from https://stackoverflow.com/a/46177004/82961
import re
txt = """
select *
from (
select col1 from test.test_a join test.test_a1 on a.col1 = a1.col1) a
left join test.test_b b
on a.col1 = b.col2
left join
test.test_c c -- from xxx
on b.col2 = c.col3 /* join xxxxx */
left join
(select
col4
from
test.test_d) d
on c.col3 = d.col4"""
def get_tables(sql_str):
    # remove the /* */ comments
    sql_str = re.sub(r"/\*[^*]*\*+(?:[^*/][^*]*\*+)*/", "", sql_str)
    # remove whole line -- and # comments
    lines = [line for line in sql_str.splitlines() if not re.match(r"^\s*(--|#)", line)]
    # remove trailing -- and # comments
    sql_str = " ".join([re.split("--|#", line)[0] for line in lines])
    # replace characters that can sit next to a table name with spaces
    replace_list = ['\n', '(', ')', '*', '=']
    for i in replace_list:
        sql_str = sql_str.replace(i, ' ')
    sql_str = sql_str.split()
    res = []
    # collect every token that directly follows "from" or "join"
    for i in range(1, len(sql_str)):
        if sql_str[i-1] in ['from', 'join'] and sql_str[i] != 'select':
            res.append(sql_str[i])
    print(res)

get_tables(txt)
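To see what the comment stripping does on its own, here is a small sketch that isolates that step. The helper name `strip_sql_comments` is hypothetical, and the sketch assumes that `--` and `#` never appear inside string literals (if they do, the rest of that line would be cut off as well).

```python
import re


def strip_sql_comments(sql_str):
    # Drop /* ... */ block comments (same pattern as in get_tables)
    sql_str = re.sub(r"/\*[^*]*\*+(?:[^*/][^*]*\*+)*/", "", sql_str)
    # Drop everything from -- or # to the end of each line
    return "\n".join(
        re.split(r"--|#", line)[0].rstrip() for line in sql_str.splitlines()
    )


print(strip_sql_comments("test.test_c c -- from xxx\non b.col2 = c.col3 /* join xxxxx */"))
```

Without this step, the words `from` and `join` inside the two comments would be picked up as keywords and `xxx` and `xxxxx` would be reported as table names.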