How do I extract table names from a SQL script?

Time: 2018-04-11 10:46:21

Tags: python parsing

Suppose there is a SQL script:

select *
from (
  select col1 from test.test_a join test.test_a1 on a.col1 = a1.col1) a
left join test.test_b b 
on a.col1 = b.col2
left join
    test.test_c c
on b.col2  = c.col3
left join
   (select 
       col4 
    from
       test.test_d) d
on c.col3  = d.col4

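I was reading this question and am trying to use Python to extract all the table names that follow "from" or "join" in the script above. The difficulty is that I process the script line by line, but a table name and its keyword may NOT be on the same line.

So how can I extract such table names from the script? Any suggestion is appreciated.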

3 Answers:

Answer 0: (score: 2)

If you want to use core Python:

txt = """
select *
from (
  select col1 from test.test_a join test.test_a1 on a.col1 = a1.col1) a
left join test.test_b b 
on a.col1 = b.col2
left join
    test.test_c c
on b.col2  = c.col3
left join
   (select 
       col4 
    from
       test.test_d) d
on c.col3  = d.col4"""

replace_list = ['\n', '(', ')', '*', '=']
for i in replace_list:
    txt = txt.replace(i, ' ')
txt = txt.split()
res = []
for i in range(1, len(txt)):
    if txt[i-1] in ['from', 'join'] and txt[i] != 'select': 
        res.append(txt[i])
print(res)
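
The loop above compares keywords case-sensitively, so `FROM` or `Join` would be missed. As a minimal sketch, assuming you want the same token walk wrapped in a reusable function, something like the following could work. The name `extract_tables` is hypothetical and not part of the original answer.

```python
def extract_tables(sql):
    """Return the tokens that directly follow FROM or JOIN, skipping subqueries."""
    # Turn punctuation into spaces; split() below already handles newlines.
    for ch in ['(', ')', '*', '=']:
        sql = sql.replace(ch, ' ')
    tokens = sql.split()
    tables = []
    # Walk over adjacent token pairs (previous token, current token).
    for prev, cur in zip(tokens, tokens[1:]):
        # Compare in lower case so FROM / Join / join are all recognized.
        if prev.lower() in ('from', 'join') and cur.lower() != 'select':
            tables.append(cur)
    return tables


print(extract_tables(
    "SELECT * FROM test.test_a a "
    "LEFT JOIN (select x from test.test_b) b ON a.x = b.x"
))
```

Like the original loop, this is only a heuristic: it does not understand comments, string literals, or comma-separated table lists.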

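Answer 1: (score: 1)

Just collapse every run of whitespace (including line breaks) into a single space; then you have a single line and can find the table names with a regular expression.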

import re
sql = """select *
from (
  select col1 from test.test_a join test.test_a1 on a.col1 = a1.col1) a
left join test.test_b b 
on a.col1 = b.col2
left join
    test.test_c c
on b.col2  = c.col3
left join
   (select 
       col4 
    from
       test.test_d) d
on c.col3  = d.col4"""
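# Collapse every whitespace run (including newlines) into one space.
sql_line = re.sub(r'\s+', ' ', sql)
# Capture the (optionally schema-qualified) name right after FROM or JOIN.
tbl_re = re.compile(r'\b(?:from|join)\s+([\w.]+)', re.IGNORECASE)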
tablenames = tbl_re.findall(sql_line)
print(tablenames)

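Note that the table-name regular expression is simplified and serves only as an example (you would have to account for possible quoting, etc.).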
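
To illustrate the quoting caveat, here is a rough sketch that also accepts backticked, double-quoted, or bracketed identifier parts. The names `PART`, `TABLE_RE`, and `find_tables` are hypothetical, and which quoting styles your SQL dialect actually uses is an assumption you would need to check.

```python
import re

# One identifier part: `backticked`, "double-quoted", [bracketed], or a bare word.
PART = r'(?:`[^`]+`|"[^"]+"|\[[^\]]+\]|\w+)'

# FROM or JOIN, whitespace, then one or more dot-separated identifier parts.
TABLE_RE = re.compile(
    r'\b(?:from|join)\s+(' + PART + r'(?:\.' + PART + r')*)',
    re.IGNORECASE,
)


def find_tables(sql):
    # Collapse whitespace runs so keyword and name sit on one line.
    sql_line = re.sub(r'\s+', ' ', sql)
    return TABLE_RE.findall(sql_line)


print(find_tables(
    'select * from "test"."test_a" a '
    'JOIN `test`.`test_b` b on a.x = b.x '
    'join (select 1) c'
))
```

The quotes are kept in the returned names; strip them afterwards if you need the bare identifiers. A subquery such as `join (select 1) c` is skipped because `(` is not a valid start of an identifier part.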

Answer 2: (score: 0)

This is a quick improvement on @r.user.05apr's answer, combined with some bits from https://stackoverflow.com/a/46177004/82961:

import re

txt = """
select *
from (
  select col1 from  test.test_a join test.test_a1 on a.col1 = a1.col1) a
left join test.test_b b 
on a.col1 = b.col2
left join
    test.test_c c -- from xxx
on b.col2  = c.col3 /* join xxxxx */
left join
   (select 
       col4 
    from
       test.test_d) d
on c.col3  = d.col4"""

def get_tables(sql_str):
    # remove the /* */ comments
    sql_str = re.sub(r"/\*[^*]*\*+(?:[^*/][^*]*\*+)*/", "", sql_str)

    # remove whole line -- and # comments
    lines = [line for line in sql_str.splitlines() if not re.match(r"^\s*(--|#)", line)]

    # remove trailing -- and # comments
    sql_str = " ".join([re.split("--|#", line)[0] for line in lines])

    replace_list = ['\n', '(', ')', '*', '=']
    for i in replace_list:
        sql_str = sql_str.replace(i, ' ')
    sql_str = sql_str.split()
    res = []
    for i in range(1, len(sql_str)):
        if sql_str[i-1] in ['from', 'join'] and sql_str[i] != 'select': 
            res.append(sql_str[i])
    print(res)
    
get_tables(txt)
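
If the same table appears several times, the list above contains it several times. As a hedged sketch, assuming you want each name once and returned rather than printed, the comment stripping can be combined with a regex search and an order-preserving de-duplication. `get_unique_tables` is a hypothetical name, not part of the answers above.

```python
import re


def get_unique_tables(sql_str):
    # Strip /* ... */ block comments (non-greedy, may span several lines).
    sql_str = re.sub(r'/\*.*?\*/', ' ', sql_str, flags=re.DOTALL)
    # Strip -- and # comments up to the end of each line.
    sql_str = re.sub(r'(--|#).*', ' ', sql_str)
    # \s+ also matches newlines, so keyword and name may sit on different lines.
    names = re.findall(r'\b(?:from|join)\s+([\w.]+)', sql_str, flags=re.IGNORECASE)
    # dict.fromkeys keeps first-seen order while dropping duplicates.
    return list(dict.fromkeys(names))


sample = """select * from test.t1 a -- from xxx
join test.t2 b /* join yyy */ on a.id = b.id
join
    test.t1 c on c.id = a.id"""
print(get_unique_tables(sample))
```

Note that this sketch would also cut off a string literal containing `--` or `#`, so it is only suitable for scripts where that does not occur.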