Question

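I need to work with Teradata stored procedures and retrieve the various objects (tables, views, procedures) used in them. To do that, I need to write a parser that handles different SQL statements such as SELECT, MERGE, UPDATE and so on. Using third-party libraries is discouraged.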

I have never implemented a parser before, so I would like some advice on how best to implement a SQL parser.

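I have been going through some existing Stack Overflow links, and most of them recommend pyparsing or python-sqlparse.
Can I write the parser using regular expressions alone? I am not confident about that, because a query can be written in many different ways. Can regular expressions handle all of those cases?
If pyparsing is the preferred solution, how complex is it to understand and define the SQL grammar in it?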

I took a reference from a site and wrote the following code to parse SELECT queries:

import re

def tables_in_sel_query(sql_str):

    # Surround every comma with spaces so it becomes its own token.
    # (Flags must not be passed positionally: the 4th argument of re.sub is count.)
    sql_str = re.sub(r"\s*,", " , ", sql_str)

    # Remove whitespace after a dot, so "db. tbl" becomes "db.tbl"
    sql_str = re.sub(r"\.\s+", ".", sql_str)

    # Remove the /* */ comments
    q = re.sub(r"/\*[^*]*\*+(?:[^*/][^*]*\*+)*/", "", sql_str)

    # Split on whitespace, parentheses and semicolons
    tokens = re.split(r"[\s)(;]+", q)

    # Scan the tokens. After a FROM or JOIN, set the get_next flag and keep
    # collecting names until another keyword is seen.
    keywords = {"", "select", "order", "group", "where",
                "inner", "left", "right", "on"}

    result = []
    get_next = False
    for tok in tokens:
        if get_next:
            if tok.lower() not in keywords:
                # Only database-qualified names (db.table) are kept
                if tok != "," and "." in tok:
                    result.append(tok.strip(","))
            else:
                get_next = False
            continue
        get_next = tok.lower() in ("from", "join")

    return result

Although this works fine, I would need to implement a function like this for every query type. Can someone suggest the best approach?
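
One possible way to avoid a separate function per statement type is to tokenize the text once and then decide from the preceding keyword whether a name is an object reference. The following is only a rough sketch using the standard `re` module, not a real Teradata parser. The names `TOKEN_RE`, `TRIGGERS`, `RESERVED`, `tokenize`, `merge_dotted` and `referenced_objects` are made up for this illustration, and both keyword sets are assumptions that are certainly incomplete.

```python
import re

# One alternation that classifies every lexical element of interest.
TOKEN_RE = re.compile(
    r"(?P<comment>/\*.*?\*/|--[^\n]*)"               # block and line comments
    r"|(?P<string>'(?:''|[^'])*')"                   # string literals ('' escapes a quote)
    r'|(?P<name>"[^"]+"|[A-Za-z_][A-Za-z0-9_$#]*)'   # quoted or bare identifiers
    r"|(?P<punct>[.,();])"                           # punctuation the scanner looks at
    r"|(?P<other>\S)",                               # operators, digits, anything else
    re.DOTALL,
)

# Assumed keyword sets (hypothetical, incomplete): a name right after a
# TRIGGERS word is treated as an object unless it is a RESERVED word.
TRIGGERS = {"from", "join", "into", "update", "using", "call"}
RESERVED = {"select", "set", "where", "on", "values", "group", "order",
            "having", "inner", "left", "right", "full", "cross", "outer",
            "union", "when", "as"}


def tokenize(sql):
    """Yield (kind, text) pairs, dropping comments and string literals."""
    for m in TOKEN_RE.finditer(sql):
        if m.lastgroup not in ("comment", "string"):
            yield m.lastgroup, m.group()


def merge_dotted(tokens):
    """Join name . name sequences into one qualified name token."""
    out = []
    for kind, text in tokens:
        if (kind == "name" and len(out) >= 2
                and out[-1] == ("punct", ".") and out[-2][0] == "name"):
            out.pop()                      # drop the dot
            _, left = out.pop()            # take back the left-hand name
            out.append(("name", left + "." + text))
        else:
            out.append((kind, text))
    return out


def referenced_objects(sql):
    """Return object names that follow a trigger keyword, in first-seen order."""
    found = []
    expecting = False   # the next name should be an object
    in_list = False     # inside a comma-separated FROM list
    for kind, text in merge_dotted(tokenize(sql)):
        low = text.lower()
        if kind == "name" and low in TRIGGERS:
            expecting, in_list = True, low == "from"
        elif expecting:
            if kind == "name" and low not in RESERVED and text not in found:
                found.append(text)
            expecting = False
        elif in_list:
            if text == ",":
                expecting = True           # another table follows the comma
            elif (kind == "name" and low in RESERVED) or text in {"(", ")", ";"}:
                in_list = False            # the FROM list has ended


    return found


sql = """
MERGE INTO tgt_db.customer AS t
USING (SELECT id, name FROM stg_db.customer_stg /* FROM fake.tbl */) s
ON t.id = s.id
WHEN MATCHED THEN UPDATE SET name = s.name;
"""
print(referenced_objects(sql))
# ['tgt_db.customer', 'stg_db.customer_stg']
```

Because comments and string literals are dropped during tokenization, a `FROM` inside either of them is not scanned, and the same loop covers SELECT, INSERT, MERGE, UPDATE and CALL. The sketch still has gaps a grammar-based parser would not have: `EXTRACT(YEAR FROM col)` reports `col` as an object, `SELECT ... INTO var` without a colon reports the variable, `DELETE tbl` without `FROM` is missed, a table listed after a derived table in the same comma list is missed, and dynamic SQL built from strings is not seen at all. It also cannot tell tables, views and procedures apart, apart from the fact that a name after `CALL` is a procedure.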

Parsing object names from stored procedures using Python
