I am trying to parse some SQL statements. Here is a sample:
select
ms.member_sk a,
dd.date_sk b,
st.subscription_type,
(SELECT foo FROM zoo) e
from dim_member_subscription_all p,
dim_subs_type
where a in (select moo from t10)
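For now I am only interested in extracting the table names, so I would like to see [zoo, dim_member_subscription_all, dim_subs_type] and [t10].
I have put together a small script, based on Paul McGuire's examples: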
#!/usr/bin/env python
import sys
import pprint
from pyparsing import *
pp = pprint.PrettyPrinter(indent=4)
semicolon = Combine(Literal(';') + lineEnd)
comma = Literal(',')
lparen = Literal('(')
rparen = Literal(')')
update_kw, volatile_kw, create_kw, table_kw, as_kw, from_kw, \
where_kw, join_kw, left_kw, right_kw, cross_kw, outer_kw, \
on_kw , insert_kw , into_kw= \
map(lambda x: Keyword(x, caseless=True), \
['UPDATE', 'VOLATILE', 'CREATE', 'TABLE', 'AS', 'FROM',
'WHERE', 'JOIN' , 'LEFT', 'RIGHT' , \
'CROSS', 'OUTER', 'ON', 'INSERT', 'INTO'])
select_kw = Keyword('SELECT', caseless=True) | Keyword('SEL' , caseless=True)
reserved_words = (update_kw | volatile_kw | create_kw | table_kw | as_kw |
select_kw | from_kw | where_kw | join_kw |
left_kw | right_kw | cross_kw | on_kw | insert_kw |
into_kw)
ident = ~reserved_words + Word(alphas, alphanums + '_')
table = Combine(Optional(ident + Literal('.')) + ident)
column = Combine(Optional(ident + Literal('.')) + (ident | Literal('*')))
column_alias = Optional(Optional(as_kw).suppress() + ident)
table_alias = Optional(Optional(as_kw).suppress() + ident).suppress()
select_stmt = Forward()
nested_table = lparen.suppress() + select_stmt + rparen.suppress() + table_alias
table_list = delimitedList((nested_table | table) + table_alias)
column_list = delimitedList((nested_table | column) + column_alias)
txt = """
select
ms.member_sk a,
dd.date_sk b,
st.subscription_type,
(SELECT foo FROM zoo) e
from dim_member_subscription_all p,
dim_subs_type
where a in (select moo from t10)
"""
select_stmt << select_kw.suppress() + column_list + from_kw.suppress() + \
table_list.setResultsName('tables', listAllMatches=True)
print(txt)
for token in select_stmt.searchString(txt):
    pp.pprint(token.asDict())
I get the following nested output. Can anyone help me understand what I am doing wrong?
{ 'tables': ([(['zoo'], {}), (['dim_member_subscription_all', 'dim_subs_type'], {})], {})}
{ 'tables': ([(['t10'], {})], {})}
Answer 0 (score: 2)
searchString returns a list of all matching ParseResults. You can see the tables value of each match with:
for token in select_stmt.searchString(txt):
    print(token.tables)
which prints:
[['zoo'], ['dim_member_subscription_all', 'dim_subs_type']]
[['t10']]
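So searchString found two SELECT statements: the outer one (which also contains the nested select on zoo) and the subquery in the WHERE clause.
Recent versions of pyparsing support combining this list into a single consolidated ParseResults with the Python builtin sum. Accessing the tables value of the consolidated result looks like this: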
print(sum(select_stmt.searchString(txt)).tables)
[['zoo'], ['dim_member_subscription_all', 'dim_subs_type'], ['t10']]
I think the parser is already doing everything you want; you just need to work out how to process the results it returns.
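One possible way to process those results is to flatten the nested tables value of each match into a plain list of names. The sketch below is only an illustration: flatten is a hypothetical helper (not part of pyparsing), and the nested lists are copied from the output shown above rather than produced by the parser, so that the snippet runs on its own. In the real script the same data could presumably be obtained with token.tables.asList() for each match.

```python
def flatten(nested):
    """Recursively flatten nested lists/tuples into one flat list."""
    flat = []
    for item in nested:
        if isinstance(item, (list, tuple)):
            flat.extend(flatten(item))
        else:
            flat.append(item)
    return flat


# Nested `tables` values copied from the searchString output above.
# Assumption: in the real script each entry would come from
# token.tables.asList() for one match of select_stmt.
per_select = [
    [['zoo'], ['dim_member_subscription_all', 'dim_subs_type']],
    [['t10']],
]

# One flat list of table names per SELECT statement that was found.
tables_by_select = [flatten(tables) for tables in per_select]
print(tables_by_select)
```

This keeps one list per matched SELECT, which is the shape asked for in the question; calling flatten on the whole of per_select would instead give a single list of all table names.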
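For further debugging, start using the dump method on ParseResults to see what you are getting back. It prints the nested list of returned tokens, followed by a hierarchical tree of all named results. With your example: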
for token in select_stmt.searchString(txt):
    print(token.dump())
    print()
prints:
['ms.member_sk', 'a', 'dd.date_sk', 'b', 'st.subscription_type', 'foo', 'zoo', 'dim_member_subscription_all', 'dim_subs_type']
- tables: [['zoo'], ['dim_member_subscription_all', 'dim_subs_type']]
['moo', 't10']
- tables: [['t10']]