Parse a SQL SELECT statement to get the WHERE clause conditions in Python

Time: 2019-03-28 12:54:23

Tags: python mysql parsing dictionary sql-parser

I have a SQL query, and I want to extract all of the conditions in its WHERE clause into a Python dictionary.

For example:

import sqlparse

s = "select count(*) from users where employee_type = 'Employee' AND (employment_status = 'Active' OR employment_status = 'On Leave') AND (time_type='Full time' OR country_code <> 'US') AND hire_date < NOW() AND email_work IS NOT NULL AND LENGTH(email_work) > 0 AND NOT (job_profile_id IN ('8802 - Comm Ops - 1', '8801 - CityOps - 2', '10034', '10455', '21014', '21015', '21016', '21018', '21017', '21019') AND country_code = 'IE') AND job_profile_id NOT IN ('20992', '20993', '20994', '20995', '20996', '20997') AND country_code NOT IN ('CN', 'MO', 'SG', 'MY', 'TH', 'VN', 'MM', 'KH', 'PH', 'ID')"

parsed = sqlparse.parse(s)
where = parsed[0][-1]

sql_tokens = []
def get_tokens(where):
    for i in where.tokens:
        try:
            name = i.get_real_name()
            if name and not isinstance(i, sqlparse.sql.Parenthesis):
                # sql_tokens.append("{0} - {1} - {2}".format(str(i), str(name), i.value))
                sql_tokens.append({
                    'key': str(name),
                    'value': i.value,
                })
            else:
                get_tokens(i)
        except Exception as e:
            pass


get_tokens(where)
for i in sql_tokens:
    print(i)

Here is the output:

{'value': u"employee_type = 'Employee'", 'key': 'employee_type'}
{'value': u"employment_status = 'Active'", 'key': 'employment_status'}
{'value': u"employment_status = 'On Leave'", 'key': 'employment_status'}
{'value': u"time_type='Full time'", 'key': 'time_type'}
{'value': u"country_code <> 'US'", 'key': 'country_code'}
{'value': u'hire_date < NOW()', 'key': 'hire_date'}
{'value': u'email_work', 'key': 'email_work'}
{'value': u'LENGTH(email_work) > 0', 'key': 'LENGTH'}
{'value': u'job_profile_id', 'key': 'job_profile_id'}
{'value': u"country_code = 'IE'", 'key': 'country_code'}
{'value': u'job_profile_id', 'key': 'job_profile_id'}
{'value': u'country_code', 'key': 'country_code'}

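The problem here is the IN operator. Look at the job_profile_id entries: they do not include the list of values.

When debugging, the list does not show up either.

I can't figure out how to fix this.

Please help.

Thanks for any help.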

1 Answer:

Answer 0 (score: 1)

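This happens because IN and comparisons produce different tree structures. A Comparison node, for example, contains the entire expression in the subtree beneath it.

If you call parsed[0]._pprint_tree(), you can see that everything is nested under the Comparison node: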

   |- 2 Comparison 'employ...'
   |  |- 0 Identifier 'employ...'
   |  |  `- 0 Name 'employ...'
   |  |- 1 Whitespace ' '
   |  |- 2 Comparison '='
   |  |- 3 Whitespace ' '
   |  `- 4 Single ''Emplo...'

The NOT IN clause, however, is a sequence of sibling nodes:

   |- 36 Identifier 'job_pr...'
   |  `- 0 Name 'job_pr...'
   |- 37 Whitespace ' '
   |- 38 Keyword 'NOT'
   |- 39 Whitespace ' '
   |- 40 Keyword 'IN'
   |- 41 Whitespace ' '
   |- 42 Parenthesis '('2099...'
   |  |- 0 Punctuation '('
   |  |- 1 IdentifierList ''20992...'
   |  |  |- 0 Single "'20992'"
   |  |  |- 1 Punctuation ','
   |  |  |- 2 Whitespace ' '
   |  |  |- 3 Single "'20993'"
   |  |  |- 4 Punctuation ','
   |  |  |- 5 Whitespace ' '
   |  |  |- 6 Single "'20994'"
   |  |  |- 7 Punctuation ','
   |  |  |- 8 Whitespace ' '
   |  |  |- 9 Single "'20995'"
   |  |  |- 10 Punctuation ','
   |  |  |- 11 Whitespace ' '
   |  |  |- 12 Single "'20996'"
   |  |  |- 13 Punctuation ','
   |  |  |- 14 Whitespace ' '
   |  |  `- 15 Single "'20997'"
   |  `- 2 Punctuation ')'

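Your best option is to watch for an Identifier, then skip ahead and save the value of the next Parenthesis node. This won't handle every possible case, but it does handle your SQL statement and returns the values for job_profile_id.

Here is my modified code: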

import sqlparse

s = "select count(*) from users where employee_type = 'Employee' AND (employment_status = 'Active' OR employment_status = 'On Leave') AND (time_type='Full time' OR country_code <> 'US') AND hire_date < NOW() AND email_work IS NOT NULL AND LENGTH(email_work) > 0 AND NOT (job_profile_id IN ('8802 - Comm Ops - 1', '8801 - CityOps - 2', '10034', '10455', '21014', '21015', '21016', '21018', '21017', '21019') AND country_code = 'IE') AND job_profile_id NOT IN ('20992', '20993', '20994', '20995', '20996', '20997') AND country_code NOT IN ('CN', 'MO', 'SG', 'MY', 'TH', 'VN', 'MM', 'KH', 'PH', 'ID')"

parsed = sqlparse.parse(s)
where = parsed[0][-1]

sql_tokens = []
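def get_tokens(where):
    identifier = None
    for i in where.tokens:
        # Leaf tokens (whitespace, keywords such as NOT/IN/AND, punctuation)
        # are plain Token objects with no get_real_name(); skip them.
        if not isinstance(i, sqlparse.sql.TokenList):
            continue
        name = i.get_real_name()
        if name and isinstance(i, sqlparse.sql.Identifier):
            # Remember the column; an IN list may follow it. An Identifier
            # that is not followed by a Parenthesis (e.g. email_work in
            # "email_work IS NOT NULL") is never recorded by this heuristic.
            identifier = i
        elif identifier and isinstance(i, sqlparse.sql.Parenthesis):
            # "<identifier> [NOT] IN (...)": save the whole parenthesised list
            sql_tokens.append({
                'key': str(identifier),
                'value': i.value,
            })
            identifier = None
        elif name:
            identifier = None
            # sql_tokens.append("{0} - {1} - {2}".format(str(i), str(name), i.value))
            sql_tokens.append({
                'key': str(name),
                'value': u''.join(token.value for token in i.flatten()),
            })
        else:
            # Unnamed group, e.g. a parenthesised OR: recurse into it
            get_tokens(i)

get_tokens(where)
print(sql_tokens)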
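The question asks for a dictionary, but the code collects a list of {'key': ..., 'value': ...} entries, and the same column can appear in more than one condition (employment_status appears twice in the question's output). A minimal post-processing sketch, assuming you want to keep every condition: group the entries into a dict that maps each key to a list of values. group_conditions is a hypothetical helper, and the sample entries are hand-written in the same shape rather than real output:

```python
from collections import defaultdict


def group_conditions(sql_tokens):
    """Group [{'key': ..., 'value': ...}, ...] into {key: [value, ...]}.

    A list per key is used because a plain dict keyed by column name
    would silently overwrite repeated columns.
    """
    grouped = defaultdict(list)
    for entry in sql_tokens:
        grouped[entry['key']].append(entry['value'])
    return dict(grouped)


# Hand-written sample entries in the shape produced by get_tokens()
sample = [
    {'key': 'employment_status', 'value': "employment_status = 'Active'"},
    {'key': 'employment_status', 'value': "employment_status = 'On Leave'"},
    {'key': 'job_profile_id', 'value': "('20992', '20993')"},
]
print(group_conditions(sample))
```

Insertion order of keys is preserved (Python 3.7+), so the dict lists columns in the order they first appear in the WHERE clause.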