I extract only the column part of a query like this:
query_split = [query[query.find("select") + len("select"): query.find("from")]]
which gives me a string like this:
query_split = [' service,count(*) as count,round(sum(mrp),2) as sale ']
I want a list that looks like this instead:
[' service','count(*) as count','round(sum(mrp),2) as sale']
because what I ultimately need is the list of column names:
['service','count','sale']
I tried other approaches, for example:
from csv import reader  # assumption: reader here is csv.reader
for file in reader(query_split):
    print(file)
which gives me this output, split at the comma inside round(...):
[' service', 'count(*) as count', 'round(sum(mrp)', '2) as sale ']
The following function fails as soon as a test case uses an expression such as round(sum(mrp),2) in the query:
def get_column_name(query):
    """
    Extracts the column name from a sql query
    :param query: str
    :return: column_name
    list: Column names which that query will fetch
    """
    column_name=[]
    query_split = query[query.find("select") + len("select"): query.find("from")]
    for i in query_split.split(','):
        if "as" in i:
            column_name.append(i.split('as')[-1])
        else:
            column_name.append(i.split(' ')[-1])
    return column_name
Answer 0 (score: 1)
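Your problem is that the SQL used here contains nested constructs, so a plain split on commas also cuts inside the parentheses.
The cleanest solution is most likely a SQL parser that understands the MySQL dialect. Arguably the easiest way to get one is ANTLR; if you are curious, a MySQL grammar and a quick guide for ANTLR are available online.
To solve this with a regular expression instead, the pattern has to handle balanced parentheses, which requires a recursive regex such as: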
[^,]+(\((?>[^()]++|(?1))*+\))[^,]+|([^(),]+(?:,|$))
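Explanation:
[^,]+(\((?>[^()]++|(?1))*+\))[^,]+
A recursive regex that matches a balanced pair of () and everything in between (including commas). The negated character classes on either side match everything except a comma.
([^(),]+(?:,|$))
Matches a regular column with no parentheses.
Sample code: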
import regex as re
regex = r"[^,]+(\((?>[^()]++|(?1))*+\))[^,]+|([^(),]+(?:,|$))"
test_str = "service,count(*) as count,round(sum(mrp),2) as sale,count(*) as count2,round(sum(mrp),2) as sale2"
matches = re.finditer(regex, test_str, re.MULTILINE)
result = [match.group() for match in matches]
Output:
['service,', 'count(*) as count', 'round(sum(mrp),2) as sale', 'count(*) as count2', 'round(sum(mrp),2) as sale2']
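The matches above are still full select expressions, and the first one keeps its trailing comma. To get to the bare column names the question asks for, one possible post-processing step is sketched below. The helper name `to_column_name` is hypothetical, and the sketch assumes every alias is introduced by a standalone `as` surrounded by whitespace. The match list is copied literally from the output above so the snippet runs with the standard library only.

```python
import re

# Matches copied from the output above.
matches = ['service,', 'count(*) as count', 'round(sum(mrp),2) as sale',
           'count(*) as count2', 'round(sum(mrp),2) as sale2']

def to_column_name(expr):
    """Return the alias after the last standalone 'as', else the expression itself."""
    # Drop surrounding whitespace and a trailing comma left over from matching.
    expr = expr.strip().rstrip(',').strip()
    # Requiring whitespace around "as" avoids splitting names such as "last_name".
    parts = re.split(r'\s+as\s+', expr, flags=re.IGNORECASE)
    return parts[-1].strip()

column_names = [to_column_name(m) for m in matches]
print(column_names)
# ['service', 'count', 'sale', 'count2', 'sale2']
```

Note that an unaliased expression containing `as` inside parentheses, for example `cast(x as int)`, would still be split incorrectly by this sketch.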
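Because the pattern relies on PCRE features (recursion, atomic groups, possessive quantifiers), the code needs Python's alternative regex package, which you have to install first. Good luck.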
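If installing an extra package is not an option, the same split can be approximated without any regex by tracking parenthesis depth. This is a different technique from the recursive pattern above, shown only as a minimal sketch: the function name `split_top_level` is hypothetical, and it assumes parentheses are balanced and that no comma or parenthesis appears inside a quoted string literal.

```python
def split_top_level(select_list):
    """Split a select list on commas that are not inside parentheses."""
    parts, current, depth = [], [], 0
    for ch in select_list:
        if ch == '(':
            depth += 1
        elif ch == ')':
            depth -= 1
        if ch == ',' and depth == 0:
            # A comma at depth 0 separates two select expressions.
            parts.append(''.join(current).strip())
            current = []
        else:
            current.append(ch)
    # Flush the last expression, which has no trailing comma.
    parts.append(''.join(current).strip())
    return parts

select_list = ' service,count(*) as count,round(sum(mrp),2) as sale '
print(split_top_level(select_list))
# ['service', 'count(*) as count', 'round(sum(mrp),2) as sale']
```

Unlike the regex result, the pieces come back already stripped of whitespace and without trailing commas.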