用逗号解析字符串以决定

时间:2018-12-21 19:50:42

标签: regex python-3.x pandas numpy

我有一个输入字符串,如下所示。我想根据逗号将其解析为dict,如下所示。问题在于,有时括号内包含逗号,如以下示例所示,并且引号内也有引号。我不太喜欢使用正则表达式匹配,因此非常感谢所有提示。

输入:

"ty_event_name, from_unixtime(unix_timestamp(regexp_replace(ty_date,'/','-'),'MM-dd-yyyy'),'yyyy-MM-dd') as ty_date,'${hiveconf:run_dt}' as sessions_fy,orders_xy"

输出:

{1:'ty_event_name',
2:'from_unixtime(unix_timestamp(regexp_replace(ty_date,'/','-'),'MM-dd-yyyy'),'yyyy-MM-dd') as ty_date',
3:''${hiveconf:run_dt}' as sessions_fy',
4:'orders_xy'}

尝试:

import pandas as pd
import numpy as np
import re

teststr="ty_event_name, from_unixtime(unix_timestamp(regexp_replace(ty_date,'/','-'),'MM-dd-yyyy'),'yyyy-MM-dd') as ty_date,'${hiveconf:run_dt}' as sessions_fy,orders_xy"

tstr=re.sub('(?!\B"[^"]*),(?![^"]*"\B)',',',teststr).split()

tstr

输出:

['ty_event_name,',
 "from_unixtime(unix_timestamp(regexp_replace(ty_date,'/','-'),'MM-dd-yyyy'),'yyyy-MM-dd')",
 'as',
 "ty_date,'${hiveconf:run_dt}'",
 'as',
 'sessions_fy,orders_xy']

1 个答案:

答案 0 :(得分:0)

这看起来很成功:

代码:

re.split(r',\s*(?=[^)]*(?:\(|$))', teststr) 

输出:

['ty_event_name',
 "from_unixtime(unix_timestamp(regexp_replace(ty_date,'/','-'),'MM-dd-yyyy'),'yyyy-MM-dd') as ty_date",
 "'${hiveconf:run_dt}' as sessions_fy",
 'orders_xy']