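I have an input string like the one below, and I want to split it on commas into a dict, as shown. The problem is that some commas sit inside parentheses, as in the example, and there are also quoted strings nested inside the outer quotes. I'm not very comfortable with regular-expression matching, so any pointers would be much appreciated.
Input: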
"ty_event_name, from_unixtime(unix_timestamp(regexp_replace(ty_date,'/','-'),'MM-dd-yyyy'),'yyyy-MM-dd') as ty_date,'${hiveconf:run_dt}' as sessions_fy,orders_xy"
Desired output:
{1:'ty_event_name',
2:'from_unixtime(unix_timestamp(regexp_replace(ty_date,'/','-'),'MM-dd-yyyy'),'yyyy-MM-dd') as ty_date',
3:''${hiveconf:run_dt}' as sessions_fy',
4:'orders_xy'}
Attempt:
import pandas as pd
import numpy as np
import re
teststr="ty_event_name, from_unixtime(unix_timestamp(regexp_replace(ty_date,'/','-'),'MM-dd-yyyy'),'yyyy-MM-dd') as ty_date,'${hiveconf:run_dt}' as sessions_fy,orders_xy"
tstr=re.sub('(?!\B"[^"]*),(?![^"]*"\B)',',',teststr).split()
tstr
Output:
['ty_event_name,',
"from_unixtime(unix_timestamp(regexp_replace(ty_date,'/','-'),'MM-dd-yyyy'),'yyyy-MM-dd')",
'as',
"ty_date,'${hiveconf:run_dt}'",
'as',
'sessions_fy,orders_xy']
Answer 0 (score: 0)
This seems to work:
Code:
re.split(r',\s*(?=[^)]*(?:\(|$))', teststr)
Output:
['ty_event_name',
"from_unixtime(unix_timestamp(regexp_replace(ty_date,'/','-'),'MM-dd-yyyy'),'yyyy-MM-dd') as ty_date",
"'${hiveconf:run_dt}' as sessions_fy",
'orders_xy']
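Two caveats about the regex, as far as I can tell: it returns a list rather than the numbered dict asked for, and the lookahead only checks whether a "(" or the end of the string can be reached before the next ")". That means an input such as "f(a, g(b))" would still be split at the comma after "a". If that matters, a small hand-written scanner that tracks parenthesis depth and single quotes avoids the regex entirely. The following is only a sketch: the function name split_top_level is made up for illustration, and it assumes parentheses are balanced and that single quotes are never escaped inside a literal.

```python
def split_top_level(text, sep=","):
    """Split text on sep, skipping separators inside (...) or '...'.

    Hypothetical helper: assumes balanced parentheses and no escaped quotes.
    """
    parts = []
    current = []
    depth = 0          # current parenthesis nesting level
    in_quote = False   # True while inside a single-quoted literal
    for ch in text:
        if ch == "'":
            in_quote = not in_quote
        elif not in_quote:
            if ch == "(":
                depth += 1
            elif ch == ")":
                depth -= 1
            elif ch == sep and depth == 0:
                # Top-level separator: close the current field.
                parts.append("".join(current).strip())
                current = []
                continue
        current.append(ch)
    parts.append("".join(current).strip())
    return parts


teststr = "ty_event_name, from_unixtime(unix_timestamp(regexp_replace(ty_date,'/','-'),'MM-dd-yyyy'),'yyyy-MM-dd') as ty_date,'${hiveconf:run_dt}' as sessions_fy,orders_xy"

fields = split_top_level(teststr)
# Number the fields from 1 to get the dict shape shown in the question.
result = dict(enumerate(fields, start=1))
print(result)
```

The same dict(enumerate(..., start=1)) step also works on the list returned by re.split, if the regex is good enough for your inputs.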