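I have a large set of SAS statements that define equations, and I would like to use Python to convert them into Python statements. They look like this:
From SAS: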
select;
when(X_1 <= 6.7278 ) V_1 =-0.0594 ;
when(X_1 <= 19.5338 ) V_1 =0.0604 ;
when(X_1 <= 45.1458 ) V_1 =0.1755 ;
when(X_1 <= 83.5638 ) V_1 =0.2867 ;
when(X_1 <= 203.0878 ) V_1 =0.395 ;
when(X_1 > 203.0878 ) V_1 =0.5011 ;
end;
label V_1 ="X_1 ";
select;
when(X_2 <= 0.0836 ) V_2 =0.0562 ;
when(X_2 <= 0.1826 ) V_2 =0.07 ;
when(X_2 <= 0.2486 ) V_2 =0.0836 ;
when(X_2 <= 0.3146 ) V_2 =0.0969 ;
when(X_2 <= 0.3806 ) V_2 =0.1095 ;
when(X_2 <= 0.4466 ) V_2 =0.1212 ;
when(X_2 <= 0.5126 ) V_2 =0.132 ;
when(X_2 <= 0.5786 ) V_2 =0.1419 ;
when(X_2 <= 0.6446 ) V_2 =0.1511 ;
when(X_2 <= 0.7106 ) V_2 =0.1596 ;
when(X_2 <= 0.8526 ) V_2 =0.1679 ;
when(X_2 > 0.8526 ) V_2 =0.176 ;
end;
label V_2 ="X_2 ";
...
...
...
To Python:
if X_1 <= 6.7278:
    V_1 = -0.0594
elif X_1 <= 19.5338:
    V_1 = 0.0604
elif X_1 <= 45.1458:
    V_1 = 0.1755
elif X_1 <= 83.5638:
    V_1 = 0.2867
elif X_1 <= 203.0878:
    V_1 = 0.395
else:
    V_1 = 0.5011
if X_2 <= 0.0836:
    ....
I don't know where to start, for example whether to use some package or something else entirely. Any help would be greatly appreciated!
Answer 0 (score: 3)
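If the input is very consistent (as shown), you can probably do it with re.
For anything more complicated, you may want to look at a more powerful parser such as pyparsing.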
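As a rough illustration of the re approach, here is a minimal sketch that pulls the five pieces out of a single when line. The names WHEN_LINE and parse_when are made up for this example, and the pattern assumes plain decimal numbers written exactly as in the sample input (no exponents, no bare ".5").

```python
import re

# Hypothetical pattern for one line of the form shown in the question:
#   when(X_1 <= 6.7278 ) V_1 =-0.0594 ;
# Assumption: numbers are plain decimals such as 6.7278 or -0.0594.
WHEN_LINE = re.compile(
    r"\s*when\(\s*(\w+)\s*(<=|>=|<|>|=)\s*(-?\d+(?:\.\d+)?)\s*\)"
    r"\s*(\w+)\s*=\s*(-?\d+(?:\.\d+)?)\s*;\s*$"
)

def parse_when(line):
    """Return (test_var, op, threshold, target_var, value), or None if no match."""
    m = WHEN_LINE.match(line)
    if m is None:
        return None
    test_var, op, threshold, target_var, value = m.groups()
    return test_var, op, float(threshold), target_var, float(value)
```

Lines that do not fit the pattern (select;, end;, label ...) return None, so the caller can decide how to treat them.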
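Edit: here is a very simple finite-state-machine parser using regular expressions; it handles blank lines, non-nested select; and end; statements, and initial/subsequent when clauses. I don't handle label because I'm not sure what it does: rename the V variable to X?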
import re

class SasTranslator:
    def __init__(self):
        # modes:
        #   0  not in START..END
        #   1  in START..END, no CASE seen yet
        #   2  in START..END, CASE already found
        self.mode = 0
        self.offset = -1     # input line # (zero-based)

    def handle_blank(self, match):
        return ""

    def handle_start(self, match):
        if self.mode == 0:
            self.mode = 1
            # recognized, but nothing to emit; "" keeps it from
            # being reported as an unknown line
            return ""
        else:
            raise ValueError("Found 'select;' in select block, line {}".format(self.offset))

    def handle_end(self, match):
        if self.mode == 0:
            raise ValueError("Found 'end;' with no opening 'select;', line {}".format(self.offset))
        elif self.mode == 1:
            raise ValueError("Found empty 'select;' .. 'end;', line {}".format(self.offset))
        elif self.mode == 2:
            self.mode = 0
            return ""   # recognized, nothing to emit

    def handle_case(self, match):
        if self.mode == 0:
            raise ValueError("Found 'when' clause outside 'select;' .. 'end;', line {}".format(self.offset))
        elif self.mode == 1:
            test = "if"
            self.mode = 2
            # note: code continues after if..else block
        elif self.mode == 2:
            test = "elif"
            # note: code continues after if..else block

        test_var, op, test_val, assign_var, assign_val = match.groups()
        return (
            "{test} {test_var} {op} {test_val}:\n"
            "    {assign_var} = {assign_val}".format(
                test       = test,
                test_var   = test_var,
                op         = op,
                test_val   = test_val,
                assign_var = assign_var,
                assign_val = assign_val
            )
        )

    #
    # Build a dispatch table for the handlers
    # (raw strings so that \s, \w and \d reach the regex engine untouched)
    #
    BLANK = re.compile(r"\s*$")
    START = re.compile(r"select;\s*$")
    END   = re.compile(r"end;\s*$")
    CASE  = re.compile(r"\s*when\((\w+)\s*([<>=]+)\s*([\d.-]+)\s*\)\s*(\w+)\s*=\s*([\d.-]+)\s*;\s*$")

    dispatch_table = [
        (BLANK, handle_blank),
        (START, handle_start),
        (END,   handle_end),
        (CASE,  handle_case)
    ]

    def __call__(self, line):
        """
        Translate a single line of input
        """
        self.offset += 1
        for test, handler in SasTranslator.dispatch_table:
            match = test.match(line)
            if match is not None:
                return handler(self, match)
        # nothing matched!
        return None

def main():
    with open("my_file.sas") as inf:
        trans = SasTranslator()
        for line in inf:
            result = trans(line)
            if result is None:
                print("***unknown*** {}".format(line.rstrip()))
            elif result:
                # an empty string means "recognized, nothing to print"
                print(result)

if __name__=="__main__":
    main()
Run against your sample input, it produces:
if X_1 <= 6.7278:
    V_1 = -0.0594
elif X_1 <= 19.5338:
    V_1 = 0.0604
elif X_1 <= 45.1458:
    V_1 = 0.1755
elif X_1 <= 83.5638:
    V_1 = 0.2867
elif X_1 <= 203.0878:
    V_1 = 0.395
elif X_1 > 203.0878:
    V_1 = 0.5011
***unknown*** label V_1 ="X_1 ";
if X_2 <= 0.0836:
    V_2 = 0.0562
elif X_2 <= 0.1826:
    V_2 = 0.07
elif X_2 <= 0.2486:
    V_2 = 0.0836
elif X_2 <= 0.3146:
    V_2 = 0.0969
elif X_2 <= 0.3806:
    V_2 = 0.1095
elif X_2 <= 0.4466:
    V_2 = 0.1212
elif X_2 <= 0.5126:
    V_2 = 0.132
elif X_2 <= 0.5786:
    V_2 = 0.1419
elif X_2 <= 0.6446:
    V_2 = 0.1511
elif X_2 <= 0.7106:
    V_2 = 0.1596
elif X_2 <= 0.8526:
    V_2 = 0.1679
elif X_2 > 0.8526:
    V_2 = 0.176
***unknown*** label V_2 ="X_2 ";
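The label lines are reported as ***unknown*** in the output above. In SAS, a label statement attaches descriptive text to a variable rather than renaming it, so one option is to carry it over as a Python comment. This is a sketch of that idea, not part of the translator itself; LABEL and label_to_comment are hypothetical names, and the pattern assumes a single double-quoted label per line, as in the sample.

```python
import re

# Assumption: one label per line, text in double quotes, e.g.
#   label V_1 ="X_1 ";
LABEL = re.compile(r'\s*label\s+(\w+)\s*=\s*"([^"]*)"\s*;\s*$')

def label_to_comment(line):
    """Turn a SAS label line into a Python comment, or return None if no match."""
    m = LABEL.match(line)
    if m is None:
        return None
    var, text = m.groups()
    # strip the trailing padding that appears inside the quotes in the sample
    return "# {}: {}".format(var, text.strip())
```

For the sample line label V_1 ="X_1 "; this yields the comment # V_1: X_1.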
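Depending on how often you use this, it might be worth turning the select; .. end; blocks into a binary-lookup function built on bisect (though you would want to be very careful that the comparison operators are what you expect!), something like: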
V_1 = index_into(
    X_1,
    [ 6.7278, 19.5338, 45.1458, 83.5638, 203.0878 ],
    [-0.0594,  0.0604,  0.1755,  0.2867,   0.395, 0.5011]
)
It may run noticeably faster (especially as the number of options grows), and it is easier to understand and maintain.
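index_into is not defined anywhere in the answer, so here is one possible sketch of it. It assumes every when clause tests with <= against thresholds listed in ascending order, and that the last value is the catch-all for the final > clause. Under that assumption bisect_left is the right variant: it returns the number of thresholds strictly below x, so a value exactly equal to a threshold stays in the lower bin, matching <=.

```python
from bisect import bisect_left

def index_into(x, thresholds, values):
    """Look up the value for x in a table of ascending '<=' thresholds.

    Assumption: values has exactly one more entry than thresholds; the
    extra last entry covers x greater than every threshold.
    """
    if len(values) != len(thresholds) + 1:
        raise ValueError("values must have one more entry than thresholds")
    # bisect_left counts the thresholds strictly less than x, which is
    # the index of the first threshold satisfying x <= threshold
    return values[bisect_left(thresholds, x)]

# Table taken from the X_1 block in the question
X1_THRESHOLDS = [6.7278, 19.5338, 45.1458, 83.5638, 203.0878]
X1_VALUES = [-0.0594, 0.0604, 0.1755, 0.2867, 0.395, 0.5011]
```

If a block used < instead of <=, bisect_right would be the matching choice, which is why the comparison operators deserve a careful check before converting.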