Desired outcome:
I want a parser function that takes a string of "instructions".
This string will be split up with string.split(";") and stripped of spaces. I then want to check each "chop" against a bunch (10+) of regular expressions. Each expression also has values defined by capture groups, which I will later use to execute the "command".
Problem:
I currently have a long and complex if/elif/else statement, which is very undesirable because it makes my code harder to maintain and harder for others to read.
Ideas so far:
Basically, I want to use a dictionary to simulate a switch statement. I have little experience with regular expressions, but I was able to write the right expressions to capture what I want from the "instructions". I am, however, quite unfamiliar with how Python's regular expression package works.
A step in the right direction would already be a function that, given a string and a list or dictionary of regular expressions, returns which regex matched.
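Something along the lines of this minimal sketch is what I am imagining (the dispatch dictionary and handler lambdas here are made up for illustration, not working code from my project):

import re

# Map each compiled pattern to a handler that receives the match and uses its capture groups.
# The handlers below are purely illustrative placeholders.
dispatch = {
    re.compile(r"^not([-+]*[0-9]+)$"): lambda m: ("not", float(m.group(1))),
    re.compile(r"^ex([-+]*[0-9]+)$"): lambda m: ("ex", float(m.group(1))),
    re.compile(r"^([-+]*[0-9]+)<x<([-+]*[0-9]+)$"): lambda m: ("interval", float(m.group(1)), float(m.group(2))),
}

def first_match(chop):
    # Return the handler result of the first pattern that matches, or None if nothing matches.
    for pattern, handler in dispatch.items():
        match = pattern.match(chop)
        if match:
            return handler(match)
    return None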
Example code: (please excuse it :) )
import re

# res_gallery is assumed to be the module that provides the Constraint classes used below.

class PreparedConstraintsCollection(ConstraintsCollectionABC):
    not_pattern = re.compile("^not([-+]*[0-9]+)$")
    ex_pattern = re.compile("^ex([-+]*[0-9]+)$")
    more_pattern = re.compile("^>([-+]*[0-9]+)$")
    less_pattern = re.compile("^<([-+]*[0-9]+)$")
    interval_pattern = re.compile("^([-+]*[0-9]+)<x<([-+]*[0-9]+)$")

    def parse_constraints_string(self, restriction_string: str) -> set:
        """
        The overly-complex function to parse the restriction control sequence strings

        Control Sequence    Meaning          Explanation
        -----------------------------------------------------------------------------
        +                   Positive only    Allow only positive values
        -                   Negative only    Allow only negative values
        notX                Not X value      Do not allow value X
        exX                 Must be X        Only allow value X
        >X                  More than X      Values must be more than X
        <X                  Less than X      Values must be less than X
        M<x<N               Interval M, N    Must be more than M but less than N

        :param restriction_string: a string with control sequences
        :return: return the gathered restriction instances, conserve only unique
        """
        gathered_constraints = set()
        for control_seq in restriction_string.split(";"):
            stripped = control_seq.strip().replace(" ", "")
            if stripped == "":
                continue
            elif stripped == "+":
                gathered_constraints.add(res_gallery.PositiveConstraint())
            elif stripped == "-":
                gathered_constraints.add(res_gallery.NegativeConstraint())
            elif self.not_pattern.match(stripped):
                searched = re.search(self.not_pattern, stripped)
                param = float(searched.group(1))
                gathered_constraints.add(res_gallery.NotExactValueConstraint(param))
            elif self.ex_pattern.match(stripped):
                searched = re.search(self.ex_pattern, stripped)
                param = float(searched.group(1))
                gathered_constraints.add(res_gallery.ExactValueConstraint(param))
            elif self.more_pattern.match(stripped):
                searched = re.search(self.more_pattern, stripped)
                param = float(searched.group(1))
                gathered_constraints.add(res_gallery.GreaterThanConstraint(param))
            elif self.less_pattern.match(stripped):
                searched = re.search(self.less_pattern, stripped)
                param = float(searched.group(1))
                gathered_constraints.add(res_gallery.LessThanConstraint(param))
            elif self.interval_pattern.match(stripped):
                searched = re.search(self.interval_pattern, stripped)
                param1, param2 = float(searched.group(1)), float(searched.group(2))
                gathered_constraints.add(res_gallery.IntervalConstraint(param1, param2))
            else:
                raise ValueError("Restriction string could not be parsed!")
        return gathered_constraints
Answer 0: (score: 1)
One possibility for the parser is to write a tokenizer that creates a list of all the statements found, together with the type detected for each one.
The first step is to create the grammar and tokenize the input string:
import re
import collections

token = collections.namedtuple('token', ['type', 'value'])
grammar = r'\+|\-|\bnot\b|\bex\b|\>|\<|[a-zA-Z0-9_]+'
tokens = {'plus': r'\+', 'minus': r'\-', 'not': r'\bnot\b', 'ex': r'\bex\b',
          'lt': r'\<', 'gt': r'\>', 'var': '[a-zA-Z0-9_]+'}
sample_input = 'val1+val23; val1 < val3 < new_variable; ex val3;not secondvar;'
# Split the input into lexemes with the grammar, then label each lexeme with the
# first token type whose pattern matches it.
tokenized_grammar = [token([a for a, b in tokens.items() if re.findall(b, i)][0], i)
                     for i in re.findall(grammar, sample_input)]
Now, tokenized_grammar stores the list of all tokenized grammar occurrences found in the text:
[token(type='var', value='val1'), token(type='plus', value='+'), token(type='var', value='val23'), token(type='var', value='val1'), token(type='lt', value='<'), token(type='var', value='val3'), token(type='lt', value='<'), token(type='var', value='new_variable'), token(type='var', value='ex'), token(type='var', value='val3'), token(type='var', value='not'), token(type='var', value='secondvar')]
Each token's type and value can then be accessed as attributes:
full_types = [(i.type, i.value) for i in tokenized_grammar]
Output:
[('var', 'val1'), ('plus', '+'), ('var', 'val23'), ('var', 'val1'), ('lt', '<'), ('var', 'val3'), ('lt', '<'), ('var', 'new_variable'), ('var', 'ex'), ('var', 'val3'), ('var', 'not'), ('var', 'secondvar')]
To get the flow of a switch-case statement, you can create a dictionary in which each key is a token type and each value is a class that stores the corresponding value, with the relevant methods to be added later:
class Plus:
    def __init__(self, storing):
        self.storing = storing

    def __repr__(self):
        return "{}({})".format(self.__class__.__name__, self.storing)


class Minus:
    def __init__(self, storing):
        self.storing = storing

    def __repr__(self):
        return "{}({})".format(self.__class__.__name__, self.storing)
...
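Since the remaining wrapper classes (Not, Ex, Lt, Gt and Variable in the dictionary below) would follow exactly the same pattern, one possible sketch is to generate them programmatically instead of writing each one out by hand:

def make_token_class(name):
    # Build a class with the same shape as Plus/Minus above, differing only in name.
    def __init__(self, storing):
        self.storing = storing
    def __repr__(self):
        return "{}({})".format(self.__class__.__name__, self.storing)
    return type(name, (), {'__init__': __init__, '__repr__': __repr__})

Not, Ex, Lt, Gt, Variable = (make_token_class(n) for n in ('Not', 'Ex', 'Lt', 'Gt', 'Variable'))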
Then, create the dictionary:
tokens_objects = {'plus': Plus, 'minus': Minus, 'not': Not, 'ex': Ex, 'lt': Lt, 'gt': Gt, 'var': Variable}
Then you can iterate over tokenized_grammar and create a class object for each occurrence:
for t in tokenized_grammar:
    t_obj = tokens_objects[t.type](t.value)
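As a rough continuation of that loop (assuming the remaining wrapper classes are defined as sketched above), the created objects could also be collected in a list and worked with later:

# Wrap every token in its corresponding class and keep the results.
parsed = [tokens_objects[t.type](t.value) for t in tokenized_grammar]
print(parsed)
# e.g. [Variable(val1), Plus(+), Variable(val23), Variable(val1), Lt(<), ...]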