I am just starting out with Python, and for a project I need to validate a string that must have one of these formats:
aaa...a
aaa...a(bbb...b)
aaa...a(bbb...b)ccc...c
aaa...a(bbb...b)ccc...c(ddd...d)
where aaa...a, bbb...b, ccc...c and ddd...d are integers.
The string can be of arbitrary length.
The string contains no spaces.
Only single (non-nested) brackets are allowed.
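I have approached this problem as a finite state machine (FSM) with two states.
I would like to know whether there is a better way to solve this task, what your impressions are, and any hints you may have.
As a side note, I ran some tests with regexps, but this looks to me like a recursive pattern validation problem, and I am not sure it can be done easily in Python. I am no regexp expert, but I suppose that if the task is feasible that way, it could be done in a single line of code.
The main advantage I can see in the FSM approach is that it can tell the user where the error is in the input string, which makes checking and correcting the input easier from the user's point of view.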
[EDIT] I found a wrong detection behaviour. The code is now corrected so that two consecutive bracket groups, e.g. 10(200)(300), are not allowed. I have also reformatted the code as a function.

"""
String parser for string formatted as reported below:

aaa...a
aaa...a(bbb...b)
aaa...a(bbb...b)ccc...c(ddd...d)

where:
aaa...a, bbb...b = integer number

Not valid (some example)
()
(aaa...a)
aaa...a()
aaa...a(bbb...b)ccc...d
aaa...a((bbb....b))
"""

import sys
import re

def parse_string(buffer):
    # Checking loop
    state = 1
    old_state = 1
    next_state = 1
    strlen = len(buffer)
    initial = True
    success = False
    index = 0  # keeps the return value defined when buffer is empty
    is_a_number = re.compile("[0-9]")
    for index, i in enumerate(buffer):
        car = i
        # State 1: outside the brackets
        if (state == 1):
            if is_a_number.match(car):
                if (index != strlen-1):
                    # It is a number and not the last char: wait for the next char, "(" or a number
                    next_state = 1
                else:
                    if (initial):
                        # It is a number and also the last char of the initial block -> parsing finished
                        success = True
                        break
                    else:
                        # It is the last number but not in the initial block of numbers -> error
                        success = False
                        break
            else:
                if (car == "("):
                    if (old_state == 2):
                        # Can't have two (...)(...) consecutively
                        success = False
                        break
                    if ((index == 0) or (index == strlen-1)):
                        # The ( can't be the first or the last char
                        success = False
                        break
                    else:
                        # Step to the next state
                        next_state = 2
                        initial = False
                else:
                    # Wrong char detected
                    success = False
                    break
        # State 2: inside the brackets
        if (state == 2):
            if is_a_number.match(car):
                if (index != strlen-1):
                    # The char is a number and is not the last of the string
                    next_state = 2
                else:
                    # It is a number and also the last char: error due to a missing ")"
                    success = False
                    break
            else:
                if (car == ")"):
                    if (old_state == 1):
                        # The sequence () is not allowed
                        success = False
                        break
                    elif ((old_state == 2) and (index != strlen-1)):
                        # The previous char was a number
                        next_state = 1
                    else:
                        # This is the last char of the string
                        success = True
                        break
                else:
                    # Wrong char detected
                    success = False
                    break
        print("current state: "+ str(state) + " next_state: " + str(next_state))
        # Update the old and the new state
        old_state = state
        state = next_state
    return(success, state, index)

if __name__ == "__main__":
    # Get the string from the command line
    # The first argument (index = 0) is the script name, the supplied parameters start from index = 1
    number_cmd = len(sys.argv) - 1
    if (number_cmd != 1):
        print ("Error: request one string as input!")
        sys.exit(0)
    # Get the string
    buffer = sys.argv[1].strip()
    print("================================")
    print("Parsing: " + buffer)
    print("Checking with fsm")
    print("--------------------------------")
    # Parse the string
    success, state, index = parse_string(buffer)
    # Check result
    if (success):
        print("String validated!")
        print("================================")
    else:
        print("Syntax error detected in state: " + str(state) + "\n" + "position: " + str(buffer[:index+1]))
        print("================================")
    # Exit from script
    sys.exit(0)
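Answer 0 (score: 2)
Finite state machines and regular expressions are equivalent in expressive power. Both can be used to parse regular languages. So if your problem can be solved with an FSM, it can also be solved with a regular expression.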
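If recursive (nested) parentheses were allowed, as in 1(123(345)12), the language would not be regular, and neither an FSM nor a regular expression could parse the string. Judging from your description and script, though, I assume nested parentheses are not allowed, so a regular expression will work.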
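As a rough illustration of that claim, the non-nested format can be written as one regular expression. This is only a sketch under stated assumptions: the pattern and the names `PATTERN` and `is_valid` are hypothetical, any number of bracket groups is accepted, every group must be preceded by digits, and trailing digits after the last group are allowed.

```python
import re

# Hypothetical pattern (an assumption, not taken from the question):
#   digits, then zero or more "(digits)digits" runs,
#   optionally followed by one final "(digits)" group.
# Nested brackets and back-to-back groups such as "(1)(2)" cannot match.
PATTERN = re.compile(r"[0-9]+(?:\([0-9]+\)[0-9]+)*(?:\([0-9]+\))?")

def is_valid(s):
    """Return True if the whole string has the non-nested bracket format."""
    return PATTERN.fullmatch(s) is not None
```

With this sketch, `is_valid("12(345435)4332(34)")` holds, while `is_valid("34324(345435)(34)")` and `is_valid("1(123(345)12)")` do not. If digits after the last group should be rejected, as in the question's FSM, the pattern would have to be tightened accordingly.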
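Regarding your requirements:
To get the exact position of an error, you cannot use a single regular expression that matches the whole string. Instead, you can split the string with the regex \(|\) and match each segment against [0-9]+. Then you only need to make sure the parentheses are balanced.
Here is my script: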
import re

def parse_input(s):
    s = s.strip()
    # Anchored with \Z so that a segment such as "12ab" is rejected
    digits = re.compile(r"[0-9]+\Z")
    if not s:
        print("Error: blank input")
        return False
    # The capturing group keeps "(" and ")" as separate segments
    segments = re.split(r"(\(|\))", s)
    if segments[-1] == "":
        # re.split leaves a trailing empty string when s ends with a parenthesis
        segments.pop()
    if not segments[0]: # opens with parentheses
        print("Error: cannot open with parenthese")
        return False
    in_p = False

    def get_error_context(i):
        # Offending segment plus its neighbours, for the error message
        prefix = segments[i-1] if i>0 else ""
        suffix = segments[i+1] if i<len(segments)-1 else ""
        return prefix + segments[i] + suffix

    for i, segment in enumerate(segments):
        if not segment: # blank is not allowed within parentheses
            if in_p:
                print("Error: empty parentheses not allowed, around '%s'"%get_error_context(i))
                return False
            else:
                print("Error: no digits between ) and (, around '%s'"%get_error_context(i))
                return False
        elif segment == "(":
            if in_p:
                print("Error: recursive () not allowed, around '%s'"%get_error_context(i))
                return False
            else:
                in_p = True
        elif segment == ")":
            if in_p:
                in_p = False
            else:
                print("Error: ) with no matching (, around '%s'"%get_error_context(i))
                return False
        elif not digits.match(segment):
            print("Error: non digits, around '%s'"%get_error_context(i))
            return False
    if in_p:
        print("Error: input ends with an open parenthese, around '%s'"%get_error_context(i))
        return False
    return True
Tests:
>>> parse_input("12(345435)4332(34)")
True
>>> parse_input("(345435)4332(34)")
Error: cannot open with parenthese
False
>>> parse_input("sdf(345435)4332()")
Error: non digits, around 'sdf('
False
>>> parse_input("123(345435)4332()")
Error: empty parentheses not allowed, around '()'
False
>>> parse_input("34324(345435)(34)")
Error: no digits between ) and (, around ')('
False
>>> parse_input("123(344332()")
Error: recursive () not allowed, around '344332('
False
>>> parse_input("12)3(3443)32(123")
Error: ) with no matching (, around '12)3'
False
>>> parse_input("123(3443)32(123")
Error: input ends with an open parenthese, around '(123'
False
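The messages above give the surrounding text rather than a numeric offset. As a complement, here is a minimal sketch that returns the character index of the first problem. It is not part of the script above: `TOKEN` and `find_error` are hypothetical names, and the rules are assumed to be the same ones the script enforces (digits before every group, no empty or nested groups).

```python
import re

# Tokens: a run of ASCII digits, a single parenthesis, or any other character.
TOKEN = re.compile(r"[0-9]+|[()]|.", re.DOTALL)

def find_error(s):
    """Return (offset, message) for the first problem, or None if s is valid."""
    in_parens = False
    prev = None  # kind of the previous token: "num", "(", ")" or None
    for m in TOKEN.finditer(s):
        tok, pos = m.group(), m.start()
        if "0" <= tok[0] <= "9":
            kind = "num"
        elif tok == "(":
            if in_parens:
                return pos, "nested ( not allowed"
            if prev != "num":
                return pos, "( must follow digits"
            in_parens, kind = True, "("
        elif tok == ")":
            if not in_parens:
                return pos, ") with no matching ("
            if prev != "num":
                return pos, "empty parentheses not allowed"
            in_parens, kind = False, ")"
        else:
            return pos, "unexpected character"
        prev = kind
    if prev is None:
        return 0, "blank input"
    if in_parens:
        return len(s), "missing closing )"
    return None
```

For example, `find_error("34324(345435)(34)")` points at offset 13, the second opening parenthesis, and a valid string such as `"12(345435)4332(34)"` yields `None`.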
Answer 1 (score: 0)
This can be done with a regular expression. Here is an example in Python; you can also try it out on regex101:
Regex: (\d+)(\(\d+\)(\d+(\(\d+\))?)?)?
The Python code would be:
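import re
# A plain raw string is used: the ur'' prefix is a syntax error in Python 3
p = re.compile(r'(\d+)(\(\d+\)(\d+(\(\d+\))?)?)?')
test_str = "1000(20)30(345)"
p.match(test_str)  # a match object, or None if the start of the string does not match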
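If you want to make sure that the whole input, such as 1000(20)30(345), matches and nothing else follows it, you can add ^ at the start of the regex and $ at the end.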
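To make the effect of the anchors concrete, here is a small sketch comparing the unanchored and anchored forms of the pattern above. The names `LOOSE`, `STRICT` and `matches_whole` are hypothetical. Note that this particular pattern only covers up to two bracket groups.

```python
import re

# The pattern from this answer, without and with the ^...$ anchors.
LOOSE = re.compile(r"(\d+)(\(\d+\)(\d+(\(\d+\))?)?)?")
STRICT = re.compile(r"^(\d+)(\(\d+\)(\d+(\(\d+\))?)?)?$")

def matches_whole(s):
    """Return True only if the entire string fits the anchored pattern."""
    return STRICT.match(s) is not None

# Unanchored: only a prefix has to match, so trailing junk goes unnoticed.
prefix = LOOSE.match("1000(20)30(345)abc").group(0)
```

Here `prefix` is the valid leading part of the input, while `matches_whole("1000(20)30(345)abc")` is false. In Python 3, `re.fullmatch` is an alternative to adding the anchors by hand.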