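I could use some help with something that is probably easy for people familiar with it. I am trying to parse a more-or-less home-brewed configuration file into a dict/JSON. I have some Python code that uses string routines or re.split(), and it works for everything I have tested; however, I know of edge cases that can break it. I would like to create a generic regular expression that handles the logic better, so that the same regex can be ported to the other languages we use at work (perl, awk, C, etc.) and help us stay consistent.
I would like to use re.match() or re.split() in Python.
The patterns I am looking for should do the following:
1) Split str on the first ? if that ? is not inside a substring delimited by single and/or double quotes.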
strIn:
'''
foo = 'some',"stuff?",'that "could be?" nested?', ? but still capture this? and "this?"
'''
listOut:
['''foo = 'some',"stuff?",'that "could be?" nested?', ''' , ''' but still capture this? and "this?"''']
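2) Split str on the first # if that # is not inside a substring delimited by single or double quotes, and the # does not come after the first unquoted ? (as defined in 1).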
strIn:
'''
foo = 'some',"stuff?#, maybe 'nested#' " # #but now this is all a comment to capture ,'that "could be?#" nested#', ? but still capture this?! and "this?! "
'''
listOut:
['''foo = 'some',"stuff?#, maybe 'nested#' " ''', ''' #but now this is all a comment to capture ,'that "could be?#" nested#', ? but still capture this?! and "this?! "''']
Answer 0 (score: 1)
You can use re.split:
>>> import re
>>> s = '''foo = 'some',"stuff?",'that "could be?" nested?', ? but still capture this? and "this?"'''
>>> [i for i in re.split(r'^((?:"[^"]*"|\'[^\']*\'|[^\'"?])*)\?', s) if i]
['foo = \'some\',"stuff?",\'that "could be?" nested?\', ', ' but still capture this? and "this?"']
Or use re.findall:
>>> re.findall(r'^((?:"[^"]*"|\'[^\']*\'|[^\'"?])*)\?(.*)', s)
[('foo = \'some\',"stuff?",\'that "could be?" nested?\', ', ' but still capture this? and "this?"')]
>>> [j for i in re.findall(r'^((?:"[^"]*"|\'[^\']*\'|[^\'"?])*)\?(.*)', s) for j in i]
['foo = \'some\',"stuff?",\'that "could be?" nested?\', ', ' but still capture this? and "this?"']
For the second problem, you can do the same as above, with # in place of ? in the pattern.
>>> s = '''foo = 'some',"stuff?#, maybe 'nested#' " # #but now this is all a comment to capture ,'that "could be?#" nested#', ? but still capture this?! and "this?! "'''
>>> [j for i in re.findall(r'^((?:"[^"]*"|\'[^\']*\'|[^\'"#])*)#(.*)', s) for j in i]
['foo = \'some\',"stuff?#, maybe \'nested#\' " ', ' #but now this is all a comment to capture ,\'that "could be?#" nested#\', ? but still capture this?! and "this?! "']
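One caveat: the # pattern above still lets an unquoted ? through the character class, so on its own it does not enforce the condition that the # must not come after the first unquoted ?. A possible way to cover that, sketched here with made-up names (FIRST_MARK, split_on_first_mark), is to stop at whichever of the two characters appears first outside quotes and report which one it was. It makes the same assumption as before: no escaped quotes inside quoted strings.

```python
import re

# Quoted spans and ordinary characters, then the first unquoted '?' or '#',
# then the remainder of the line.
FIRST_MARK = re.compile(
    r'((?:"[^"]*"|\'[^\']*\'|[^\'"?#])*)([?#])(.*)',
    re.DOTALL,
)

def split_on_first_mark(line):
    """Return (before, mark, after).

    mark is '?' or '#', whichever occurs first outside quotes,
    or None (with after == '') when neither does.
    """
    m = FIRST_MARK.match(line)
    if m is None:
        return line, None, ''
    return m.group(1), m.group(2), m.group(3)
```

A caller would then treat the remainder as a comment only when mark is '#'. When mark is '?', any later # belongs to the text after the ? and is left alone.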