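Using modified code from the Python docs, I added regular expressions to the tokenizer below to match docstrings, comments and quotes. These are combined into a single master regular expression, and successive matches are looped over.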
The code below fails with the error shown after it:
import collections
import re

Token = collections.namedtuple('Token', ['typ', 'value', 'line', 'column'])

def tokenize(code):
    keywords = {'IF', 'THEN', 'ENDIF', 'FOR', 'NEXT', 'GOSUB', 'RETURN'}
    token_specification = [
        ('DOC', r'([\'"]{3})[^\x00]*?\2'),  # docstrings
        ('COMM', r'#.*'),  # comments
        ('QUOT', r'(?:"([^"\\\\]*(?:\\.[^"\\\\]*)*)"|'  # quotes
                 r'\'([^\'\\]*(?:\\.[^\'\\]*)*)\')|'
                 r'r\'([^"(]*)\((.*?)\)\4\'')
    ]
    tok_regex = '|'.join('(?P<%s>%s)' % pair for pair in token_specification)
    line_num = 1
    line_start = 0
    for mo in re.finditer(tok_regex, code):
        kind = mo.lastgroup
        value = mo.group(kind)
        if kind == 'NEWLINE':
            line_start = mo.end()
            line_num += 1
        elif kind == 'SKIP':
            pass
        elif kind == 'MISMATCH':
            raise RuntimeError('%r unexpected on line %d' % (value, line_num))
        else:
            if kind == 'ID' and value in keywords:
                kind = value
            column = mo.start() - line_start
            yield Token(kind, value, line_num, column)

statements = '''
"""docstr1
blah"""
\'''docstr2\'''
# ok
IF "okkk" and 'ole' quantity THEN
total := total + price * quantity;
tax := price * 0.05;
ENDIF;
'''

for token in tokenize(statements):
    print(token)

Running it produces the following traceback:

line 72, in <module>
    for token in tokenize(statements):
line 44, in tokenize
    for mo in re.finditer(tok_regex, code):
line 220, in finditer
    return _compile(pattern, flags).finditer(string)
line 293, in _compile
    p = sre_compile.compile(pattern, flags)
line 536, in compile
    p = sre_parse.parse(p, flags)
line 829, in parse
    p = _parse_sub(source, pattern, 0)
line 437, in _parse_sub
    itemsappend(_parse(source, state))
line 778, in _parse
    p = _parse_sub(source, state)
line 437, in _parse_sub
    itemsappend(_parse(source, state))
line 524, in _parse
    code = _escape(source, this, state)
line 415, in _escape
    len(escape))
sre_constants.error: cannot refer to an open group at position 142

I know the problem is that the QUOT regex should be matched before the other two regexes (if this is too simplistic or just plain wrong, please explain).
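
To check that hypothesis, here is a minimal sketch that compiles the master pattern in both orders. It assumes the three patterns exactly as written above; `build` and `try_compile` are hypothetical helper names, not part of the docs example.

```python
import re

# The three patterns from the tokenizer above, copied unchanged.
DOC = r'([\'"]{3})[^\x00]*?\2'
COMM = r'#.*'
QUOT = (r'(?:"([^"\\\\]*(?:\\.[^"\\\\]*)*)"|'
        r'\'([^\'\\]*(?:\\.[^\'\\]*)*)\')|'
        r'r\'([^"(]*)\((.*?)\)\4\'')


def build(spec):
    """Join (name, pattern) pairs into one master pattern (hypothetical helper)."""
    return '|'.join('(?P<%s>%s)' % pair for pair in spec)


def try_compile(spec):
    """Return (compiled_pattern, None) on success or (None, message) on failure."""
    try:
        return re.compile(build(spec)), None
    except re.error as exc:
        return None, str(exc)


original = [('DOC', DOC), ('COMM', COMM), ('QUOT', QUOT)]
reordered = [('QUOT', QUOT), ('DOC', DOC), ('COMM', COMM)]

# Original order: DOC is group 1, its inner group is 2, COMM is 3, so the
# named QUOT group itself becomes group 4 and \4 points at a group that is
# still open when the backreference is parsed.
failed, message = try_compile(original)
print(failed, message)

# QUOT first: \4 now refers to an inner group of QUOT, so compilation succeeds.
compiled, _ = try_compile(reordered)
print(compiled.groupindex)
```

As far as I can tell, reordering only hides the issue: with QUOT first, the `\2` inside DOC silently refers to a group inside QUOT, so the numeric backreferences stay fragile whenever the patterns are joined.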
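I am worried that in some cases I might end up getting the order of the regexes wrong, and that I won't be able to fix the error until I supply an appropriate QUOT.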
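
One way this ordering worry might be reduced is to use named backreferences, which do not depend on group numbers. The sketch below is only an illustration under that assumption: `docq` is a made-up group name, `kind_of` is a hypothetical helper, and only the DOC and COMM patterns are included.

```python
import re

# DOC rewritten with a named backreference; 'docq' is a made-up group name.
token_specification = [
    ('DOC', r'(?P<docq>[\'"]{3})[^\x00]*?(?P=docq)'),
    ('COMM', r'#.*'),
]


def build(spec):
    """Join (name, pattern) pairs into one master pattern (hypothetical helper)."""
    return '|'.join('(?P<%s>%s)' % pair for pair in spec)


def kind_of(spec, text):
    """Return the name of the token pattern that matches at the start of text."""
    mo = re.compile(build(spec)).match(text)
    return mo.lastgroup if mo else None


forward = token_specification
backward = list(reversed(token_specification))

# Both orders compile, and both classify the samples the same way.
for spec in (forward, backward):
    print(kind_of(spec, '"""docstr1"""'), kind_of(spec, '# ok'))
```

Because the outer named group closes after the inner one, `mo.lastgroup` still reports the token name rather than `docq`.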
My question is: how can I handle such cases/errors gracefully? Is there a proper approach for this? Any code example that illustrates it would be great.
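
To make the question concrete, here is a sketch of one possible approach, not necessarily the right one: compile the master pattern eagerly inside a try/except and re-raise with some context. In the tokenizer above, `re.finditer` only runs once the generator is first iterated, which is why the error surfaces at the `for token in tokenize(statements)` line. `compile_master` is a hypothetical helper name, and the two small specifications are made up for illustration.

```python
import re


def compile_master(token_specification):
    """Compile the combined pattern eagerly (hypothetical helper name)."""
    tok_regex = '|'.join('(?P<%s>%s)' % pair for pair in token_specification)
    try:
        return re.compile(tok_regex)
    except re.error as exc:
        # exc.msg and exc.pos are attributes of re.error since Python 3.5;
        # pos may be None, so fall back to the start of the pattern.
        pos = exc.pos if exc.pos is not None else 0
        context = tok_regex[max(0, pos - 15):pos + 15]
        raise ValueError('bad token specification: %s near %r'
                         % (exc.msg, context)) from exc


# A specification that compiles.
good = compile_master([('NUM', r'\d+'), ('WS', r'\s+')])
print(good.match('42').lastgroup)

# Once joined, \2 refers to the still-open named group B, not to (b).
try:
    compile_master([('A', r'x'), ('B', r'(b)\2')])
except ValueError as exc:
    print(exc)
```

Note that compiling each pattern on its own would not catch this kind of mistake, since the group numbers only shift once the patterns are joined.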
(Using Python 3.5.1)