How do I handle "cannot refer to an open group at position" in Python?

Asked: 2016-11-18 06:15:41

Tags: regex python-3.x tokenize

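Starting from modified code in the Python docs, I added regular expressions to the tokenizer below to match docstrings, comments and quotes. They are combined into a single master regular expression, and the matches are processed in a loop.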

Here is the code:

import collections
import re

Token = collections.namedtuple('Token', ['typ', 'value', 'line', 'column'])

def tokenize(code):
    keywords = {'IF', 'THEN', 'ENDIF', 'FOR', 'NEXT', 'GOSUB', 'RETURN'}
    token_specification = [

        ('DOC',   r'([\'"]{3})[^\x00]*?\2'),              # docstrings
        ('COMM',  r'#.*'),                                # comments
        ('QUOT',  r'(?:"([^"\\\\]*(?:\\.[^"\\\\]*)*)"|'   # quotes
                  r'\'([^\'\\]*(?:\\.[^\'\\]*)*)\')|'
                  r'r\'([^"(]*)\((.*?)\)\4\'')
    ]
    tok_regex = '|'.join('(?P<%s>%s)' % pair for pair in token_specification)
    line_num = 1
    line_start = 0
    for mo in re.finditer(tok_regex, code):
        kind = mo.lastgroup
        value = mo.group(kind)
        if kind == 'NEWLINE':
            line_start = mo.end()
            line_num += 1
        elif kind == 'SKIP':
            pass
        elif kind == 'MISMATCH':
            raise RuntimeError('%r unexpected on line %d' % (value, line_num))
        else:
            if kind == 'ID' and value in keywords:
                kind = value
            column = mo.start() - line_start
            yield Token(kind, value, line_num, column)

statements = '''
    """docstr1

    blah"""
    \'''docstr2\'''
    # ok
    IF "okkk" and 'ole' quantity THEN
        total := total + price * quantity;
        tax := price * 0.05;
    ENDIF;
'''

for token in tokenize(statements):
    print(token)

Running the code gives the following error:

  line 72, in <module>
    for token in tokenize(statements)
  line 44, in tokenize
    for mo in re.finditer(tok_regex, code)
  line 220, in finditer
    return _compile(pattern, flags).finditer(string)
  line 293, in _compile
    p = sre_compile.compile(pattern, flags)
  line 536, in compile
    p = sre_parse.parse(p, flags)
  line 829, in parse
    p = _parse_sub(source, pattern, 0)
  line 437, in _parse_sub
    itemsappend(_parse(source, state))
  line 778, in _parse
    p = _parse_sub(source, state)
  line 437, in _parse_sub
    itemsappend(_parse(source, state))
  line 524, in _parse
    code = _escape(source, this, state)
  line 415, in _escape
    len(escape))
sre_constants.error: cannot refer to an open group at position 142

As I understand it, the problem is that the QUOT regex should be matched before the other two regexes (if this is too simplistic or just plain wrong, please explain).
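A note on where this error can come from, offered as a sketch rather than a confirmed diagnosis: when each spec is wrapped in `(?P<NAME>...)` and joined, every group number inside it shifts, so a numbered backreference such as `\4` may end up pointing at the enclosing named group, which is still open at that point. In the combined pattern above, group 4 appears to be QUOT itself. The snippet below reproduces the effect with a smaller pattern and shows a named backreference as one way around it; the group name `docq` and the two-entry spec are made up for illustration.

```python
import re

# Standalone, group 1 is the quote delimiter, so \1 is a valid backreference.
numbered = r'([\'"]{3})[^\x00]*?\1'
re.compile(numbered)

# Wrapped the way the tokenizer wraps each spec, the wrapper becomes group 1,
# so \1 now points at a group that has not been closed yet.
try:
    re.compile('(?P<DOC>%s)' % numbered)
    error_message = None
except re.error as exc:
    error_message = str(exc)

# A named backreference keeps pointing at the same group wherever it lands.
# The group name "docq" is made up for this example.
token_specification = [
    ('COMM', r'#.*'),
    ('DOC', r'(?P<docq>[\'"]{3})[^\x00]*?(?P=docq)'),
]
tok_regex = '|'.join('(?P<%s>%s)' % pair for pair in token_specification)
kinds = [mo.lastgroup for mo in re.finditer(tok_regex, '"""doc""" # note')]
```

With named backreferences, reordering the entries in `token_specification` no longer changes which group a backreference resolves to.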

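I am worried that at some point I might mess up the order of the regexes and not get an error that lets me fix it until I supply a suitable statements input.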

My question is: how can I handle such cases/errors gracefully? Is there an appropriate approach for this? Any code examples that illustrate it would be great.
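One possible approach, sketched under assumptions: the master pattern is compiled regardless of the text being tokenized, so compiling it once up front (instead of inside the generator) surfaces a bad pattern before any input is supplied. The helper name `build_master_regex` and the choice to re-raise as `ValueError` are hypothetical; `re.error` and its `pos` attribute are part of the standard `re` module.

```python
import re

def build_master_regex(token_specification):
    """Join the specs into one pattern and compile it eagerly."""
    parts = ['(?P<%s>%s)' % pair for pair in token_specification]
    tok_regex = '|'.join(parts)
    try:
        return re.compile(tok_regex)
    except re.error as exc:
        # Map the failing offset back to the spec entry that contains it.
        offset = 0
        culprit = None
        for (name, _), part in zip(token_specification, parts):
            end = offset + len(part)
            if exc.pos is not None and offset <= exc.pos < end:
                culprit = name
                break
            offset = end + 1  # skip the '|' separator
        # Hypothetical design choice: report the offending token name.
        raise ValueError('bad token regex %r: %s' % (culprit, exc)) from exc
```

Calling this at module level, and then using the returned pattern's `finditer` inside `tokenize`, would make a broken spec fail at import time with the name of the offending entry.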

I am using Python 3.5.1.

0 answers:

No answers yet