Question

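I wrote a script that catches and corrects commands before a parser reads them. The parser requires comparison operators (equal, not equal, and so on) to be separated from their operands by commas, for example:

'test(a>=b)' is wrong; 'test(a,>=,b)' is correct.

The script I wrote works, but I'd like to know whether there is a more efficient way to do this.

Here is my script: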

# Correction routine
def corrector(exp):
    def rep(exp,a,b):
        foo = ''
        while(True):
            foo = exp.replace(a,b)
            if foo == exp:
                return exp
            exp = foo

    # Replace all instances with a unique identifier. Do it in a specific order
    # so for example we catch an instance of '>=' before we get to '='
    items = ['>=','<=','!=','==','>','<','=']
    for i in range(len(items)):
        exp = rep(exp,items[i],'###%s###'%i)

    # Re-add items with commas
    for i in range(len(items)):
        exp = exp.replace('###%s###'%i,',%s,'%items[i])

    # Remove accidental double commas we may have added
    return exp.replace(',,',',')


print(corrector('wrong_syntax(b>=c) correct_syntax(b,>=,c)'))
# RESULT: wrong_syntax(b,>=,c) correct_syntax(b,>=,c)

Thanks!

Answer 1

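As mentioned in the comments, one approach is to use regular expressions. The following regular expression matches any of your operators when it is not surrounded by commas, and replaces it with the same string with commas inserted: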

import re

inputstring = 'wrong_syntax(b>=c) correct_syntax(b,>=,c)'
regex = r"([^,])(>=|<=|!=|==|>|<|=)([^,])"
replace = r"\1,\2,\3"

result = re.sub(regex, replace, inputstring)

print(result)
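
One caveat with the pattern above: `[^,]` consumes the character on each side of the operator, so two operators that share a single-character operand (as in 'a=b=c') are not both fixed in one pass. A minimal sketch of one way around this, assuming it is acceptable to re-run the substitution until the string stops changing (`add_commas` is a hypothetical helper name, not part of the answer):

```python
import re

# Same pattern as above: an operator with a non-comma character on each side.
OPERATOR = re.compile(r"([^,])(>=|<=|!=|==|>|<|=)([^,])")


def add_commas(text):
    """Hypothetical helper: re-apply the substitution until nothing changes."""
    while True:
        fixed = OPERATOR.sub(r"\1,\2,\3", text)
        if fixed == text:
            return fixed
        text = fixed


print(add_commas("wrong_syntax(b>=c) correct_syntax(b,>=,c)"))
# wrong_syntax(b,>=,c) correct_syntax(b,>=,c)
print(add_commas("test(a=b=c)"))
# test(a,=,b,=,c)
```

The loop mirrors the `rep` helper from the question, but each pass is a single regex substitution rather than one `replace` call per operator.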

Simple regular expressions are fairly easy to write, but they get complicated quickly. See the documentation for more information:

http://docs.python.org/2/library/re.html

Answer 2

Here is a regular expression that does exactly what you asked:

import re
regex = re.compile(r'''

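    (?<!,)                  # Negative lookbehind: not preceded by a comma
    (!=|[><=]=?)            # An operator: != or one of > < = with an optional trailing =
    (?!,)                   # Negative lookahead: not followed by a comma

''', re.VERBOSE)
print(regex.sub(r',\1,', 'wrong_expression(b>=c) or right_expression(b,>=,c)'))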

Output

wrong_expression(b,>=,c) or right_expression(b,>=,c)
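
The lookaround pattern assumes each operator is either fully wrapped in commas or not wrapped at all. An input with a comma on only one side, such as 'test(a,>=b)', falls outside that assumption. As a rough alternative sketch (a different technique from the one in this answer), the pattern can instead absorb any comma already touching the operator and always write back ',op,'. The name `normalize_operators` is hypothetical, and the sketch assumes operators never sit directly next to each other:

```python
import re

# Match an operator together with any comma already touching it.
# Two-character operators come first so '>=' is not read as '>' then '='.
OPERATOR = re.compile(r",?(>=|<=|!=|==|>|<|=),?")


def normalize_operators(text):
    """Hypothetical helper: rewrite every operator as ',op,' in one pass."""
    return OPERATOR.sub(r",\1,", text)


print(normalize_operators("wrong_expression(b>=c) or right_expression(b,>=,c)"))
# wrong_expression(b,>=,c) or right_expression(b,>=,c)
print(normalize_operators("test(a,>=b) test(a>=,b) test(a=b=c)"))
# test(a,>=,b) test(a,>=,b) test(a,=,b,=,c)
```

Because the existing commas are consumed and re-emitted, running the function on already-correct input leaves it unchanged, so no separate double-comma cleanup is needed.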

Pythonic string syntax corrector