Question

我开始将我的python框架重做为python3兼容。我遇到的一个问题是：为我的正则表达式匹配输入错误。事实证明，我的正则表达式的部分需要编译为二进制文件，以避免在匹配其他函数产生的字节时出现类型错误。

所以我想写下这样的东西

@classmethod
def contains(cls, pattern, value):
    """
    :param pattern: A regular expression pattern. If input is plain string, will be compiled on the fly
    :param value: A string that might contain the given pattern (can be multi line string)
    :returns: True if pattern is found in value
    """
    compiled_pattern = pattern
    if type(pattern) is str:
        if type(value) is bytes:
            print("binary pattern")
            compiled_pattern = re.compile(b'{}'.format(pattern))
        else:
            print("normal pattern")
            compiled_pattern = re.compile(pattern)        

    if compiled_pattern.search(value) is None:
        return False
    return True

创建“普通”模式效果很好，但对于“二进制”模式，我得到了

compiled_pattern = re.compile(b'{}'.format(pattern))
AttributeError: 'bytes' object has no attribute 'format'

（此错误适用于python3; python2直接向我抛出语法错误）

那么，我如何指示python从变量编译正则表达式，但是作为二进制文件？

（我知道还有其他方法可以解决潜在的问题;例如通过在该方法中执行value = str（value））

Answer 1

这里的关键是b'{}'在python2和python3中给出了不同的结果：

python2.7：

type(b'{}') # <type 'str'>

python3：

type(b'{}') # <class 'bytes'>

实际发生了什么，是这样做的：

(b'{}').format(pattern)

所以它适用于2.7，因为格式是str的方法。

您需要使用bytes(pattern, encoding)

Answer 2

re.compile(bytes(pattern, 'utf-8'))

确保使用与＆＃34;其他功能相同的编码＆＃34;。

您可以使用上述方法转换为字节，但我建议您将其他函数提供的值转换为unicode。

Answer 3

只是为了完整性：另一个解决方案是不改变正则表达式的类型 - 而是改变传入数据的类型，例如：

if value is bytes:
    value = str(value)

如何“动态”编译正则表达式作为（非）二进制？

3 个答案: