Question

I'm currently working on a function that expands a bracket pattern, for example turning '[a-c]' into 'a', 'b' and 'c'.

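I'm not trying to do pattern matching in Python. I mean something that takes '[a-c]' as input and outputs the corresponding 'a', 'b' and 'c', which are the characters that '[a-c]' can match in a Python regular expression. I want the characters that would be matched.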

We only need to treat [a-zA-Z0-9_-] as valid characters inside the brackets.
Things like '*', '+' or '?' do not need to be considered.

However, writing a robust version is quite hard because there are many cases to consider. So I wonder whether there is a tool in Python that already does this?

Note: this one has some bugs, as @swenzel pointed out. I have written a function to do the job. You can view it in this Gist.

I recommend the approach from @swenzel's second proposal. For details on re.findall, see the docs.

Answer 1

This sounds like homework... but anyway. As I understand it, you need a parser for range definitions. Here you go:

def parseRange(rangeStr, i=0):
    # Recursion anchor, return empty set if we're out of bounds
    if i >= len(rangeStr):
        return set()

    # charSet will tell us later if we actually have a range here
    charSet = None

    # There can only be a range if we have more than 2 characters left in the
    # string and if the next character is a dash
    if i+2 < len(rangeStr) and rangeStr[i+1] == '-':

        # We might have a range. Valid ranges are between the following pairs of
        # characters
        pairs = [('a', 'z'), ('A', 'Z'), ('0', '9')]

        for lo, hi in pairs:
            # We now make use of the fact that characters are comparable.
            # Also the second character should come after the first, or be
            # the same which means e.g. 'a-a' -> 'a'
            if (lo <= rangeStr[i] <= hi) and \
               (rangeStr[i] <= rangeStr[i+2] <= hi):
                # Retrieve the set with all chars from the substring
                charSet = parseRange(rangeStr, i+3)

                # Extend the chars from the substring with the ones in this
                # range.
                # `range` needs integers, so we transform the chars to ints
                # using ord and make use of the fact that their ASCII
                # representation is ascending
                charSet.update(chr(k) for k in
                               range(ord(rangeStr[i]), 1+ord(rangeStr[i+2])))
                break

    # If charSet is not yet defined this means that at the current position
    # there is not a valid range definition. So we just get all chars for the
    # following subset and add the current char
    if charSet is None:
        charSet = parseRange(rangeStr, i+1)
        charSet.add(rangeStr[i])

    # Return the char set with all characters defined within rangeStr[i:]
    return charSet

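It may not be the most elegant parser, but it works. You also have to strip the square brackets before calling it, but that is easy to do, e.g. by slicing with [1:-1].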
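
To make the bracket stripping concrete, here is a minimal sketch of a wrapper. The name `expand_brackets` and its `parser` parameter are made up for illustration and are not part of the answer above; the wrapper only checks the brackets and hands the body to whatever parser you pass in.

```python
def expand_brackets(pattern, parser):
    """Strip the surrounding brackets and hand the body to `parser`."""
    if len(pattern) < 2 or pattern[0] != '[' or pattern[-1] != ']':
        raise ValueError("expected a pattern of the form '[...]'")
    # pattern[1:-1] drops the leading '[' and the trailing ']'
    return sorted(parser(pattern[1:-1]))


# `set` is a stand-in parser that only understands literal characters;
# with the recursive parser above you would pass parseRange instead.
print(expand_brackets('[abc]', set))  # ['a', 'b', 'c']
```

Sorting the result is a design choice of this sketch: sets are unordered, so sorting gives a stable output to compare against.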

Another very short, dumb and simple solution, which uses the parser built into re, is:

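import re

def parseRangeRe(rangeStr):
    # Every character that is allowed inside the brackets
    master_pattern = "1234567890abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ_-"
    # rangeStr is compiled as-is, so it must still include the brackets
    matcher = re.compile(rangeStr)
    return set(matcher.findall(master_pattern))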
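
A variant of the same idea, as a rough sketch: it tests each allowed character on its own with `fullmatch` and returns a sorted list instead of a set. The names `ALPHABET` and `expand_with_re` are made up here, and the alphabet is an assumption taken from the character set the question allows.

```python
import re
import string

# Assumed alphabet: the characters the question allows inside brackets.
ALPHABET = string.ascii_letters + string.digits + "_-"


def expand_with_re(pattern):
    """Return the sorted characters of ALPHABET that `pattern` matches."""
    if not (pattern.startswith('[') and pattern.endswith(']')):
        raise ValueError("expected a pattern of the form '[...]'")
    matcher = re.compile(pattern)
    # fullmatch tests each character on its own against the class
    return sorted(c for c in ALPHABET if matcher.fullmatch(c))


print(expand_with_re('[a-c]'))     # ['a', 'b', 'c']
print(expand_with_re('[x-z0-2]'))  # ['0', '1', '2', 'x', 'y', 'z']
```

Because re does the parsing, an invalid class such as '[c-a]' raises re.error rather than being treated as literal characters.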

Answer 2

Here is a simple solution that might work for you:

import re
import string

def expand(pattern):
    """
    Returns a list of characters that can be matched by the given pattern.
    """
    pattern = pattern[1:-1] # ignore the leading '[' and trailing ']'
    result = []
    lower_range_re = re.compile('[a-z]-[a-z]')
    upper_range_re = re.compile('[A-Z]-[A-Z]')
    digit_range_re = re.compile('[0-9]-[0-9]')

    for match in lower_range_re.findall(pattern):
        result.extend(string.ascii_lowercase[string.ascii_lowercase.index(match[0]):string.ascii_lowercase.index(match[2]) + 1])
    for match in upper_range_re.findall(pattern):
        result.extend(string.ascii_uppercase[string.ascii_uppercase.index(match[0]):string.ascii_uppercase.index(match[2]) + 1])
    for match in digit_range_re.findall(pattern):
        result.extend(string.digits[string.digits.index(match[0]):string.digits.index(match[2]) + 1])
    return result

It works for patterns like [b-g], [0-3], [G-N], [b-gG-N1-3], etc. It does not work for patterns like [abc], [0123], etc.
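
One possible way to cover both cases is to tokenize the body into ranges and single characters in one pass. This is only a sketch: `TOKEN_RE` and `expand_all` are made-up names, characters outside [a-zA-Z0-9_-] are silently skipped, and mixed ranges such as 'A-z' are not checked.

```python
import re

# One token is either a range such as 'a-c' or a single allowed character.
TOKEN_RE = re.compile(r'([a-zA-Z0-9])-([a-zA-Z0-9])|([a-zA-Z0-9_-])')


def expand_all(pattern):
    """Expand ranges and single characters, keeping first-seen order."""
    body = pattern[1:-1]  # ignore the leading '[' and trailing ']'
    result = []
    for lo, hi, single in TOKEN_RE.findall(body):
        if single:
            chars = [single]
        else:
            if lo > hi:
                raise ValueError("bad range %s-%s" % (lo, hi))
            # ord/chr walk the code points from lo to hi inclusive
            chars = [chr(k) for k in range(ord(lo), ord(hi) + 1)]
        for c in chars:
            if c not in result:  # avoid duplicates from overlapping tokens
                result.append(c)
    return result


print(expand_all('[abc]'))      # ['a', 'b', 'c']
print(expand_all('[b-d0-1_]'))  # ['b', 'c', 'd', '0', '1', '_']
```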

Answer 3

This solution doesn't need regular expressions, so it might be wrong, but it works:

# Note: this only drops '[', '-' and ']'; for '[a-c]' it yields 'a' and 'c'
pattern = '[a-c]'
excludes = '[-]' # Or use includes if that is easier
result = []
for char in pattern:
    if char not in excludes: # if char in includes:
        result.append(char)
        print(char)

Or take a look here: range over character in python
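
The idea behind iterating over a range of characters can be sketched with ord and chr. The helper name `char_range` is made up here and is not taken from the linked question.

```python
def char_range(start, stop):
    """Yield the characters from `start` to `stop`, both inclusive."""
    for code in range(ord(start), ord(stop) + 1):
        yield chr(code)


print(list(char_range('a', 'c')))  # ['a', 'b', 'c']
```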

How to get the corresponding characters from a pattern in Python

3 answers: