How to get the characters that correspond to a pattern in Python

Time: 2015-05-21 10:42:58

Tags: python

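I'm currently working on a function that expands a bracket pattern, e.g. '[a-c]' into 'a', 'b' and 'c'.

I'm not trying to do pattern matching in Python. What I mean is something that takes '[a-c]' as input and outputs 'a', 'b' and 'c', which are the characters that '[a-c]' can validly match in a Python regular expression. I want the characters that would be matched.

We only need to treat [a-zA-Z0-9_-] as valid characters inside the brackets.
Quantifiers such as '*', '+' or '?' do not need to be considered.

However, writing a robust version is quite hard, because there are many cases to consider. So I wonder whether there is a tool that can do this in Python?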

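Note: this one has some bugs, as @swenzel pointed out. I have written a function that does this job. You can check it out in this Gist.

I recommend the approach in @swenzel's second proposal. For details on re.findall, see the docs.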

3 Answers:

Answer 0: (score: 2)

This sounds like homework... but anyway. From what I understand, you need a parser for range definitions. Here you go:

def parseRange(rangeStr, i=0):
    # Recursion anchor, return empty set if we're out of bounds
    if i >= len(rangeStr):
        return set()

    # charSet will tell us later if we actually have a range here
    charSet = None

    # There can only be a range if we have more than 2 characters left in the
    # string and if the next character is a dash
    if i+2 < len(rangeStr) and rangeStr[i+1] == '-':

        # We might have a range. Valid ranges are between the following pairs of
        # characters
        pairs = [('a', 'z'), ('A', 'Z'), ('0', '9')]

        for lo, hi in pairs:
            # We now make use of the fact that characters are comparable.
            # Also the second character should come after the first, or be
            # the same which means e.g. 'a-a' -> 'a'
            if (lo <= rangeStr[i] <= hi) and \
               (rangeStr[i] <= rangeStr[i+2] <= hi):
                   # Retrieve the set with all chars from the substring
                   charSet = parseRange(rangeStr, i+3)

                   # Extend the chars from the substring with the ones in this
                   # range.
                   # `range` needs integers, so we transform the chars to ints
                   # using ord and make use of the fact that their ASCII
                   # representation is ascending
                   charSet.update(chr(k) for k in
                           range(ord(rangeStr[i]), 1+ord(rangeStr[i+2])))
                   break

    # If charSet is not yet defined this means that at the current position
    # there is not a valid range definition. So we just get all chars for the
    # following subset and add the current char
    if charSet is None:
        charSet = parseRange(rangeStr, i+1)
        charSet.add(rangeStr[i])

    # Return the char set with all characters defined within rangeStr[i:]
    return charSet

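It may not be the most elegant parser, but it works. You also have to strip the square brackets before calling it, but you can do that easily, e.g. by slicing with [1:-1].

Another very short, dumb and simple solution that uses the parser in re is: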

import re

def parseRangeRe(rangeStr):
    # rangeStr keeps its square brackets here, e.g. '[a-c]'
    master_pattern = "1234567890abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ_-"
    matcher = re.compile(rangeStr)
    return set(matcher.findall(master_pattern))
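
As a rough illustration of the same idea, here is a small variation that tests each candidate character on its own with re.fullmatch instead of running findall over one long string. The names CANDIDATES and chars_matched_by are made up for this sketch, and it assumes Python 3.4+ (for fullmatch) and that the pattern is a single bracket expression with its brackets left in place:

```python
import re
import string

# Assumed candidate alphabet, taken from the question: letters, digits, '_' and '-'.
CANDIDATES = string.ascii_letters + string.digits + '_-'


def chars_matched_by(pattern):
    """Return, sorted, the candidate characters that `pattern` matches in full."""
    compiled = re.compile(pattern)
    return sorted(c for c in CANDIDATES if compiled.fullmatch(c))


print(chars_matched_by('[a-c]'))  # ['a', 'b', 'c']
```

Because the regex engine does the parsing, a literal '_' or a trailing '-' inside the brackets is handled the way re handles it, and a reversed range such as '[c-a]' is rejected by re.compile with re.error rather than silently accepted.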

Answer 1: (score: 1)

Here is a simple solution that might work for you:

import re
import string

def expand(pattern):
    """
    Returns a list of characters that can be matched by the given pattern.
    """
    pattern = pattern[1:-1] # ignore the leading '[' and trailing ']'
    result = []
    lower_range_re = re.compile('[a-z]-[a-z]')
    upper_range_re = re.compile('[A-Z]-[A-Z]')
    digit_range_re = re.compile('[0-9]-[0-9]')

    for match in lower_range_re.findall(pattern):
        result.extend(string.ascii_lowercase[string.ascii_lowercase.index(match[0]):string.ascii_lowercase.index(match[2]) + 1])
    for match in upper_range_re.findall(pattern):
        result.extend(string.ascii_uppercase[string.ascii_uppercase.index(match[0]):string.ascii_uppercase.index(match[2]) + 1])
    for match in digit_range_re.findall(pattern):
        result.extend(string.digits[string.digits.index(match[0]):string.digits.index(match[2]) + 1])
    return result

It works for patterns such as [b-g], [0-3], [G-N], [b-gG-N1-3] and so on. It does not work for patterns like [abc], [0123], etc.
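
One possible way to cover the [abc] case as well is to tokenize the bracket body into either an "x-y" range or a single character. This is only a sketch under the question's assumptions (only [a-zA-Z0-9_-] inside the brackets, and both ends of a range in the same class); TOKEN_RE and expand_all are hypothetical names, not part of the answer above:

```python
import re

# A token is an "x-y" range within one character class, or one allowed character.
# The range alternatives come first so that 'a-c' is not split into 'a', '-', 'c'.
TOKEN_RE = re.compile(r'[a-z]-[a-z]|[A-Z]-[A-Z]|[0-9]-[0-9]|[a-zA-Z0-9_-]')


def expand_all(pattern):
    """Return the characters described by a pattern such as '[a-cX_]', in order."""
    body = pattern[1:-1]  # ignore the leading '[' and trailing ']'
    result = []
    for token in TOKEN_RE.findall(body):
        if len(token) == 3:
            lo, hi = token[0], token[2]
            # A reversed range such as 'c-a' yields an empty range and adds nothing.
            result.extend(chr(k) for k in range(ord(lo), ord(hi) + 1))
        else:
            result.append(token)
    return result


print(expand_all('[abc]'))   # ['a', 'b', 'c']
print(expand_all('[a-c_]'))  # ['a', 'b', 'c', '_']
```

Unlike a set-based version this keeps the order of the pattern and does not remove duplicates, so wrap the result in set() if that matters.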

Answer 2: (score: 0)

This solution doesn't need regular expressions, so it may be wrong, but it could work:

pattern = '[a-c]'
excludes = '[-]' # Or use includes if that is easier
result = []
for char in pattern:
    if char not in excludes: # if char in includes:
        result.append(char)
        print(char)

Or take a look here: range over character in python
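
Note that the loop above only filters out the bracket and hyphen characters, so for '[a-c]' it collects 'a' and 'c' but not 'b'. Iterating over a character range, which is what the linked question is about, could look roughly like this. char_range is a hypothetical helper name, and the sketch assumes both ends are single characters whose code points are in ascending order:

```python
def char_range(first, last):
    """Yield every character from `first` to `last`, inclusive."""
    # ord() gives the code point of a character and chr() turns it back.
    for code in range(ord(first), ord(last) + 1):
        yield chr(code)


print(list(char_range('a', 'c')))  # ['a', 'b', 'c']
```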