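I'm working on a function that expands a bracket pattern, for example turning '[a-c]' into 'a', 'b' and 'c'.
I'm not trying to do pattern matching in Python. What I mean is something that takes '[a-c]' as input and outputs 'a', 'b' and 'c', the characters that '[a-c]' would match in a Python regular expression. I want the characters that would be matched.
We only need to treat [a-zA-Z0-9_-] as valid characters inside the brackets.
Quantifiers such as '*', '+' or '?' do not need to be considered.
However, writing a robust version is quite hard because there are many cases to handle. So I'm wondering whether there is a tool that does this in Python?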
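Note: this has some bugs, as @swenzel pointed out. I have written a function that does the job; you can see it in this Gist.
I recommend the approach in @swenzel's second proposal.
For details on re.findall, see the docs.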
Answer 0 (score: 2)
This sounds like homework... but oh well. From what I understand, you need a parser for range definitions. Here you go:
def parseRange(rangeStr, i=0):
    # Recursion anchor, return empty set if we're out of bounds
    if i >= len(rangeStr):
        return set()
    # charSet will tell us later if we actually have a range here
    charSet = None
    # There can only be a range if we have more than 2 characters left in the
    # string and if the next character is a dash
    if i+2 < len(rangeStr) and rangeStr[i+1] == '-':
        # We might have a range. Valid ranges are between the following pairs of
        # characters
        pairs = [('a', 'z'), ('A', 'Z'), ('0', '9')]
        for lo, hi in pairs:
            # We now make use of the fact that characters are comparable.
            # Also the second character should come after the first, or be
            # the same which means e.g. 'a-a' -> 'a'
            if (lo <= rangeStr[i] <= hi) and \
               (rangeStr[i] <= rangeStr[i+2] <= hi):
                # Retrieve the set with all chars from the substring
                charSet = parseRange(rangeStr, i+3)
                # Extend the chars from the substring with the ones in this
                # range.
                # `range` needs integers, so we transform the chars to ints
                # using ord and make use of the fact that their ASCII
                # representation is ascending
                charSet.update(chr(k) for k in
                               range(ord(rangeStr[i]), 1+ord(rangeStr[i+2])))
                break
    # If charSet is not yet defined this means that at the current position
    # there is not a valid range definition. So we just get all chars for the
    # following subset and add the current char
    if charSet is None:
        charSet = parseRange(rangeStr, i+1)
        charSet.add(rangeStr[i])
    # Return the char set with all characters defined within rangeStr[i:]
    return charSet
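It may not be the most elegant parser, but it works. You also have to strip the square brackets before calling it, which is easy to do, for example by slicing with [1:-1].
Another very short, dumb and simple solution that uses the parser from re is: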
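import re

def parseRangeRe(rangeStr):
    # All characters that are allowed inside the brackets
    master_pattern = "1234567890abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ_-"
    # rangeStr is compiled as-is, so it must include the square brackets
    matcher = re.compile(rangeStr)
    return set(matcher.findall(master_pattern))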
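As a quick check of the re-based idea, here is a minimal sketch that tests each candidate character on its own with fullmatch instead of running findall over one long string. The names ALPHABET and chars_matched_by are made up for this example, and it assumes Python 3.4+ (for fullmatch) and that the input is a bracket expression such as '[a-c]', brackets included.

```python
import re
import string

# Every character the question treats as valid inside the brackets.
ALPHABET = string.ascii_letters + string.digits + "_-"

def chars_matched_by(pattern):
    """Return the set of ALPHABET characters that `pattern` matches on their own."""
    compiled = re.compile(pattern)
    return {ch for ch in ALPHABET if compiled.fullmatch(ch)}

print(sorted(chars_matched_by("[a-c]")))      # ['a', 'b', 'c']
print(sorted(chars_matched_by("[x-z0-2_]")))  # ['0', '1', '2', '_', 'x', 'y', 'z']
```

Checking one character at a time means a pattern that happens to carry a quantifier, such as '[a-c]+', still yields single characters, whereas findall over the full string would return the run 'abc'.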
Answer 1 (score: 1)
Here is a simple solution that might work for you:
import re
import string

def expand(pattern):
    """
    Returns a list of characters that can be matched by the given pattern.
    """
    pattern = pattern[1:-1]  # ignore the leading '[' and trailing ']'
    result = []
    lower_range_re = re.compile('[a-z]-[a-z]')
    upper_range_re = re.compile('[A-Z]-[A-Z]')
    digit_range_re = re.compile('[0-9]-[0-9]')
    for match in lower_range_re.findall(pattern):
        result.extend(string.ascii_lowercase[string.ascii_lowercase.index(match[0]):string.ascii_lowercase.index(match[2]) + 1])
    for match in upper_range_re.findall(pattern):
        result.extend(string.ascii_uppercase[string.ascii_uppercase.index(match[0]):string.ascii_uppercase.index(match[2]) + 1])
    for match in digit_range_re.findall(pattern):
        result.extend(string.digits[string.digits.index(match[0]):string.digits.index(match[2]) + 1])
    return result
It works for patterns like [b-g], [0-3], [G-N], [b-gG-N1-3], and so on. It does not work for patterns that list characters individually, such as [abc] or [0123].
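To cover literal characters like [abc] as well, one option is to tokenize the bracket contents with a single regex that tries a range first and falls back to a single character. This is only a sketch under the question's assumptions (valid characters are [a-zA-Z0-9_-], no negation); TOKEN_RE and expand_all are hypothetical names and not part of the answer above.

```python
import re

# Group 1: a range within one class (lowercase, uppercase or digits).
# Group 2: any single valid character, tried only if no range starts here.
TOKEN_RE = re.compile(r"([a-z]-[a-z]|[A-Z]-[A-Z]|[0-9]-[0-9])|([a-zA-Z0-9_-])")

def expand_all(pattern):
    """Expand ranges and literal characters of a bracket pattern into a list."""
    body = pattern[1:-1]  # drop the leading '[' and trailing ']'
    result = []
    # Characters outside [a-zA-Z0-9_-] match neither group and are skipped.
    for rng, literal in TOKEN_RE.findall(body):
        if rng:
            lo, hi = rng[0], rng[2]
            if lo > hi:
                # Assumption: reject reversed ranges, as re.compile does.
                raise ValueError("bad character range %s" % rng)
            result.extend(chr(code) for code in range(ord(lo), ord(hi) + 1))
        else:
            result.append(literal)
    return result

print(expand_all("[abc]"))     # ['a', 'b', 'c']
print(expand_all("[a-c_-]"))   # ['a', 'b', 'c', '_', '-']
```

Unlike the three separate findall passes, this keeps the characters in the order they appear in the pattern. It can still contain duplicates for input like [a-cb]; wrap the result in set() if that matters.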
Answer 2 (score: 0)
This solution doesn't use regular expressions, so it may not be quite right, but it could work:
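pattern = '[a-c]'
excludes = '[-]'  # Or use includes if that is easier
result = []
# Note: this keeps only the characters written literally in the pattern;
# it does not expand the range, so '[a-c]' gives ['a', 'c'].
for char in pattern:
    if char not in excludes:  # if char in includes:
        result.append(char)
        print(char)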