Question

This is a question about conditional regular expressions in Python:

I would like to match the string "abc" so that

match(1)="a"
match(2)="b"
match(3)="c"

but also match the string "  a" (two leading spaces) so that

match(1)="a"
match(2)=""
match(3)=""

The following code ALMOST does this. The problem is that in the first case match(1)="a", but in the second case match(4)="a" (not match(1), as I need).

In fact, if you iterate over all the groups with for g in re.search(myre,teststring2).groups():, you get 6 groups (not the 3 I expected).

import re

teststring1 = "abc"
teststring2 = "  a"

myre = r'^(?=(\w)(\w)(\w))|(?=\s{2}(\w)()())'

m1 = re.search(myre, teststring1)
if m1:
    print(m1.group(1))   # prints a

m2 = re.search(myre, teststring2)
if m2:
    print(m2.group(1))   # prints None: the 'a' ended up in group(4)
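A quick way to see the six groups described above is to print `groups()` for both strings. This sketch reuses the question's pattern unchanged (written as a raw string); it avoids Python 3-only syntax, but I have only checked it against Python 3 semantics, not run it on 2.5.

```python
import re

# The pattern from the question: two alternatives with three groups each,
# so the compiled pattern has six numbered groups in total.
myre = re.compile(r'^(?=(\w)(\w)(\w))|(?=\s{2}(\w)()())')

for s in ["abc", "  a"]:
    m = myre.search(s)
    # groups() always returns one entry per group in the pattern; groups
    # belonging to the alternative that did not match come back as None.
    print("%r -> %r" % (s, m.groups()))
```

For `"  a"` the first three entries are None and the `'a'` sits in the fourth, which is why `group(1)` is None there.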

Any ideas? (Note that this is Python 2.5.)

Answer 1

Maybe...:

import re

teststring1 = "abc"
teststring2 = "  a"

myre = r'^\s{0,2}(\w)(\w?)(\w?)$'

m1 = re.search(myre, teststring1)
if m1:
    print(m1.group(1))   # prints a

m2 = re.search(myre, teststring2)
if m2:
    print(m2.group(1))   # prints a

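This does give you a in both cases, but perhaps it won't match the way you want in other cases you haven't shown (e.g. no leading spaces, or spaces followed by more than one letter, so that the total length of the matched string is != 3 ... but I'm just guessing that you don't want matches in those cases...?)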
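To make that caveat concrete, here is a small sketch that runs Answer 1's pattern over the two strings from the question plus a few extra inputs I made up (`"a"`, `" ab"`, `"  abc"`, `"   a"`). `groups_or_none` is a hypothetical helper, not part of the original answer. The code avoids Python 3-only syntax, though it has not been tried on 2.5.

```python
import re

# Answer 1's pattern: up to two leading whitespace characters, one required
# word character, then two optional word characters, anchored at both ends.
MYRE = re.compile(r'^\s{0,2}(\w)(\w?)(\w?)$')

def groups_or_none(s):
    """Hypothetical helper: the three captured groups for s, or None."""
    m = MYRE.search(s)
    return m.groups() if m else None

# The question's two strings, followed by inputs it does not mention.
for s in ["abc", "  a", "a", " ab", "  abc", "   a"]:
    print("%r -> %r" % (s, groups_or_none(s)))
```

With this pattern `"a"` and `"  abc"` also match, while three leading spaces do not; if only strings of total length 3 should be accepted, one option is an extra `len(s) == 3` check.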

Answer 2

Each capturing group in the expression gets its own index. Try this:

r = re.compile(r"^\s*(\w)(\w)?(\w)?$")

"abc" -> ('a', 'b', 'c')
"  a" -> ('a', None, None)

To break it down:

^     // anchored at the beginning
\s*   // any amount of leading whitespace
(\w)  // capture the first letter, which is required
(\w)? // capture the second letter, which is optional
(\w)? // capture the third letter, which is optional
$     // anchored at the end
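The breakdown above can be kept inside the pattern itself with `re.VERBOSE`, which ignores unescaped whitespace in the pattern and treats `#` as a comment. This is a sketch of the same regex in that form; it also shows `groups("")`, which substitutes an empty string for groups that did not participate, in case `""` is preferred over None as in the question.

```python
import re

# Answer 2's pattern in verbose form; \s still matches whitespace in the
# input, only literal whitespace inside the pattern is ignored.
r = re.compile(r"""
    ^       # anchored at the beginning
    \s*     # any amount of leading whitespace
    (\w)    # capture the first letter, which is required
    (\w)?   # capture the second letter, which is optional
    (\w)?   # capture the third letter, which is optional
    $       # anchored at the end
    """, re.VERBOSE)

for s in ["abc", "  a"]:
    m = r.match(s)
    # Unmatched optional groups are None by default; groups("") uses "".
    print("%r -> %r / %r" % (s, m.groups(), m.groups("")))
```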

Answer 3

myre = r'^(?=\s{0,2}(\w)(?:(\w)(\w))?)'

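This handles both cases you described the way you want, but it's not necessarily a general solution. It feels like you've posed a toy problem that stands in for a real one.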
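Here is a sketch of how Answer 3's pattern behaves. Because everything sits inside one lookahead, the overall match is empty and the letters are only available through the groups; the second and third letters are captured together or not at all. The extra input `"  abc"` is my own addition, showing one case outside the two described.

```python
import re

# Answer 3's pattern: a single lookahead anchored at the start; the second
# and third letters live in one optional non-capturing group.
myre = re.compile(r'^(?=\s{0,2}(\w)(?:(\w)(\w))?)')

for s in ["abc", "  a", "  abc"]:
    m = myre.search(s)
    # group(0) is the zero-width lookahead match; groups() holds the letters.
    print("%r -> %r (group 0: %r)" % (s, m.groups(), m.group(0)))
```

Note that `"  abc"` is accepted with all three letters, which is the kind of case where this pattern may not be what is wanted.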

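It's hard to get a general solution, because how the later elements are handled depends on how the earlier ones were handled, and/or the other way around. For example, if you have the full abc, the leading spaces shouldn't be there. If there are leading spaces, you should only find a.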

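In my view, the best way to handle this is with your original | construct. You can have some code after the match that pulls the groups out into an array and arranges them however you like.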
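One way that post-processing step might look, as a sketch: keep the question's `|` pattern and pick groups 1-3 or 4-6 depending on which alternative matched. `three_groups` is a hypothetical helper name, and using "group 1 is None" as the test for which branch matched is my own choice.

```python
import re

# The question's original pattern with the | construct (six groups).
myre = re.compile(r'^(?=(\w)(\w)(\w))|(?=\s{2}(\w)()())')

def three_groups(s):
    """Hypothetical helper: return the captures of whichever alternative
    matched as a list of three strings, or None if nothing matched."""
    m = myre.search(s)
    if m is None:
        return None
    g = m.groups()
    # Groups 1-3 belong to the first alternative, groups 4-6 to the second.
    # Group 1 is None exactly when the first alternative did not match.
    if g[0] is not None:
        return list(g[0:3])
    return list(g[3:6])

for s in ["abc", "  a"]:
    print("%r -> %r" % (s, three_groups(s)))
```

This gives `['a', 'b', 'c']` for `"abc"` and `['a', '', '']` for `"  a"`, the three values the question asks for.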

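The rule for groups is that every opening parenthesis not immediately followed by ?: (or by another non-capturing construct, such as the lookahead (?=...)) creates a numbered group. A group may be empty because it doesn't actually match anything, but it will still be there and still take up an index.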
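A compiled pattern's `groups` attribute reports how many capturing groups it has, which is a quick way to check the rule above. This sketch applies it to the patterns from the question and the answers; I have not confirmed that the attribute exists as far back as Python 2.5.

```python
import re

patterns = [
    r'^(?=(\w)(\w)(\w))|(?=\s{2}(\w)()())',  # question: the empty ()() still count
    r'^(?=\s{0,2}(\w)(?:(\w)(\w))?)',         # Answer 3: (?=...) and (?:...) add none
    r'^\s*(\w)(\w)?(\w)?$',                   # Answer 2: one group per (\w)
]

for p in patterns:
    # .groups is the number of capturing groups in the compiled pattern.
    print("%d  %s" % (re.compile(p).groups, p))
```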

Python conditional regular expressions
