Python regex: matching either "\*" (a literal asterisk character) or "\s" (whitespace)

Time: 2014-02-04 04:59:59

Tags: python regex

I am trying to match the "D" lines and capture characters 2, 3, 4 and 5 in a data set like this:

S    7....                        <- line 1
         associated random data   <- line 2
D*EX 0....                        <- line 3
         associated random data   <- line 4
C    0....                        <- line 5
         associated random data   <- line 6
D E  6....                        <- line 7
         associated random data   <- line 8
         associated random data   <- line 9
D    3....                        <- line 10
         associated random data   <- line 11
D O  3....                        <- line 12
         associated random data   <- line 13
         associated random data   <- line 14

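That is, I don't want to simply capture ^D.* because the "EX" characters can vary and I need to tell them apart later.

The problem I'm running into seems to be the choice between "*" and " " (a space) in the second character (column).

Specifying a choice between "*" and "\s" does not seem to match the line "D*EX 0....".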

re.compile(r'''^(^[D]               # Match "D"
                [\*|\s]         <-- # Match either "*" or " "
                [A-Z{1,2}\s|\s{3}]  # match either "EX" + "" OR match 3x" "
.*?)^[A-Z]''', re.DOTALL | re.MULTILINE |re.VERBOSE)  # match anything else if there...

Matches and outputs => D EX 6....D 3....

If I specify only "*", I do end up with a match on one line, but of course the other lines don't match.

re.compile(r'''^(^[D]               # Match "D"
                [\*]            <-- # Match ONLY "*"
                [A-Z{1,2}\s|\s{3}]  # match either "EX" + "" OR match 3x" "
.*?)^[A-Z]''', re.DOTALL | re.MULTILINE |re.VERBOSE)  # match anything else if there...

Matches and outputs only => D*EX 0....

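It was suggested that I try a non-capturing group. NC groups are new to me, but they make some sense. I may still want the captured output, though, and with an NC group the original choice between "*" and "\s" still does not match. I have tried many combinations, but the output stays consistent with the one below.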

re.compile(r'''^(^[D]               # Match "D"
                (?:[\*|\s]      <-- # non-capturing group match either "*" or " "
                [A-Z{1,2}\s|\s{3}]  # match either "EX" + "" OR match 3x" "
.*?)^[A-Z]''', re.DOTALL | re.MULTILINE |re.VERBOSE)  # match anything else if there...

Matches and outputs => D EX 0....D 0....


1 answer:

Answer 0: (score: 1)


import re

txt = '''S    7....                        <- line 1
         associated random data   <- line 2
D*EX 0....                        <- line 3
         associated random data   <- line 4
C    0....                        <- line 5
         associated random data   <- line 6
D E  6....                        <- line 7
         associated random data   <- line 8
         associated random data   <- line 9
D    3....                        <- line 10
         associated random data   <- line 11
D O  3....                        <- line 12
         associated random data   <- line 13
         associated random data   <- line 14'''

flags = re.DOTALL | re.MULTILINE |re.VERBOSE


# Raw string so that \d reaches the regex engine unchanged
re1 = re.compile(r'''^(D.*?)\d''', flags)
print(re1.findall(txt))


['D*EX ', 'D E  ', 'D    ', 'D O  ']
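
One likely reason the patterns in the question misbehave is that everything inside square brackets is a character class, which matches exactly one character: inside `[...]`, `|` and `{1,2}` are literal characters rather than alternation or repetition. A minimal sketch of the difference follows. The `header` pattern and the shortened `sample` text are illustrative assumptions, not part of the original answer, and the sketch assumes a fixed-width layout.

```python
import re

# Inside [...], "|", "{", "}" and "," are literal characters, and the whole
# class consumes exactly one character.
char_class = re.compile(r'[A-Z{1,2}\s|\s{3}]')
print(char_class.fullmatch('E') is not None)    # True: a single letter matches
print(char_class.fullmatch('|') is not None)    # True: so does a literal pipe
print(char_class.fullmatch('EX ') is not None)  # False: three characters do not

# Assumed fixed-width layout: column 1 is "D", column 2 is "*" or a space,
# columns 3-5 hold the code field (for example "EX " or three spaces).
header = re.compile(r'^D([* ])(.{3})', re.MULTILINE)

sample = 'D*EX 0....\nC    0....\nD E  6....\nD    3....\nD O  3....'
fields = header.findall(sample)
print(fields)
```

Here the choice between "*" and a space is written as the class `[* ]`, and the width of the code field is written as `{3}` outside the brackets.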



flags = re.DOTALL | re.VERBOSE


re1 = re.compile(
  r'''(?:^|\n) # noncapturing, assert start of string or newline
      (D.*?)   # capture D and everything after it
      (?=\n[A-Z]|$) #lookahead, newline cap char or end of string?
  ''', flags)

# Print each captured "D" block together with its continuation lines
for block in re1.findall(txt):
    print(block)


D*EX 0....                        <- line 3
         associated random data   <- line 4
D E  6....                        <- line 7
         associated random data   <- line 8
         associated random data   <- line 9
D    3....                        <- line 10
         associated random data   <- line 11
D O  3....                        <- line 12
         associated random data   <- line 13
         associated random data   <- line 14
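
If the blocks also need to be told apart by columns 2 to 5, as the question asks, the header fields can be captured alongside each block. The following is only a sketch under the assumption that the layout is fixed-width; the group names `flag`, `code` and `body`, and the shortened `sample` text, are made up for illustration.

```python
import re

block_re = re.compile(
    r'''^D(?P<flag>[* ])        # column 2: "*" or a space
        (?P<code>[^\n]{3})      # columns 3-5, e.g. "EX " or three spaces
        (?P<body>.*?)           # rest of the record, across lines
        (?=^[A-Z]|\Z)           # stop before the next header line or at the end
    ''', re.DOTALL | re.MULTILINE | re.VERBOSE)

sample = (
    'S    7....\n'
    '         data a\n'
    'D*EX 0....\n'
    '         data b\n'
    'C    0....\n'
    'D E  6....\n'
    '         data c\n'
    '         data d\n'
    'D    3....\n'
)

# For each "D" record: the flag column, the trimmed code field, and the
# number of lines the record spans (header line included).
summary = [
    (m.group('flag'), m.group('code').strip(), len(m.group('body').splitlines()))
    for m in block_re.finditer(sample)
]
print(summary)
```

Whitespace inside a character class is kept even under `re.VERBOSE`, which is why `[* ]` still matches a space here.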




