Question

I am very new to working with regexes. I am using Python 3.3 and the re module to filter addresses.

I would like to know why the following regex:

m3 = re.search("[ ,]*[0-9]{1,3}\s{0,1}(/|-|bt.)\s{0,1}[0-9]{1,3} ",Row[3]);

matches the strings:

rue de l'hotel des monnaies 49-51 1060Bxl
  av Charles Woeste309 bte2 -Bxl
  Rue d'Anethan 46 bte 6
  AvenueDefré269/ 6

but does not match the strings (m3 is None):

Avenue Guillaume de Greef,418 bte 343
  Joseph Cuylits,24 bte5 Rue Louis
  Ernotte 64 bte 3
  Rue Saint-Martin 51 bte 7

This seems really strange to me. All explanations are welcome. Thanks.

Answer 1

It looks like the trailing space " " at the end of your regex is unintentional and is breaking things: "[ ,]*[0-9]{1,3}\s{0,1}(/|-|bt.)\s{0,1}[0-9]{1,3} "

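Here is what the regex you pass to re.search means. (I recommend using the re.VERBOSE/re.X flag, which lets you put comments inside a regex so that it doesn't quickly become unreadable ;-). Note that with re.VERBOSE and a multiline """ string we could not even write that " " character literally, because unescaped whitespace in the pattern is ignored; you would have to use [ ] or \s instead.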

import re

addr_pat = re.compile(r"""
    [ ,]*       # zero or more optional leading space or commas
    [0-9]{1,3}  # 1-3 consecutive digits
    \s{0,1}     # one optional whitespace (instead you could just write \s?)
    (/|-|bt.)   # either forward-slash, minus or "bt[any character]" e.g. "bte"
    \s{0,1}     # one optional whitespace
    [0-9]{1,3}  # 1-3 consecutive digits
                # we omitted the trailing " " whitespace you inadvertently had
""", re.VERBOSE)

m3 = addr_pat.search("Rue Saint-Martin 51 bte 7 ")
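To illustrate the note about re.VERBOSE and spaces, here is a minimal sketch. The names `ignored` and `required` and the sample string are made up for this example and are not from the question. In verbose mode an unescaped space in the pattern is dropped, while a space inside a character class is kept:

```python
import re

# In verbose mode, unescaped whitespace in the pattern is ignored,
# so the trailing space here is NOT part of the compiled pattern.
ignored = re.compile(r"[0-9]{1,3} ", re.VERBOSE)

# A space inside a character class survives verbose mode,
# so this pattern really does require a space after the digits.
required = re.compile(r"[0-9]{1,3}[ ]", re.VERBOSE)

print(ignored.search("bte 7"))    # a match on "7": the space was ignored
print(required.search("bte 7"))   # None: nothing follows the 7
print(required.search("bte 7 "))  # a match on "7 "
```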

The trailing-space requirement is why each of the following failed to match:

Avenue Guillaume de Greef,418 bte 343
Joseph Cuylits,24 bte5 Rue Louis
Ernotte 64 bte 3
Rue Saint-Martin 51 bte 7
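As a quick check, the sketch below compares the pattern from the question with and without the trailing space. The names `with_space`, `without_space` and `samples` are just for this example, and the sample strings are assumed to have no whitespace after the last character:

```python
import re

# The pattern from the question, including the trailing space.
with_space = re.compile(r"[ ,]*[0-9]{1,3}\s{0,1}(/|-|bt.)\s{0,1}[0-9]{1,3} ")

# The same pattern with the trailing space dropped.
without_space = re.compile(r"[ ,]*[0-9]{1,3}\s{0,1}(/|-|bt.)\s{0,1}[0-9]{1,3}")

samples = [
    "Avenue Guillaume de Greef,418 bte 343",      # number at end of string
    "Rue Saint-Martin 51 bte 7",                  # number at end of string
    "rue de l'hotel des monnaies 49-51 1060Bxl",  # number followed by a space
]

for s in samples:
    # Print whether each variant finds a match in the sample.
    print(repr(s), bool(with_space.search(s)), bool(without_space.search(s)))
```

Only the third sample matches the original pattern, because it is the only one where a space follows the second number; all three match once the space is removed. If the space was meant to make sure the number is complete, a negative lookahead such as (?![0-9]) is one possible alternative, since it also succeeds at the end of the string.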
