Question

Hi, using Python I want to capture phone numbers in a text, but exclude fax numbers, i.e. any number that follows the word fax or Fax.

I use the following regex, which works when the sentence starts with fax or Fax, but not when fax appears inside the sentence:

^(?!fax|Fax)(?:.*?)(?![-a-z])((?:[^0-9])((\+|00)33\s?|0|\(0\))[123456789][ \.\-]?[0-9]{2}[ \.\-]?[0-9]{2}[ \.\-]?[0-9]{2}[ \.\-]?[0-9]{2})(?![0-9])
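A minimal sketch of why this happens, using a deliberately simplified pattern (`SIMPLE` and `first_number` are illustrative names, not the full regex above): `^` matches only at the start of the string, so the negative lookahead `(?!fax|Fax)` is checked only there and never sees a "fax" that occurs later in the line.

```python
import re

# Simplified stand-in for the regex above (an assumption for illustration):
# same "^(?!fax|Fax)" prefix, but a much simpler phone-number part.
SIMPLE = re.compile(r"^(?!fax|Fax).*?(\d{2}(?: \d{2}){4})")

def first_number(line):
    """Return the first captured number in line, or None if nothing matches."""
    match = SIMPLE.search(line)
    return match.group(1) if match else None

# The lookahead rejects a line that starts with "Fax" ...
print(first_number("Fax 06 32 32 32 36"))                     # None
# ... but it is never evaluated at the "fax" in the middle of a line.
print(first_number("Adresse quai du Sa fax 06 32 32 32 33"))  # 06 32 32 32 33
```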

Here is a sample of the text I am analysing:

text
Adresse quai du Sa fax 06 32 32 32 33 rtel – 59100 ROUBAIX| FRANCE
faTel : 0 8 99 70 1761 – Fax : 06 32 32 32 34
Mail :support@domain.com
06 32 32 32 35

Fax 06 32 32 32 36
tel 06 32 32 32 37 henrg

The result of my regex is:

Match 1
Full match  5-42    `Adresse quai du Sa fax 06 32 32 32 33`
Group 1.    27-42   ` 06 32 32 32 33`
Group 2.    28-29   `0`
Match 2
Full match  72-117  `faTel : 0 8 99 70 1761 – Fax : 06 32 32 32 34`
Group 1.    102-117 ` 06 32 32 32 34`
Group 2.    103-104 `0`
Match 3
Full match  118-157 `Mail :support@domain.com
06 32 32 32 35`
Group 1.    142-157 `
06 32 32 32 35`
Group 2.    143-144 `0`
Match 4
Full match  178-196 `tel 06 32 32 32 37`
Group 1.    181-196 ` 06 32 32 32 37`
Group 2.    182-183 `0`

But I don't want "06 32 32 32 34" and "06 32 32 32 33" in the results, because "fax" comes before them.

Thanks

Answer 1

I suggest using a regex that matches what you do not need, but matches and captures what you do need:

(?i)fax\W*\d[\s\d]*|(\d[\s\d]*\d)

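See the regex demo. The items highlighted in green are the ones you need to grab. Note: the number you get in Group 1 must contain at least 2 digits. You can also make the pattern more precise to fit further requirements while keeping the same "framework"; I simplified the regex structure to show the main idea.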

Details

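(?i) - case-insensitive modifier
fax - the substring fax
\W* - any 0+ non-word characters (you can make this more precise and allow only whitespace and a colon, e.g. \s*(?::\s*)?)
\d - a digit
[\s\d]* - 0+ whitespace characters or digits
| - or...
(\d[\s\d]*\d) - Group 1 (the value you need):
- \d - a digit
- [\s\d]* - 0+ whitespace characters or digits
- \d - a digit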
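As a rough illustration of the note on `\W*`, the sketch below compares the pattern as given with a stricter variant that only allows whitespace and an optional colon after "fax". `LOOSE`, `STRICT`, `captured` and the test strings are illustrative names and data, not part of the original answer.

```python
import re

# Pattern as given: any non-word characters may separate "fax" from the digits.
LOOSE = r"(?i)fax\W*\d[\s\d]*|(\d[\s\d]*\d)"
# Stricter variant: only whitespace and an optional colon are allowed.
STRICT = r"(?i)fax\s*(?::\s*)?\d[\s\d]*|(\d[\s\d]*\d)"

def captured(pattern, text):
    """Return the non-empty Group 1 values found by pattern in text."""
    # With a single capturing group, re.findall returns that group's text,
    # which is an empty string whenever the "fax" branch matched.
    return [value for value in re.findall(pattern, text) if value]

print(captured(LOOSE, "fax -- 01 02 03 04 05"))   # []
print(captured(STRICT, "fax -- 01 02 03 04 05"))  # ['01 02 03 04 05']
print(captured(STRICT, "Fax : 06 32 32 32 34"))   # []
```

Which separator set is appropriate depends on how fax numbers are actually written in the input.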

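In Python 2, use the pattern with re.findall and keep only the non-empty Group 1 values:

re.findall(r'(?i)fax\W*\d[\s\d]*|(\d[\s\d]*\d)', text)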

See the Python 2 demo.
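A minimal sketch of this approach, written for Python 3 and assuming the sample text from the question. `SAMPLE` and `extract_phones` are illustrative names, not taken from the original answer.

```python
import re

# Sample text from the question (without the leading "text" line).
SAMPLE = """Adresse quai du Sa fax 06 32 32 32 33 rtel – 59100 ROUBAIX| FRANCE
faTel : 0 8 99 70 1761 – Fax : 06 32 32 32 34
Mail :support@domain.com
06 32 32 32 35

Fax 06 32 32 32 36
tel 06 32 32 32 37 henrg
"""

# First branch consumes "fax" plus the number after it without capturing;
# second branch captures any other run of digits and whitespace in Group 1.
PATTERN = re.compile(r"(?i)fax\W*\d[\s\d]*|(\d[\s\d]*\d)")

def extract_phones(text):
    """Return the Group 1 captures, dropping the empty ones left by fax matches."""
    return [value for value in PATTERN.findall(text) if value]

print(extract_phones(SAMPLE))
# ['59100', '0 8 99 70 1761', '06 32 32 32 35', '06 32 32 32 37']
```

The four fax-prefixed numbers are gone, but the simplified pattern also captures the postal code 59100, so the second branch would likely need tightening for real data.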

Answer 2

You are using a lookahead where you need a lookbehind, (?<!..)

With this regex, I seem to get all the phone numbers and none of the fax numbers:

(?<!Fax |fax )((\d\d\s){5}|((\d\s){2}(\d\d\s){2}\d{4}))
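A minimal sketch of how this lookbehind pattern behaves in Python, with `LOOKBEHIND` and `numbers` as illustrative names. Python's `re` module requires lookbehinds to be fixed-width, which holds here because both alternatives ("Fax " and "fax ") are four characters long.

```python
import re

# The pattern from this answer: a number is rejected only when the four
# characters immediately before it are exactly "Fax " or "fax ".
LOOKBEHIND = re.compile(
    r"(?<!Fax |fax )((\d\d\s){5}|((\d\s){2}(\d\d\s){2}\d{4}))"
)

def numbers(text):
    """Return every match of Group 1, with surrounding whitespace stripped."""
    return [m.group(1).strip() for m in LOOKBEHIND.finditer(text)]

print(numbers("tel 06 32 32 32 37 henrg"))  # ['06 32 32 32 37']
print(numbers("Fax 06 32 32 32 36\n"))      # []
print(numbers("Fax : 06 32 32 32 34\n"))    # ['06 32 32 32 34']
```

The last call suggests a limitation: with "Fax : " the four characters before the number are "x : ", so the lookbehind does not reject it. Handling a colon would need extra fixed-width alternatives or a different approach.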

Python regex with negative lookahead
