Question

Suppose our address book contains some unformatted data, for example:

+1 (4542) 114214 111@111.org d@ghhg.com ,,,,

+1 (2342) 114234 ert@nhy.sdfr.domain.org; 1@kjk.eiu.1

+7 (101) 111-222-11 abc@ert.com, def@sdf.org

+1 (102) 123532-2 some@mail.ru

+44 (301) 123 23 45 7zip@site.edu; ret@ghjj.org

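I tried to write a regular expression for this:

/+\d+\s(\d+)\s\d+[\d+\s|\d+-]+/g

But I don't know how to exclude digits that come right before alphabetic characters. Perhaps this is not even a partial solution.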

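Edit #1: I'm overwhelmed by all the working solutions offered; thank you all very much. If possible, I would appreciate it if you added at least some references or an explanation of how to write regular expressions this complex.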

Answer 1

This may be one of the few cases where you need a possessive quantifier.

My attempt:

\s*(\+?(\d+)\s*\(\d+\)\s+([- \d+]++(?!\@)|\d+))

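The [- \d+]++(?!\@) part fails if the run it matched is followed by "@". Because the quantifier is possessive, the engine cannot give back a few characters and retry, so it falls through to the plain \d+ alternative. The email address is therefore not included.

The phone number is now stored in group 1.

Edit: Yes, the last input line is not matched correctly. It may be easier to strip the email addresses with the following regex, which leaves the phone number behind (plus a few commas, but those should not be a problem either):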

\s[^\@ ]+\@[-\w.]+\.\w+
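
A minimal Python sketch of this second approach, assuming the sample lines are typed with ASCII parentheses and that leftover commas and semicolons can simply be trimmed. The names EMAIL, lines and phones are illustrative and do not come from the answer:

```python
import re

# Email-matching pattern from the answer above.
EMAIL = re.compile(r"\s[^\@ ]+\@[-\w.]+\.\w+")

lines = [
    "+1 (4542) 114214 111@111.org d@ghhg.com ,,,,",
    "+1 (2342) 114234 ert@nhy.sdfr.domain.org; 1@kjk.eiu.1",
    "+7 (101) 111-222-11 abc@ert.com, def@sdf.org",
    "+1 (102) 123532-2 some@mail.ru",
    "+44 (301) 123 23 45 7zip@site.edu; ret@ghjj.org",
]

# Delete every email address, then trim the separators left behind.
phones = [EMAIL.sub("", line).strip(" ,;") for line in lines]
print(phones)
```

Trimming with strip(" ,;") is an assumption about what counts as noise; it is not part of the original answer.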

Answer 2

Without knowing where you are using this regex, I would suggest a negative lookahead.

^[+\d() -]+(?![\w@])

Demo: https://regex101.com/r/rQ6fK4/1

If you want to capture the phone number, use:

^([+\d() -]+)(?![\w@])

It will be in $1 or \1 (depending on where you use it).
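
As a rough illustration in Python, assuming the whole address book is one multi-line string (so ^ needs the MULTILINE flag) and that no line is indented. PHONE, text and phones are made-up names:

```python
import re

# Capturing variant from the answer; MULTILINE lets ^ match at every line start.
PHONE = re.compile(r"^([+\d() -]+)(?![\w@])", re.MULTILINE)

text = """\
+1 (4542) 114214 111@111.org d@ghhg.com ,,,,
+1 (2342) 114234 ert@nhy.sdfr.domain.org; 1@kjk.eiu.1
+7 (101) 111-222-11 abc@ert.com, def@sdf.org
+1 (102) 123532-2 some@mail.ru
+44 (301) 123 23 45 7zip@site.edu; ret@ghjj.org
"""

# The greedy class first overshoots into the email (e.g. "... 114214 111"),
# then backtracks until the next character is neither a word character nor "@".
phones = PHONE.findall(text)
print(phones)
```

The lookahead is what trims the match: every position inside or just before an email's leading digits is followed by a word character or "@", so the engine keeps giving characters back until it reaches the space after the phone number.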

Answer 3

You can use the following (demo):

(?<phone>\+\d{1,2}\s\(\d{3,4}\)\s(?:[\d- ]+\d)(?=\s)) 
\s+(?<email>.*?@.*?)(?=[\s;,]|$).*?
\s+(?<email2>[\w]*?@.*?)?(?=[\s;,]|$)

Which produces:

MATCH 1
phone   [4-20]  `+1 (4542) 114214`
email   [21-32] `111@111.org`
email2  [33-43] `d@ghhg.com`
MATCH 2
phone   [52-68] `+1 (2342) 114234`
email   [69-92] `ert@nhy.sdfr.domain.org`
email2  [94-105]    `1@kjk.eiu.1`
MATCH 3
phone   [110-129]   `+7 (101) 111-222-11`
email   [130-141]   `abc@ert.com`
email2  [143-154]   `def@sdf.org`
MATCH 4
phone   [159-176]   `+1 (102) 123532-2`
email   [177-189]   `some@mail.ru`
MATCH 5
phone   [194-213]   `+44 (301) 123 23 45`
email   [214-227]   `7zip@site.edu`
email2  [229-241]   `ret@ghjj.org`

Explanation:

(?<phone>\+\d{1,2}\s\(\d{3,4}\)\s(?:[\d- ]+\d)(?=\s)) Named capturing group phone

    \+ matches the character + literally
    \d{1,2} match a digit [0-9]
        Quantifier: {1,2} Between 1 and 2 times, as many times as possible, giving back as needed [greedy]
    \s match any white space character [\r\n\t\f ]
    \( matches the character ( literally
    \d{3,4} match a digit [0-9]
        Quantifier: {3,4} Between 3 and 4 times, as many times as possible, giving back as needed [greedy]
    \) matches the character ) literally
    \s match any white space character [\r\n\t\f ]
    (?:[\d- ]+\d) Non-capturing group
        [\d- ]+ match a single character present in the list below
            Quantifier: + Between one and unlimited times, as many times as possible, giving back as needed [greedy]
            \d match a digit [0-9]
            - a single character in the list - literally
        \d match a digit [0-9]
    (?=\s) Positive Lookahead - Assert that the regex below can be matched
        \s match any white space character [\r\n\t\f ]

\s+ match any white space character [\r\n\t\f ]

    Quantifier: + Between one and unlimited times, as many times as possible, giving back as needed [greedy]

(?<email>.*?@.*?) Named capturing group email

    .*? matches any character (except newline)
        Quantifier: *? Between zero and unlimited times, as few times as possible, expanding as needed [lazy]
    @ matches the character @ literally
    .*? matches any character (except newline)
        Quantifier: *? Between zero and unlimited times, as few times as possible, expanding as needed [lazy]

(?=[\s;,]|$) Positive Lookahead - Assert that the regex below can be matched

    1st Alternative: [\s;,]
        [\s;,] match a single character present in the list below
            \s match any white space character [\r\n\t\f ]
            ;, a single character in the list ;, literally
    2nd Alternative: $
        $ assert position at end of a line

.*? matches any character (except newline)

    Quantifier: *? Between zero and unlimited times, as few times as possible, expanding as needed [lazy]

\s+ match any white space character [\r\n\t\f ]

    Quantifier: + Between one and unlimited times, as many times as possible, giving back as needed [greedy]

(?<email2>[\w]*?@.*?)? Named capturing group email2

    Quantifier: ? Between zero and one time, as many times as possible, giving back as needed [greedy]
    Note: A repeated capturing group will only capture the last iteration. Put a capturing group around the repeated group to capture all iterations or use a non-capturing group instead if you're not interested in the data
    [\w]*? match a single character present in the list below
        Quantifier: *? Between zero and unlimited times, as few times as possible, expanding as needed [lazy]
        \w match any word character [a-zA-Z0-9_]
    @ matches the character @ literally
    .*? matches any character (except newline)
        Quantifier: *? Between zero and unlimited times, as few times as possible, expanding as needed [lazy]

(?=[\s;,]|$) Positive Lookahead - Assert that the regex below can be matched

    1st Alternative: [\s;,]
        [\s;,] match a single character present in the list below
            \s match any white space character [\r\n\t\f ]
            ;, a single character in the list ;, literally
    2nd Alternative: $
        $ assert position at end of a line

g modifier: global. All matches (don't return on first match)
m modifier: multi-line. Causes ^ and $ to match the begin/end of each line (not only begin/end of string)
i modifier: insensitive. Case insensitive match (ignores case of [a-zA-Z])
x modifier: extended. Spaces and text after a # in the pattern are ignored
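
For readers working in Python, here is an adapted sketch of the same idea rather than a literal port. It assumes each entry is processed as its own line, spells named groups as (?P<name>...), writes the character class as [\d -] so the hyphen is unambiguously literal, and wraps the second address in an optional group so a line with a single address still matches. ENTRY and parse are hypothetical names:

```python
import re

# Adapted from the pattern above; see the assumptions in the lead-in.
ENTRY = re.compile(r"""
    (?P<phone>\+\d{1,2}\s\(\d{3,4}\)\s[\d -]+\d(?=\s))  # phone, followed by whitespace
    \s+(?P<email>\S+?@\S+?)(?=[\s;,]|$)                 # first address
    (?:[\s;,]+(?P<email2>\w+@\S+?)(?=[\s;,]|$))?        # optional second address
""", re.VERBOSE)

def parse(line):
    """Return a dict with phone, email and email2 (or None if no match)."""
    match = ENTRY.search(line)
    return match.groupdict() if match else None

print(parse("+44 (301) 123 23 45 7zip@site.edu; ret@ghjj.org"))
print(parse("+1 (102) 123532-2 some@mail.ru"))
```

When only one address is present, email2 is None.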

Answer 4

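Here is my solution (on regex101):

\+\d+\s+\(\d+\)\s+[- \d]+(?= )

This ensures that the final run of spaces, digits, and/or dashes ([- \d]+) is followed by a space ((?= )).

It captures all the examples cleanly, with no trailing whitespace and without including any part of an email address.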
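
A quick way to check that claim in Python, assuming the pattern is written with the literal parentheses escaped as \(\d+\) and the sample data uses ASCII parentheses. PHONE, text and phones are illustrative names:

```python
import re

# Phone pattern with a lookahead that requires a space right after the match.
PHONE = re.compile(r"\+\d+\s+\(\d+\)\s+[- \d]+(?= )")

text = """\
+1 (4542) 114214 111@111.org d@ghhg.com ,,,,
+1 (2342) 114234 ert@nhy.sdfr.domain.org; 1@kjk.eiu.1
+7 (101) 111-222-11 abc@ert.com, def@sdf.org
+1 (102) 123532-2 some@mail.ru
+44 (301) 123 23 45 7zip@site.edu; ret@ghjj.org
"""

phones = PHONE.findall(text)
print(phones)
```

Note that a line consisting of a phone number alone, with nothing after it, would not match, because the lookahead insists on a following space.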

Answer 5

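I wouldn't even try to parse the phone number.

You have a phone number, separated by a space character from one or more email addresses, which are themselves separated by commas or semicolons. An email address always contains @.

Find the first @. If there is none, the phone number is the trimmed string. If there is an @, find the last space before it. The phone number is everything up to that space, trimmed. If there is no space before the @, you do not have a phone number.

With the phone number removed, you can find the emails by splitting the rest of the string on "," or ";", trimming each piece, and discarding anything that does not contain @.

Then find a suitable tool for processing the phone number, if you need to do anything with it beyond recording it.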
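
The steps above can be sketched in Python without any regular expression. The function names split_entry and find_emails are hypothetical, and returning None for a missing phone number is an assumption:

```python
def find_emails(text):
    """Split on ',' or ';', trim each piece, keep only pieces containing '@'."""
    pieces = text.replace(";", ",").split(",")
    return [piece.strip() for piece in pieces if "@" in piece]

def split_entry(line):
    """Return (phone, emails) for one address-book line."""
    at = line.find("@")
    if at == -1:
        # No email at all: the whole trimmed line is the phone number.
        return line.strip(), []
    space = line.rfind(" ", 0, at)
    if space == -1:
        # No space before the first '@': there is no phone number.
        return None, find_emails(line)
    return line[:space].strip(), find_emails(line[space:])

print(split_entry("+44 (301) 123 23 45 7zip@site.edu; ret@ghjj.org"))
```

Following the answer literally, addresses separated only by a space (as in the first sample line) come back as a single piece; splitting on whitespace as well would be an extension beyond what the answer describes.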

Extracting phone numbers from complex strings
