Suppose our address book contains some unformatted data, for example:
+1 (4542) 114214 111@111.org d@ghhg.com ,,,,
+1 (2342) 114234 ert@nhy.sdfr.domain.org; 1@kjk.eiu.1
+7 (101) 111-222-11 abc@ert.com, def@sdf.org
+1 (102) 123532-2 some@mail.ru
+44 (301) 123 23 45 7zip@site.edu; ret@ghjj.org
I tried to write a regular expression for this:
/+\d+\s(\d+)\s\d+[\d+\s|\d+-]+/g
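But I don't know how to exclude digits that come directly before alphabetic characters. This may not even be a partial solution.
Edit #1: I'm overwhelmed by all the working solutions provided; thank you all very much. If possible, I would appreciate it if you added at least some references or an explanation of how to write such complex regular expressions.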
Answer 0: (score: 0)
This may be one of the few cases where you need a possessive quantifier.
My attempt:
\s*(\+?(\d+)\s*\(\d+\)\s+([- \d+]++(?!\@)|\d+))
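The [- \d+]++(?!\@) part stops matching if it is followed by "@". Because the quantifier is possessive, the engine cannot backtrack into it, so the match falls through to the \d+ alternative and the email address is not included.
The phone number is now stored in group 1.
Edit: Yes, the last input line is not matched correctly. It may be easier to extract the email addresses with the following regex, so that the phone number is what remains (along with some commas, but those should not be a problem either):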
\s[^\@ ]+\@[-\w.]+\.\w+
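The email-removal idea can be sketched in Python as follows. This is only an illustration under assumptions: the helper name phone_from_line is hypothetical, and stripping leftover spaces, commas, and semicolons is my guess at how the remaining separators would be cleaned up.

```python
import re

# Email-like token preceded by whitespace (the pattern from the answer above).
EMAIL = re.compile(r"\s[^\@ ]+\@[-\w.]+\.\w+")

def phone_from_line(line: str) -> str:
    """Delete email-like tokens, then trim leftover separators (assumed cleanup)."""
    without_emails = EMAIL.sub("", line)
    return without_emails.strip(" ,;")

samples = [
    "+1 (4542) 114214 111@111.org d@ghhg.com ,,,,",
    "+1 (2342) 114234 ert@nhy.sdfr.domain.org; 1@kjk.eiu.1",
    "+44 (301) 123 23 45 7zip@site.edu; ret@ghjj.org",
]
for sample in samples:
    print(phone_from_line(sample))
```

Because this approach removes what it recognizes as an email rather than matching the phone number, it also copes with the last sample line, where a digit directly precedes the letters of the address.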
Answer 1: (score: 0)
Without knowing where you are using this regex, I suggest using a negative lookahead.
^[+\d() -]+(?![\w@])
Demo: https://regex101.com/r/rQ6fK4/1
If you want to capture the phone number, use:
^([+\d() -]+)(?![\w@])
It will be in $1 or \1, depending on where you use it.
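As one possible usage, here is a minimal Python sketch of the capturing version. It assumes the records arrive as one multi-line string with each record starting at the beginning of a line, which is why re.MULTILINE is set; the variable names are illustrative only.

```python
import re

# Capturing pattern from the answer; MULTILINE lets ^ match at every line start.
PHONE = re.compile(r"^([+\d() -]+)(?![\w@])", re.MULTILINE)

text = "\n".join([
    "+1 (4542) 114214 111@111.org d@ghhg.com ,,,,",
    "+1 (2342) 114234 ert@nhy.sdfr.domain.org; 1@kjk.eiu.1",
    "+7 (101) 111-222-11 abc@ert.com, def@sdf.org",
    "+1 (102) 123532-2 some@mail.ru",
    "+44 (301) 123 23 45 7zip@site.edu; ret@ghjj.org",
])

# With a single capturing group, findall returns the group contents.
phones = PHONE.findall(text)
for phone in phones:
    print(phone)
```

The lookahead works through backtracking: the character class first grabs as much as it can, then gives characters back until the next character is neither a word character nor "@". That is what drops the trailing " 7" on the last line.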
Answer 2: (score: 0)
You can use the following (demo):
(?<phone>\+\d{1,2}\s\(\d{3,4}\)\s(?:[\d- ]+\d)(?=\s))
\s+(?<email>.*?@.*?)(?=[\s;,]|$).*?
\s+(?<email2>[\w]*?@.*?)?(?=[\s;,]|$)
This produces:
MATCH 1
phone [4-20] `+1 (4542) 114214`
email [21-32] `111@111.org`
email2 [33-43] `d@ghhg.com`
MATCH 2
phone [52-68] `+1 (2342) 114234`
email [69-92] `ert@nhy.sdfr.domain.org`
email2 [94-105] `1@kjk.eiu.1`
MATCH 3
phone [110-129] `+7 (101) 111-222-11`
email [130-141] `abc@ert.com`
email2 [143-154] `def@sdf.org`
MATCH 4
phone [159-176] `+1 (102) 123532-2`
email [177-189] `some@mail.ru`
MATCH 5
phone [194-213] `+44 (301) 123 23 45`
email [214-227] `7zip@site.edu`
email2 [229-241] `ret@ghjj.org`
Explanation:
(?<phone>\+\d{1,2}\s\(\d{3,4}\)\s(?:[\d- ]+\d)(?=\s)) Named capturing group phone
\+ matches the character + literally
\d{1,2} match a digit [0-9]
Quantifier: {1,2} Between 1 and 2 times, as many times as possible, giving back as needed [greedy]
\s match any white space character [\r\n\t\f ]
\( matches the character ( literally
\d{3,4} match a digit [0-9]
Quantifier: {3,4} Between 3 and 4 times, as many times as possible, giving back as needed [greedy]
\) matches the character ) literally
\s match any white space character [\r\n\t\f ]
(?:[\d- ]+\d) Non-capturing group
[\d- ]+ match a single character present in the list below
Quantifier: + Between one and unlimited times, as many times as possible, giving back as needed [greedy]
\d match a digit [0-9]
- a single character in the list - literally
\d match a digit [0-9]
(?=\s) Positive Lookahead - Assert that the regex below can be matched
\s match any white space character [\r\n\t\f ]
\s+ match any white space character [\r\n\t\f ]
Quantifier: + Between one and unlimited times, as many times as possible, giving back as needed [greedy]
(?<email>.*?@.*?) Named capturing group email
.*? matches any character (except newline)
Quantifier: *? Between zero and unlimited times, as few times as possible, expanding as needed [lazy]
@ matches the character @ literally
.*? matches any character (except newline)
Quantifier: *? Between zero and unlimited times, as few times as possible, expanding as needed [lazy]
(?=[\s;,]|$) Positive Lookahead - Assert that the regex below can be matched
1st Alternative: [\s;,]
[\s;,] match a single character present in the list below
\s match any white space character [\r\n\t\f ]
;, a single character in the list ;, literally
2nd Alternative: $
$ assert position at end of a line
.*? matches any character (except newline)
Quantifier: *? Between zero and unlimited times, as few times as possible, expanding as needed [lazy]
\s+ match any white space character [\r\n\t\f ]
Quantifier: + Between one and unlimited times, as many times as possible, giving back as needed [greedy]
(?<email2>[\w]*?@.*?)? Named capturing group email2
Quantifier: ? Between zero and one time, as many times as possible, giving back as needed [greedy]
Note: A repeated capturing group will only capture the last iteration. Put a capturing group around the repeated group to capture all iterations or use a non-capturing group instead if you're not interested in the data
[\w]*? match a single character present in the list below
Quantifier: *? Between zero and unlimited times, as few times as possible, expanding as needed [lazy]
\w match any word character [a-zA-Z0-9_]
@ matches the character @ literally
.*? matches any character (except newline)
Quantifier: *? Between zero and unlimited times, as few times as possible, expanding as needed [lazy]
(?=[\s;,]|$) Positive Lookahead - Assert that the regex below can be matched
1st Alternative: [\s;,]
[\s;,] match a single character present in the list below
\s match any white space character [\r\n\t\f ]
;, a single character in the list ;, literally
2nd Alternative: $
$ assert position at end of a line
g modifier: global. All matches (don't return on first match)
m modifier: multi-line. Causes ^ and $ to match the begin/end of each line (not only begin/end of string)
i modifier: insensitive. Case insensitive match (ignores case of [a-zA-Z])
x modifier: extended. Spaces and text after a # in the pattern are ignored
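For readers who want to try the named groups outside regex101, here is a hedged Python sketch. It is an adaptation rather than a faithful port: Python spells named groups (?P<name>...), the class [\d- ] is rewritten as [\d -] because Python's re module rejects a hyphen placed between \d and another character, and only the phone and first email groups are kept. The name PHONE_EMAIL is made up for the example.

```python
import re

# Phone and first email, adapted from the first two lines of the pattern above.
PHONE_EMAIL = re.compile(
    r"(?P<phone>\+\d{1,2}\s\(\d{3,4}\)\s(?:[\d -]+\d)(?=\s))"
    r"\s+(?P<email>.*?@.*?)(?=[\s;,]|$)"
)

# Each record is searched on its own, so $ means "end of this record".
m = PHONE_EMAIL.search("+44 (301) 123 23 45 7zip@site.edu; ret@ghjj.org")
if m:
    print(m.group("phone"))
    print(m.group("email"))
```

Searching one record at a time keeps the sketch simple; it avoids relying on whitespace that follows the last email, which the three-part pattern expects.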
Answer 3: (score: 0)
Here is my solution (on regex101):
\+\d+\s+\(\d+\)\s+[- \d]+(?= )
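It makes sure that the final run of spaces, digits, and/or dashes ([- \d]+) is followed by a space ((?= )).
It captures all the examples cleanly, with no trailing spaces and without including any part of the email addresses.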
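A short Python sketch of this pattern, assuming the sample records are joined into one string; the variable names are illustrative.

```python
import re

# Pattern from the answer: the final run must be followed by a literal space.
PHONE = re.compile(r"\+\d+\s+\(\d+\)\s+[- \d]+(?= )")

text = "\n".join([
    "+1 (4542) 114214 111@111.org d@ghhg.com ,,,,",
    "+1 (2342) 114234 ert@nhy.sdfr.domain.org; 1@kjk.eiu.1",
    "+7 (101) 111-222-11 abc@ert.com, def@sdf.org",
    "+1 (102) 123532-2 some@mail.ru",
    "+44 (301) 123 23 45 7zip@site.edu; ret@ghjj.org",
])

# No capturing groups, so findall returns the full matches.
phones = PHONE.findall(text)
for phone in phones:
    print(phone)
```

One consequence of the (?= ) lookahead is that a record consisting only of a phone number, with nothing after it, would not match at all, so the pattern relies on something following the number.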
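Answer 4: (score: 0)
I wouldn't even try to parse the phone number.
You have a phone number, separated by whitespace from one or more email addresses, which are in turn separated by commas or semicolons. An email address always contains an @.
Find the first @. If there is none, the phone number is the trimmed string. If there is an @, find the last space before it. The phone number is everything up to that space, trimmed. If there is no space before the @, then you have no phone number.
After removing the phone number, you can find the emails by splitting the rest of the string on "," or ";", trimming the pieces, and discarding anything that does not contain an @.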
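The steps above can be sketched in Python roughly as follows. The function name split_record is hypothetical, and there is one deliberate deviation: the split also treats whitespace as a separator, because the first sample record separates its two addresses with a space only.

```python
import re
from typing import List, Optional, Tuple

def split_record(line: str) -> Tuple[Optional[str], List[str]]:
    """Split one record into (phone, emails) without parsing the phone number."""
    at = line.find("@")
    if at == -1:
        # No email at all: the whole trimmed string is the phone number.
        return line.strip(), []
    space = line.rfind(" ", 0, at)  # last space before the first @
    if space == -1:
        # Nothing precedes the first email, so there is no phone number.
        phone, rest = None, line
    else:
        phone, rest = line[:space].strip(), line[space:]
    # Assumption: split on whitespace as well as "," and ";".
    emails = [piece for piece in re.split(r"[,;\s]+", rest) if "@" in piece]
    return phone, emails

print(split_record("+44 (301) 123 23 45 7zip@site.edu; ret@ghjj.org"))
```

Nothing here validates the phone number or the addresses; the function only separates the two kinds of data, which is the point of the approach.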
Then find a suitable library to handle the phone numbers, if you need to do anything with them beyond recording them.