Question

OK, I have two regular expressions.

([a-zA-Z0-9]?http[s]?:\/\/)?((?:(?:\w+)\.)(?:\S+)(?:\.(?:\w+))+?)
[a-zA-Z0-9._-]+@[a-zA-Z0-9.-]+.[a-zA-Z]{2,6}

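The first one does what I need for finding web addresses in a string. The second does what I need for finding email addresses in a string. However, for some reason the first one also finds email addresses such as first.last@d1.d2.d3.d4 or first.last@d1.com. I need some help fixing the first one so that it does not pick up these email addresses.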

Answer 1

For example, you can fix it by excluding @:

([a-zA-Z0-9]?http[s]?:\/\/)?((?:(?:\w+)\.)(?:[^\s@]+)(?:\.(?:\w+))*?)

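Finally, I suggest using *? instead of +? at the end. With +?, a one-level domain name without www (for example, example.com) is not matched.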

But it still finds a match in abc@gmail.com.
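To make the difference concrete, here is a minimal Python sketch comparing the question's pattern with the adjusted one. It assumes the adjusted pattern is the original with `\S+` replaced by `[^\s@]+` and the trailing `+?` replaced by `*?`; the names `ORIGINAL`, `ADJUSTED`, and `matches` are made up for this illustration.

```python
import re

# The URL pattern from the question.
ORIGINAL = re.compile(
    r"([a-zA-Z0-9]?http[s]?:\/\/)?((?:(?:\w+)\.)(?:\S+)(?:\.(?:\w+))+?)"
)

# Assumed form of the adjusted pattern: "@" excluded from the middle part,
# and the trailing "+?" relaxed to "*?".
ADJUSTED = re.compile(
    r"([a-zA-Z0-9]?http[s]?:\/\/)?((?:(?:\w+)\.)(?:[^\s@]+)(?:\.(?:\w+))*?)"
)


def matches(pattern, text):
    """Return the full text of every non-overlapping match of pattern in text."""
    return [m.group(0) for m in pattern.finditer(text)]


if __name__ == "__main__":
    for sample in ("first.last@d1.com", "abc@gmail.com", "http://www.example.com/page"):
        print(sample, matches(ORIGINAL, sample), matches(ADJUSTED, sample))
```

On first.last@d1.com the original pattern matches the whole address, while the adjusted one no longer crosses the @ but still matches the fragments first.last and d1.com. On abc@gmail.com the adjusted pattern matches gmail.com.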

Sadly, I don't know how to check that the character just before the matched substring is not @.

Edit: a poor solution is

^[^@]*([a-zA-Z0-9]?http[s]?:\/\/)?((?:(?:\w+)\.)(?:[^\s@]+)(?:\.(?:\w+))*?)

which checks that there are no @ characters between the start of the line and the matched part.
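In regex engines that support lookarounds, the "character before the match is not @" check can be written as a negative lookbehind, and a negative lookahead can reject candidates that are followed by @ in the same token. The following is only a sketch in Python under those assumptions: the simplified pattern and the name `URL_NO_EMAIL` are illustrative, not part of the answer above, and real-world input will need more care (trailing punctuation, user@host URLs, and so on).

```python
import re

URL_NO_EMAIL = re.compile(
    r"(?<![\w@.\-])"    # match must not start right after "@" or in the middle of a token
    r"(?:https?://)?"   # optional protocol
    r"\w+\.[^\s@]+"     # label, a dot, then the rest of the token without "@"
    r"(?!\S*@)"         # no "@" may follow in the same whitespace-delimited token
)

if __name__ == "__main__":
    text = "mail first.last@d1.com or visit http://www.example.com/page and example.org"
    print(URL_NO_EMAIL.findall(text))
    # → ['http://www.example.com/page', 'example.org']
```

The lookbehind stops the engine from restarting inside an address (for example at d1.com after the @), and the lookahead discards the local part first.last because an @ follows it.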

Answer 2

([a-zA-Z0-9]?http[s]?:\/\/)?((?:(?:\w+)\.)(?:\S+)(?:\.(?:\w+))+?)

Breaking this down, there are a few problems...

(             // capture protocol
[a-zA-Z0-9]?  // matches one alphanumeric, optionally (do you really want that to start the string before the protocol?)
http[s]?      // square brackets delimit a character class, so they are unnecessary here, although they don't change functionality
:\/\/         // matches ://
)?            // make captured protocol optional
((?:(?:\w+)\.)(?:\S+)(?:\.(?:\w+))+?) // too many non-capturing groups, not enough patterns. Inefficient and causing your error: \S+ also matches @

I would replace the regex with something more like this...

(https?:\/\/)?(\w[-\w\.]+)+(:\d+)?(/([\w/_\.]*(\?\S+)?)?)?
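As a rough illustration of how the groups in this replacement line up, here is a small Python sketch that applies it to a whole candidate string with `fullmatch`. The helper name `parse_url` and the dictionary keys are assumptions made for this example only.

```python
import re

# The replacement pattern suggested above, unchanged.
URL = re.compile(r"(https?:\/\/)?(\w[-\w\.]+)+(:\d+)?(/([\w/_\.]*(\?\S+)?)?)?")


def parse_url(candidate):
    """Return the captured parts if the whole candidate matches, else None."""
    m = URL.fullmatch(candidate)
    if m is None:
        return None
    return {
        "protocol": m.group(1),  # e.g. "http://", or None if absent
        "host": m.group(2),
        "port": m.group(3),      # e.g. ":8080", or None if absent
        "path": m.group(4),      # path plus query string, or None if absent
    }


if __name__ == "__main__":
    print(parse_url("http://example.com:8080/a/b.html?x=1"))
    print(parse_url("first.last@d1.com"))  # → None
```

Because none of the character classes before the query string accept @, an email address cannot match as a whole. Two caveats: the pattern does not require a dot, so a bare word such as visit also matches as a whole, and the nested quantifier in `(\w[-\w\.]+)+` can backtrack heavily on long strings that fail to match.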

Exclude email addresses from a web address regex

2 answers: