Detecting URLs with a regular expression

Asked: 2015-08-14 00:34:33

Tags: regex spamassassin

I want to use a regular expression to detect URLs with SpamAssassin.

I found that the following works well in the various other tools I use:

http(s)?://([a-zA-Z0-9.])+.[a-zA-Z]{2,3}

However, it does not work in SpamAssassin.

If I try anything resembling the regex above, I get the following errors:

[root@~]spamassassin --lint
Aug 13 19:30:25.005 [38721] warn: Having no space between pattern and following word is deprecated at /etc/mail/spamassassin/local.cf, rule HAS_LINK, line 14.
Aug 13 19:30:25.005 [38721] warn: Bareword found where operator expected at /etc/mail/spamassassin/local.cf, rule HAS_LINK, line 14, near "var"
Aug 13 19:30:25.005 [38721] warn:  (Missing operator before var?)
Aug 13 19:30:25.005 [38721] warn: Misplaced _ in number at /etc/mail/spamassassin/local.cf, rule HAS_LINK, line 14.
Aug 13 19:30:25.005 [38721] warn: Bareword found where operator expected at /etc/mail/spamassassin/local.cf, rule HAS_LINK, line 14, near "72_active"
Aug 13 19:30:25.005 [38721] warn:  (Missing operator before active?)
Aug 13 19:30:25.005 [38721] warn: Bareword found where operator expected at /etc/mail/spamassassin/local.cf, rule HAS_LINK, line 16, near "", ruletype => "body"
Aug 13 19:30:25.005 [38721] warn:  (Missing operator before body?)
Aug 13 19:30:25.005 [38721] warn: String found where operator expected at /etc/mail/spamassassin/local.cf, rule HAS_LINK, line 28, near "body"); 
Aug 13 19:30:25.005 [38721] warn:  last;
Aug 13 19:30:25.005 [38721] warn:  }
Aug 13 19:30:25.005 [38721] warn:  }
Aug 13 19:30:25.005 [38721] warn:  
Aug 13 19:30:25.005 [38721] warn:  
Aug 13 19:30:25.005 [38721] warn:  }
Aug 13 19:30:25.005 [38721] warn:  
Aug 13 19:30:25.005 [38721] warn:  if ($scoresptr->{q{FUZZY_ERECT}}) {
Aug 13 19:30:25.005 [38721] warn:  
Aug 13 19:30:25.005 [38721] warn:  foreach my $l (@_) {
Aug 13 19:30:25.005 [38721] warn:  
Aug 13 19:30:25.005 [38721] warn: #line 1 ""
Aug 13 19:30:25.005 [38721] warn:  (Might be a runaway multi-line "" string starting on line 16)
Aug 13 19:30:25.006 [38721] warn: Having no space between pattern and following word is deprecated at /etc/mail/spamassassin/local.cf, rule HAS_LINK, line 28.
Aug 13 19:30:25.006 [38721] warn: Misplaced _ in number at /etc/mail/spamassassin/local.cf, rule HAS_LINK, line 28.
Aug 13 19:30:25.006 [38721] warn: Bareword found where operator expected at /etc/mail/spamassassin/local.cf, rule HAS_LINK, line 28, near "25_replace"
Aug 13 19:30:25.006 [38721] warn:  (Missing operator before replace?)
Aug 13 19:30:25.006 [38721] warn: Bareword found where operator expected at /etc/mail/spamassassin/local.cf, rule HAS_LINK, line 30, near "", ruletype => "body"
Aug 13 19:30:25.006 [38721] warn:  (Missing operator before body?)
Aug 13 19:30:25.006 [38721] warn: String found where operator expected at /etc/mail/spamassassin/local.cf, rule HAS_LINK, line 42, near "body"); 
Aug 13 19:30:25.006 [38721] warn:  last;
Aug 13 19:30:25.006 [38721] warn:  }
Aug 13 19:30:25.006 [38721] warn:  }
Aug 13 19:30:25.006 [38721] warn:  
Aug 13 19:30:25.006 [38721] warn:  
Aug 13 19:30:25.006 [38721] warn:  }
Aug 13 19:30:25.006 [38721] warn:  
Aug 13 19:30:25.006 [38721] warn:  if ($scoresptr->{q{MORE_SEX}}) {
Aug 13 19:30:25.006 [38721] warn:  
Aug 13 19:30:25.006 [38721] warn:  foreach my $l (@_) {
Aug 13 19:30:25.006 [38721] warn:  
Aug 13 19:30:25.006 [38721] warn: #line 1 ""
Aug 13 19:30:25.006 [38721] warn:  (Might be a runaway multi-line "" string starting on line 30)
Aug 13 19:30:25.006 [38721] warn: Having no space between pattern and following word is deprecated at /etc/mail/spamassassin/local.cf, rule HAS_LINK, line 42.
Aug 13 19:30:25.006 [38721] warn: Misplaced _ in number at /etc/mail/spamassassin/local.cf, rule HAS_LINK, line 42.
Aug 13 19:30:25.006 [38721] warn: Bareword found where operator expected at /etc/mail/spamassassin/local.cf, rule HAS_LINK, line 42, near "20_phrases"
Aug 13 19:30:25.006 [38721] warn:  (Missing operator before phrases?)
Aug 13 19:30:25.007 [38721] warn: Bareword found where operator expected at /etc/mail/spamassassin/local.cf, rule HAS_LINK, line 44, near "", ruletype => "body"
Aug 13 19:30:25.007 [38721] warn:  (Missing operator before body?)
Aug 13 19:30:25.007 [38721] warn: String found where operator expected at /etc/mail/spamassassin/local.cf, rule HAS_LINK, line 44, at end of line
Aug 13 19:30:25.007 [38721] warn:  (Missing semicolon on previous line?)
Aug 13 19:30:25.007 [38721] warn: rules: failed to compile Mail::SpamAssassin::Plugin::Check::_body_tests_0_3, skipping:
Aug 13 19:30:25.007 [38721] warn:  (Can't find string terminator '"' anywhere before EOF at /etc/mail/spamassassin/local.cf, rule HAS_LINK, line 44.)
Aug 13 19:30:25.140 [38721] warn: lint: 1 issues detected, please rerun with debug enabled for more information

1 answer:

Answer 0: (score: 0)

This regex (https?:\/\/([a-zA-Z0-9_\-]+\.)+(mobi|[a-z]{2,3})) detects common URLs.

It does not detect URLs with generic TLDs. If you need to detect those as well, I would add them to the alternation that already contains mobi.
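The suggestion of extending the mobi alternation can be tried out with a minimal Python sketch. The extra TLDs (info, online) and the helper name matched_url are assumptions chosen for illustration; they are not from the answer, and the slash escapes are dropped only because Python does not need them.

```python
import re

# Hypothetical list of TLDs to recognise. "mobi" comes from the answer;
# the other two are placeholders picked for this example.
EXTRA_TLDS = ["mobi", "info", "online"]

# The answer's pattern, without the slash escapes that Python does not need.
ORIGINAL = re.compile(r"https?://([a-zA-Z0-9_\-]+\.)+(mobi|[a-z]{2,3})")

# Same pattern with the longer TLDs listed before the generic 2-3 letter
# case, so the alternation tries them first.
EXTENDED = re.compile(
    r"https?://([a-zA-Z0-9_\-]+\.)+("
    + "|".join(re.escape(tld) for tld in EXTRA_TLDS)
    + r"|[a-z]{2,3})"
)


def matched_url(pattern, text):
    """Return the text matched by pattern, or None if nothing matches."""
    m = pattern.search(text)
    return m.group(0) if m else None


sample = "visit http://shop.online/offers today"
print(matched_url(ORIGINAL, sample))
print(matched_url(EXTENDED, sample))
```

Because neither pattern is anchored, the original one still finds something in this sample, but it stops after the first three letters of the TLD (http://shop.onl), while the extended one covers the whole host name. Whether that difference matters depends on whether the rule only needs to fire or also needs the full URL.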

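As for your regex: if you want to match a dot literally, you have to escape it, and the same goes for other characters that have a special meaning in a regex, such as *, /, { and }. The slash matters most here: a SpamAssassin rule delimits its pattern with slashes, so the unescaped // after http most likely ends the pattern early, which would explain the lint errors above.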
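To illustrate the escaping advice, here is a small Python sketch comparing the question's pattern with and without an escaped dot. The helper escape_slashes is a hypothetical convenience for preparing a pattern for a /.../ delimited rule; it assumes the input contains no slashes that are already escaped.

```python
import re

# Pattern from the question: the dot before the TLD is unescaped,
# so it matches any character, not only a literal ".".
UNESCAPED_DOT = re.compile(r"http(s)?://([a-zA-Z0-9.])+.[a-zA-Z]{2,3}")

# Same pattern with that dot escaped.
ESCAPED_DOT = re.compile(r"http(s)?://([a-zA-Z0-9.])+\.[a-zA-Z]{2,3}")


def escape_slashes(pattern):
    """Prefix every "/" with a backslash.

    Hypothetical helper; assumes no slash in the input is escaped yet.
    """
    return pattern.replace("/", "\\/")


# A host name without any dot should not look like a domain with a TLD.
print(bool(UNESCAPED_DOT.search("http://localhost")))
print(bool(ESCAPED_DOT.search("http://localhost")))

# The form needed inside a slash-delimited pattern.
print(escape_slashes("http(s)?://"))
```

The unescaped version accepts http://localhost because the bare dot consumes the letter h and the remaining ost passes as a three-letter TLD. The escaped version rejects it, since the string contains no literal dot.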

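https://regex101.com is a good reference and testing site for regular expressions, and it gives helpful explanations of what each part of a pattern does.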