I want to use a regular expression to detect URLs with SpamAssassin.
I have found that the following works fine in the various other tools I use:
http(s)?://([a-zA-Z0-9.])+.[a-zA-Z]{2,3}
However, this does not work in SpamAssassin.
If I try anything resembling the above regex, I get the following errors:
[root@~]spamassassin --lint
Aug 13 19:30:25.005 [38721] warn: Having no space between pattern and following word is deprecated at /etc/mail/spamassassin/local.cf, rule HAS_LINK, line 14.
Aug 13 19:30:25.005 [38721] warn: Bareword found where operator expected at /etc/mail/spamassassin/local.cf, rule HAS_LINK, line 14, near "var"
Aug 13 19:30:25.005 [38721] warn: (Missing operator before var?)
Aug 13 19:30:25.005 [38721] warn: Misplaced _ in number at /etc/mail/spamassassin/local.cf, rule HAS_LINK, line 14.
Aug 13 19:30:25.005 [38721] warn: Bareword found where operator expected at /etc/mail/spamassassin/local.cf, rule HAS_LINK, line 14, near "72_active"
Aug 13 19:30:25.005 [38721] warn: (Missing operator before active?)
Aug 13 19:30:25.005 [38721] warn: Bareword found where operator expected at /etc/mail/spamassassin/local.cf, rule HAS_LINK, line 16, near "", ruletype => "body"
Aug 13 19:30:25.005 [38721] warn: (Missing operator before body?)
Aug 13 19:30:25.005 [38721] warn: String found where operator expected at /etc/mail/spamassassin/local.cf, rule HAS_LINK, line 28, near "body");
Aug 13 19:30:25.005 [38721] warn: last;
Aug 13 19:30:25.005 [38721] warn: }
Aug 13 19:30:25.005 [38721] warn: }
Aug 13 19:30:25.005 [38721] warn:
Aug 13 19:30:25.005 [38721] warn:
Aug 13 19:30:25.005 [38721] warn: }
Aug 13 19:30:25.005 [38721] warn:
Aug 13 19:30:25.005 [38721] warn: if ($scoresptr->{q{FUZZY_ERECT}}) {
Aug 13 19:30:25.005 [38721] warn:
Aug 13 19:30:25.005 [38721] warn: foreach my $l (@_) {
Aug 13 19:30:25.005 [38721] warn:
Aug 13 19:30:25.005 [38721] warn: #line 1 ""
Aug 13 19:30:25.005 [38721] warn: (Might be a runaway multi-line "" string starting on line 16)
Aug 13 19:30:25.006 [38721] warn: Having no space between pattern and following word is deprecated at /etc/mail/spamassassin/local.cf, rule HAS_LINK, line 28.
Aug 13 19:30:25.006 [38721] warn: Misplaced _ in number at /etc/mail/spamassassin/local.cf, rule HAS_LINK, line 28.
Aug 13 19:30:25.006 [38721] warn: Bareword found where operator expected at /etc/mail/spamassassin/local.cf, rule HAS_LINK, line 28, near "25_replace"
Aug 13 19:30:25.006 [38721] warn: (Missing operator before replace?)
Aug 13 19:30:25.006 [38721] warn: Bareword found where operator expected at /etc/mail/spamassassin/local.cf, rule HAS_LINK, line 30, near "", ruletype => "body"
Aug 13 19:30:25.006 [38721] warn: (Missing operator before body?)
Aug 13 19:30:25.006 [38721] warn: String found where operator expected at /etc/mail/spamassassin/local.cf, rule HAS_LINK, line 42, near "body");
Aug 13 19:30:25.006 [38721] warn: last;
Aug 13 19:30:25.006 [38721] warn: }
Aug 13 19:30:25.006 [38721] warn: }
Aug 13 19:30:25.006 [38721] warn:
Aug 13 19:30:25.006 [38721] warn:
Aug 13 19:30:25.006 [38721] warn: }
Aug 13 19:30:25.006 [38721] warn:
Aug 13 19:30:25.006 [38721] warn: if ($scoresptr->{q{MORE_SEX}}) {
Aug 13 19:30:25.006 [38721] warn:
Aug 13 19:30:25.006 [38721] warn: foreach my $l (@_) {
Aug 13 19:30:25.006 [38721] warn:
Aug 13 19:30:25.006 [38721] warn: #line 1 ""
Aug 13 19:30:25.006 [38721] warn: (Might be a runaway multi-line "" string starting on line 30)
Aug 13 19:30:25.006 [38721] warn: Having no space between pattern and following word is deprecated at /etc/mail/spamassassin/local.cf, rule HAS_LINK, line 42.
Aug 13 19:30:25.006 [38721] warn: Misplaced _ in number at /etc/mail/spamassassin/local.cf, rule HAS_LINK, line 42.
Aug 13 19:30:25.006 [38721] warn: Bareword found where operator expected at /etc/mail/spamassassin/local.cf, rule HAS_LINK, line 42, near "20_phrases"
Aug 13 19:30:25.006 [38721] warn: (Missing operator before phrases?)
Aug 13 19:30:25.007 [38721] warn: Bareword found where operator expected at /etc/mail/spamassassin/local.cf, rule HAS_LINK, line 44, near "", ruletype => "body"
Aug 13 19:30:25.007 [38721] warn: (Missing operator before body?)
Aug 13 19:30:25.007 [38721] warn: String found where operator expected at /etc/mail/spamassassin/local.cf, rule HAS_LINK, line 44, at end of line
Aug 13 19:30:25.007 [38721] warn: (Missing semicolon on previous line?)
Aug 13 19:30:25.007 [38721] warn: rules: failed to compile Mail::SpamAssassin::Plugin::Check::_body_tests_0_3, skipping:
Aug 13 19:30:25.007 [38721] warn: (Can't find string terminator '"' anywhere before EOF at /etc/mail/spamassassin/local.cf, rule HAS_LINK, line 44.)
Aug 13 19:30:25.140 [38721] warn: lint: 1 issues detected, please rerun with debug enabled for more information
Answer 0 (score: 0)
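This regex detects common URLs:
(https?:\/\/([a-zA-Z0-9_\-]+\.)+(mobi|[a-z]{2,3}))
It does not cover URLs with generic TLDs longer than three letters. If you need to detect those as well, I would add them to the alternation next to mobi.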
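To illustrate the point about the TLD list, here is a minimal Python sketch. It assumes Python's re module behaves closely enough to the Perl regexes SpamAssassin uses for this pattern. EXTENDED_RE and covers_whole_url are hypothetical names, and "info" and "museum" are only example additions. Note that an unanchored search still matches the prefix "http://example.inf", so the gap only shows when the whole URL has to be covered.

```python
import re

# The regex from the answer, including its outer group.
ANSWER_RE = re.compile(r"(https?:\/\/([a-zA-Z0-9_\-]+\.)+(mobi|[a-z]{2,3}))")

# Hypothetical extension: "info" and "museum" added next to "mobi".
EXTENDED_RE = re.compile(
    r"(https?:\/\/([a-zA-Z0-9_\-]+\.)+(mobi|info|museum|[a-z]{2,3}))"
)


def covers_whole_url(pattern, url):
    """Return True if the pattern matches the entire URL string."""
    return pattern.fullmatch(url) is not None


if __name__ == "__main__":
    for url in ("https://www.example.com", "http://example.mobi", "http://example.info"):
        print(url, covers_whole_url(ANSWER_RE, url), covers_whole_url(EXTENDED_RE, url))
```

Listing the longer TLDs before the [a-z]{2,3} fallback keeps the pattern short, at the cost of having to maintain the list by hand.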
As for your regex: if you want to match a dot literally, or other characters that have a special meaning in a regex such as *, / and {, you have to escape them.
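The effect of escaping can be checked with a small Python sketch. It assumes the question's pattern is used as posted. ESCAPED_RE, matches and for_slash_delimiters are hypothetical names introduced here. The slash helper rests on the assumption that the SpamAssassin rule is written between /.../ delimiters, in which case an unescaped / in :// would end the pattern early, which would be consistent with the lint errors shown in the question.

```python
import re

# The pattern from the question: the "." before the TLD is unescaped,
# so it matches any single character, not only a literal dot.
UNESCAPED_RE = re.compile(r"http(s)?://([a-zA-Z0-9.])+.[a-zA-Z]{2,3}")

# The same pattern with that dot escaped.
ESCAPED_RE = re.compile(r"http(s)?://([a-zA-Z0-9.])+\.[a-zA-Z]{2,3}")


def matches(pattern, text):
    """Return True if the pattern is found anywhere in the text."""
    return pattern.search(text) is not None


def for_slash_delimiters(pattern_text):
    """Escape "/" so the text can sit between /.../ delimiters.

    Assumes the input contains no slashes that are already escaped.
    """
    return pattern_text.replace("/", r"\/")


if __name__ == "__main__":
    print(matches(UNESCAPED_RE, "http://example_com"))  # underscore accepted as the "dot"
    print(matches(ESCAPED_RE, "http://example_com"))
    print(for_slash_delimiters("https?://"))
```

Python itself does not treat / as special, so the helper only matters when the pattern is pasted into a slash-delimited rule.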
https://regex101.com is a good reference and testing site for regular expressions, and it gives you helpful explanations of each pattern.