Question

How do I write a regular expression that finds strings which contain "http://" and do not contain "mysite.com"?

Answer 1

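Caveat

Trying to bend regexes into boolean logic that is better done in a proper programming language is a thankless task. While it is possible to write /PAT1/ and not /PAT2/ as a single pattern using convoluted lookaheads, it is a painful undertaking. You do not want to do that!

You should have started by explaining what you were doing: that you were running some sort of match operation in a text editor. You did not. So you are going to get a general answer that you will have to adapt to your local situation.

Quick Answer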

    (?sx)                   # let dot cross newlines, enable comments & whitespace
    \A                      # anchor at the start so both lookaheads see the whole string
    (?= .* http://     )    # lookahead assertion for http://
    (?! .* mysite\.com )    # lookahead negation for mysite.com

Using Perl syntax, you can stash that (pre)compiled pattern in a variable for later use:

    my $is_valid_rx = qr{
        \A                      # anchor at the start so both lookaheads see the whole string
        (?= .* http://     )    # lookahead assertion for http://
        (?! .* mysite\.com )    # lookahead negation for mysite.com
    }sx;  # /s to cross newlines, /x for comments & whitespace

    # then later on…
    if ($some_string =~ $is_valid_rx) {
        # your string has an http blah and lacks a mysite blah
    }

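However, if your goal is to pull out all such links, that will not help you much, because those lookaheads do not tell you where in the string the link occurs.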
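To make that limitation concrete, here is a minimal sketch. The sample string is made up for illustration, and the pattern is assumed to be anchored with \A so that both lookaheads inspect the whole string. It shows that a pattern built purely from lookaheads matches zero characters, so the match offsets say nothing about where the link sits.

```perl
use strict;
use warnings;

# Hypothetical sample input, not from the original question.
my $some_string = "see http://example.org/page for details";

my $is_valid_rx = qr{
    \A                      # anchor so both lookaheads scan the whole string
    (?= .* http://     )    # must contain http://
    (?! .* mysite\.com )    # must not contain mysite.com
}sx;

my ($start, $end, $matched) = (-1, -1, '');
if ($some_string =~ $is_valid_rx) {
    # @- and @+ hold the start and end offsets of the last successful match
    ($start, $end) = ($-[0], $+[0]);
    $matched = substr($some_string, $start, $end - $start);
}

# The lookaheads consume nothing, so the match is the empty string at offset 0,
# even though the link itself starts at offset 4.
printf "matched '%s' from %d to %d\n", $matched, $start, $end;
```

The pattern answers "is this string acceptable?" but cannot hand back the link itself.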

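In that case, it is much easier to write something that pulls out the links and then filters out the unwanted ones, using two separate regexes instead of trying to do everything at once.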

    @all_links  = ($some_string =~ m{ https?://\S+ }xg);
    @good_links = grep !/mysite\.com/, @all_links;

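Note that this makes no attempt to match only links made of valid URL characters, nor to strip the stray trailing punctuation that often follows a link in plain text.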
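If the trailing punctuation matters, one possible workaround is to trim it after extraction. The sketch below is only a heuristic under assumed sample text (the string and the set of trimmed characters are illustrative choices, not part of the original answer), and it can wrongly shorten a URL that really does end in one of those characters.

```perl
use strict;
use warnings;

# Hypothetical sample text, not from the original question.
my $some_string = "Read http://example.org/a, then (http://mysite.com/b) "
                . "and finally https://example.net/c.";

# Same extraction as above: anything that looks like a link
my @all_links = ($some_string =~ m{ https?://\S+ }xg);

# Heuristic: drop trailing characters that are usually sentence punctuation
# rather than part of the URL.
s/[.,;:!?)\]'"]+\z// for @all_links;

# Same filter as above: discard anything mentioning mysite.com
my @good_links = grep !/mysite\.com/, @all_links;

print "$_\n" for @good_links;
```

Here the comma, the closing parenthesis, and the final period are stripped before the mysite.com filter runs.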

And Now, for the Real Answer

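Also note that if you are parsing HTML with this, the approach outlined above is only a quick-and-dirty, fast-and-loose, shoot-from-the-hip kind of link extraction. It is quite easy to construct valid input that produces numerous false positives, and not at all hard to construct input that produces false negatives.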

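In contrast, here is a complete program that dumps out all the <a ...> and <img ...> link addresses found at its URL arguments, and actually does so correctly, because it uses a real parser.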

    #!/usr/bin/env perl
    #
    # fetchlinks - fetch all <a> and <img> links from listed URL args
    #   Tom Christiansen <tchrist@perl.com>
    #   Wed Mar 14 08:03:53 MDT 2012
    #

    use strict;
    use warnings;

    use LWP::UserAgent;
    use HTML::LinkExtor;
    use URI::URL;

    die "usage: $0 url ...\n" unless @ARGV;

    for my $arg (@ARGV) {
        my @links = fetch_wanted_links($arg => qw<a img>);
        for my $link (@links) {
            print "$arg => " if @ARGV > 1;
            print "$link\n";
        }
    }

    exit;

    sub fetch_wanted_links {
        my($url, @wanted) = @_;

        my %wanted;
        @wanted{@wanted} = (1) x @wanted;

        my $agent = LWP::UserAgent->new;

        # Set up a callback that collect links of the wanted variety
        my @hits = ();

        # Make the parser.  Unfortunately, we don't know the base yet
        # (it might be different from $url)
        my $parser = new HTML::LinkExtor sub {
            my($tag, %attr) = @_;
            return if %wanted and not $wanted{$tag};
            push @hits, values %attr;
        };

        # Request document and parse it as it arrives
        my $response = $agent->request(
            HTTP::Request->new(GET => $url),
            sub { $parser->parse( $_[0] ) },
        );

        # Expand all image URLs to absolute ones
        my $base = $response->base;
        @hits = map { $_ = url($_, $base)->abs } @hits;

        return @hits;
    }

If you run it on a URL like this, it accounts for all the anchor and image links:

    $ perl fetchlinks http://www.perl.org/
    http://www.perl.org/
    http://st.pimg.net/perlweb/images/camel_head.v25e738a.png
    http://www.perl.org/
    http://www.perl.org/learn.html
    http://www.perl.org/docs.html
    http://www.perl.org/cpan.html
    http://www.perl.org/community.html
    http://www.perl.org/contribute.html
    http://www.perl.org/about.html
    http://www.perl.org/get.html
    http://www.perl.org/get.html
    http://www.perl.org/get.html
    http://www.perl.org/about.html
    http://www.perl.org/learn.html
    http://st.pimg.net/perlweb/images/icons/learn.v0e1f83c.png
    http://www.perl.org/learn.html
    http://www.perl.org/community.html
    http://st.pimg.net/perlweb/images/icons/community.v03bf8ce.png
    http://www.perl.org/community.html
    http://www.perl.org/docs.html
    http://st.pimg.net/perlweb/images/icons/docs.v2622a01.png
    http://www.perl.org/docs.html
    http://www.perl.org/contribute.html
    http://st.pimg.net/perlweb/images/icons/cog.v08b9acc.png
    http://www.perl.org/contribute.html
    http://www.perl.org/dev.html
    http://www.perl.org/contribute.html
    http://www.perl.org/cpan.html
    http://st.pimg.net/perlweb/images/icons/cpan.vdc5be93.png
    http://www.perl.org/cpan.html
    http://www.perl.org/events.html
    http://st.pimg.net/perlweb/images/icons/cal.v705acef.png
    http://www.perl.org/events.html
    http://www.perl6.org/
    http://st.pimg.net/perlweb/images/icons/perl6.v8ff6c63.png
    http://www.perl6.org/
    http://www.perl.org/dev.html
    http://www.perlfoundation.org/
    http://st.pimg.net/perlweb/images/icons/onion.vee5cb98.png
    http://www.perlfoundation.org/
    http://www.cpan.org/
    http://search.cpan.org/~jtang/Net-Stomp-0.45/
    http://search.cpan.org/~vaxman/Array-APX-0.3/
    http://search.cpan.org/~salva/Net-SFTP-Foreign-1.71/
    http://search.cpan.org/~grandpa/Win32-MSI-HighLevel-1.0008/
    http://search.cpan.org/~teejay/Catalyst-TraitFor-Component-ConfigPerSite-0.06/
    http://search.cpan.org/~jwieland/WebService-Embedly-0.04/
    http://search.cpan.org/~mariab/WWW-TMDB-API0.04/
    http://search.cpan.org/~teejay/SOAP-Data-Builder-1/
    http://search.cpan.org/~dylan/WWW-Google-Translate-0.03/
    http://search.cpan.org/~jtbraun/Parse-RecDescent-1.967_008/
    http://www.perl.org/get.html
    http://www.perl.org/learn.html
    http://www.perl.org/docs.html
    http://www.perl.org/community.html
    http://www.perl.org/events.html
    http://www.perl.org/siteinfo.html#sponsors
    http://www.yellowbot.com/
    http://st.pimg.net/perlweb/images/friends/yellowbot.vcc29f5b.gif
    http://www.perl.org/
    http://blogs.perl.org/
    http://jobs.perl.org/
    http://learn.perl.org/
    http://dev.perl.org/
    http://creativecommons.org/licenses/by-nc-nd/3.0/us/
    http://i.creativecommons.org/l/by-nc-nd/3.0/us/80x15.png
    http://www.perl.org/siteinfo.html

For any serious work, as opposed to a quick grep over a file to get rough results, you need to use a proper parser for this sort of thing.

Answer 2

Try something like /http:\/\/(?!mysite.com)/ (Perl style; it also works in JavaScript).
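As a rough check of what that pattern does, the sketch below runs it against a few inputs (the sample strings are invented for illustration). The negative lookahead only inspects the text immediately after http://, so it rejects http://mysite.com but still accepts a string where mysite.com appears elsewhere, for example behind a www. prefix. The unescaped dot also matches any character.

```perl
use strict;
use warnings;

# The pattern from the answer, unchanged
my $rx = qr/http:\/\/(?!mysite.com)/;

# Hypothetical test strings, not from the original question
my @samples = (
    "http://example.org/",        # accepted: no mysite.com at all
    "http://mysite.com/",         # rejected: mysite.com directly follows http://
    "http://www.mysite.com/",     # accepted, although it contains mysite.com
    "http://mysiteXcom/",         # rejected, because the unescaped dot matches X
);

# Record whether each sample matches the pattern
my %accepted;
$accepted{$_} = ($_ =~ $rx) ? 1 : 0 for @samples;

printf "%-26s %s\n", $_, ($accepted{$_} ? "match" : "no match") for @samples;
```

Whether that behavior is acceptable depends on whether mysite.com can ever appear anywhere other than right after the scheme.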
