Question

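I'm trying to get this to work with a Perl regex but can't seem to figure it out. I want to grab any URL that has ".website." in it, except ones like this (with "en" before ".website."):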

   $linkhtml =  'http://en.search.website.com/?q=beach&' ;

Here is an example of a URL that I'd like the regex to return, while the one above should be rejected:

   $linkhtml =  ' http://exsample.website.com/?q=beach&' ;

Here is my attempt. Any suggestions about what I'm doing wrong are appreciated.

   $re2='(?<!en)'; # Negative lookbehind: not preceded by "en"
   $re4='(.*)'; # Any number of characters
   $re6='(\.)'; # Literal period
   $re7='(website)'; # The word "website"
   $re8='(\.)'; # Literal period
   $re9='(.*)'; # Any number of characters

   $re=$re4.$re2.$re6.$re7.$re8.$re9;

   if ($linkhtml =~ /$re/)

Answer 1

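I would simply do it in two steps: first use a generic regex to find any URL (or rather, anything that looks like a URL). Then check every result against a second regex that looks for "en" occurring in the host before "website", and discard anything that matches.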
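A minimal sketch of those two steps, assuming the URLs are embedded in free text. The two patterns and the helper name `wanted_urls` are illustrative assumptions, not part of the original answer, and the URL pattern is deliberately loose.

```perl
use strict;
use warnings;

# Step 1 (assumed pattern): anything that looks like an http(s) URL.
my $url_re = qr{https?://[^\s'"<>]+};

# Step 2 (assumed pattern): the host has an "en" label somewhere
# before the "website" label.
my $en_host_re = qr{^https?://(?:[^/.]+\.)*en\.(?:[^/.]+\.)*website\.};

# Hypothetical helper: return the URL-like strings in $text,
# minus the ones whose host has "en" before "website".
sub wanted_urls {
    my ($text) = @_;
    my @found = $text =~ /$url_re/g;          # step 1: collect candidates
    return grep { !/$en_host_re/ } @found;    # step 2: drop the "en" hosts
}

my $text = 'see http://en.search.website.com/?q=beach& '
         . 'and http://exsample.website.com/?q=beach& here';
print "$_\n" for wanted_urls($text);
```

Keeping the two checks separate means each pattern stays simple enough to read and test on its own.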

Answer 2

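A negative lookbehind assertion doesn't behave the way you might expect when whatever you match after the assertion is so general that it can match the text of the assertion itself. Consider: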

perl -wle'print "en.website" =~ qr/(?<!en\.)web/'    # doesn't match
perl -wle'print "en.website" =~ qr/(?<!en\.)[a-z]/'  # does match, because [a-z] is matching the 'en'
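Applied to the pattern from the question, the same effect can be seen in a short sketch. The lookbehind `(?<!en)` only inspects the two characters immediately before the dot of `.website.`; in the first URL those are `ch` (from `search`), not `en`, so that URL still matches. The helper name `chars_before_dot` is made up for illustration.

```perl
use strict;
use warnings;

# The pieces from the question, assembled into one regex.
my $re = qr/(.*)(?<!en)(\.)(website)(\.)(.*)/;

# Hypothetical helper: if $url matches, return the two characters
# that the lookbehind actually examined; otherwise return undef.
sub chars_before_dot {
    my ($url) = @_;
    return unless $url =~ $re;
    return substr($1, -2);    # $1 is what (.*) consumed before ".website."
}

for my $url ('http://en.search.website.com/?q=beach&',
             'http://exsample.website.com/?q=beach&') {
    my $tail = chars_before_dot($url);
    print defined $tail ? "matched  ($tail): $url\n" : "rejected: $url\n";
}
```

Both URLs match, which is why the single-pattern attempt cannot reject the "en" host.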

The best approach here is what David suggested: use two patterns to separate the good values from the bad ones:

my @matches = grep { /$pattern1/ and not /$pattern2/ } @strings;

...where pattern1 matches all URLs, and pattern2 matches only the "en" URLs.
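As a rough, runnable version of that `grep`, here is one way the two patterns could look for the URLs in the question. Both patterns and the wrapper `filter_urls` are assumptions made for this sketch; in particular, `$pattern2` only treats a host that starts with `en.` as an "en" URL, and the leading `\s*` tolerates the stray space in the question's second example.

```perl
use strict;
use warnings;

# Assumed patterns, written for the URLs in the question.
my $pattern1 = qr{^\s*https?://[^/]*\.website\.};   # host contains ".website."
my $pattern2 = qr{^\s*https?://en\.};               # host starts with "en."

# Hypothetical wrapper around the grep from the answer.
sub filter_urls {
    my @strings = @_;
    return grep { /$pattern1/ and not /$pattern2/ } @strings;
}

my @matches = filter_urls(
    'http://en.search.website.com/?q=beach&',
    ' http://exsample.website.com/?q=beach&',
    'http://example.org/',
);
print "$_\n" for @matches;
```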

Answer 3

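Here is the final solution, in case a regex newbie (like me) runs into a similar problem in the future. In my case I wrapped it in a for loop so that it goes through an array, but that just depends on what you need.

First filter out the URLs that contain "en", since those are not the ones we want: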

    $re1 = '(.*)';    # any number of characters
    $re2 = '(en)';    # the word "en"
    $re3 = '(.*)';    # any number of characters

    $re = $re1.$re2.$re3;
    if ($linkhtml =~ /$re/) {
        # do nothing, as we don't want a link with "en" in it
    }
    else {
        ### find URLs with ".website."
        $re1 = '(.*)';        # any number of characters
        $re2 = '(\.)';        # literal period
        $re3 = '(website)';   # the word "website"
        $re4 = '(\.)';        # literal period
        $re5 = '(.*)';        # any number of characters

        $re = $re1.$re2.$re3.$re4.$re5;

        # match to see if it is a link that has ".website." in it
        if ($linkhtml =~ /$re/) {
            ## do something with the data as it matches, such as:
            print "$linkhtml\n";
        }
    }

Perl regex using a negative lookbehind? Can't seem to figure out how to do it properly
