Wrap links in <a> tags with a regular expression

Date: 2011-03-04 15:42:47

Tags: php regex

I need to use a regular expression to wrap every link in the text with an "a" tag, except for the links that are already wrapped.

So I have this text:

Some text with html here
http://www.somelink.html
http://www.somelink.com/view/?id=95
<a href="http://anotherlink.html">http://anotherlink.html</a>
<a href="http://anotherlink.html">Title</a>

What I need to get:

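Some text with html here
<a href="http://www.somelink.html">http://www.somelink.html</a>
<a href="http://www.somelink.com/view/?id=95">http://www.somelink.com/view/?id=95</a>
<a href="http://anotherlink.html">http://anotherlink.html</a>
<a href="http://anotherlink.html">Title</a>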

I can match the links with this expression:

http://www.somelink.com/view/?id=95

But it also matches the links that are already inside an "a" tag.

3 Answers:

Answer 0 (score: 3)

You would use a negative lookbehind. The syntax is:

(?<!text)

So in your case it would be:

(?<!\<a)

or something close to that.
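A minimal PHP sketch of the lookbehind idea, assuming existing links are always written as <a href="URL">URL</a> with double quotes. The function name wrap_bare_urls and the exact pattern are illustrative, not from the answer. PCRE requires each alternative inside a lookbehind to have a fixed length, so the check looks for href=" or "> right before the URL rather than for a general <a ...> context.

```php
<?php
// Hypothetical helper: wraps bare http(s) URLs that are NOT directly preceded by
// href=" (the URL is an href value) or "> (the URL is the text of an <a> tag).
// Each lookbehind alternative is fixed-length, as PCRE requires.
function wrap_bare_urls(string $text): string
{
    return preg_replace(
        '~(?<!href="|">)https?://[^\s<>"]+~i',
        '<a href="$0">$0</a>', // $0 is the whole matched URL
        $text
    );
}

$text = "Some text with html here\n"
    . "http://www.somelink.html\n"
    . "<a href=\"http://anotherlink.html\">http://anotherlink.html</a>\n";

echo wrap_bare_urls($text);
```

Because the lookbehind only inspects the characters immediately before the URL, a link inside <a> text that is preceded by whitespace or other text would still be wrapped; the split approach in the next answer does not have that weakness.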

Answer 1 (score: 2)

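For reliability, I would split the content on <a> tags (together with everything inside them) and on all other tags (just the tag itself, not its contents), like this: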

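// $content holds the HTML to process; link_urls() is a helper you supply that wraps bare URLs in <a> tags.
// The pattern captures whole <a>...</a> elements and any other opening or closing tag as delimiters.
$bits = preg_split('/(<a(?:\s+[^>]*)?>.*?<\/a>|<\/?[a-z][^>]*>)/is', $content, -1, PREG_SPLIT_DELIM_CAPTURE);

$reconstructed = '';

foreach ($bits as $i => $bit) {
  if ($i % 2 === 0) { // even index: plain text, not inside an <a> and not between < and >, so check for URLs
    $bit = link_urls($bit);
  }
  $reconstructed .= $bit;
}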
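To try the split approach end to end, here is a self-contained sketch. The answer does not define link_urls(), so the version below (a plain preg_replace over http/https URLs) is an assumption, as is the wrapper name wrap_links(). It relies on the fact that, with a single capturing group and PREG_SPLIT_DELIM_CAPTURE (and without PREG_SPLIT_NO_EMPTY), plain-text pieces sit at even indices and captured tags at odd ones.

```php
<?php
// Hypothetical helper (not from the answer): wraps bare http(s) URLs in plain text.
function link_urls(string $text): string
{
    return preg_replace('~https?://[^\s<>"\']+~i', '<a href="$0">$0</a>', $text);
}

// Split-and-rebuild: <a>...</a> elements and other tags are captured as delimiters
// (odd indices) and copied through untouched; only text pieces are linked.
function wrap_links(string $content): string
{
    $bits = preg_split(
        '/(<a(?:\s+[^>]*)?>.*?<\/a>|<\/?[a-z][^>]*>)/is',
        $content,
        -1,
        PREG_SPLIT_DELIM_CAPTURE
    );

    $reconstructed = '';
    foreach ($bits as $i => $bit) {
        $reconstructed .= ($i % 2 === 0) ? link_urls($bit) : $bit;
    }
    return $reconstructed;
}

$content = "Some text with html here\n"
    . "http://www.somelink.com/view/?id=95\n"
    . "<a href=\"http://anotherlink.html\">Title</a>\n";

echo wrap_links($content);
```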

Answer 2 (score: 0)

The gist of the regex and the replacement is below (in Perl). It should be enough to go on.

use strict;
use warnings;

my $html = '
  http://Top.html

  Some text with more html here
  <a href="http://www.somelink.html">
        http://www.somelink.html
  </a>

  <a href="http://www.somelink.com/view/?id=2495">
       http://www.somelink.com/view/?id=95
  </a>

  <a href="http://anotherlink.html">
       http://anotherlink.html
  </a>

  http://andone.html
  http://andtwo.html

  <a href="http://anthisisotherlink.html"><mn>
       Title
     http://this  <br>
       <b href="http://erlink.html">
     asdf
  </a> 
';

{
 no warnings;
 $html =~ 

 # Regex (global replace) ..
  s{(?is)
      (<   (?:DOCTYPE.*?|--.*?--)
         | script\s[^>]*>.*?</script\s*
         | style\s[^>]*>.*?</style\s*
         | a\s[^>]*>.*?</a\s*
         | (?:/?\w+\s*/?|(?:\w+\s+(?:".*?"|'.*?'|[^>]*?)+\s*/?))
        >
      )
    | ( (?:
         (?!(?:(?:https?|ftp)://|www.)[-a-z0-9+&@#\/%?=~_|!:,.;]*[-a-z0-9+&@#\/%=~_|])
         [^<]
        )*?
      )
    | ( (?:(?:https?|ftp)://|www.)[-a-z0-9+&@#\/%?=~_|!:,.;]*[-a-z0-9+&@#\/%=~_|] )
  }

 # Replacement (would be a callback function in php) ..
  {
     defined $3 ? "<a href=\"$3\">$3</a>" : "$1$2"
  }xeg;
}

print $html,"\n";

Output

  <a href="http://Top.html">http://Top.html</a>

  Some text with more html here
  <a href="http://www.somelink.html">
        <a href="http://www.somelink.html">http://www.somelink.html</a>
  </a>

  <a href="http://www.somelink.com/view/?id=2495">
       http://www.somelink.com/view/?id=95
  </a>

  <a href="http://anotherlink.html">
       http://anotherlink.html
  </a>

  <a href="http://andone.html">http://andone.html</a>
  <a href="http://andtwo.html">http://andtwo.html</a>

  <a href="http://anthisisotherlink.html"><mn>
       Title
     http://this  <br>
       <b href="http://erlink.html">
     asdf
  </a>
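The Perl comment notes that in PHP the replacement would be a callback. A rough PHP counterpart using preg_replace_callback, assuming a simplified pattern: only whole <a>...</a> elements and single tags are protected (no DOCTYPE, comment, script or style handling), and www.-only links without a scheme are not matched. The function name wrap_urls_callback is hypothetical. The idea is the same as in the Perl version: match the parts you want to skip first and return them unchanged, and only rewrite the capture you care about.

```php
<?php
// Hypothetical sketch: group 1 = protected chunk (existing anchor or any tag),
// group 2 = bare URL. Only group 2 matches are rewritten.
function wrap_urls_callback(string $html): string
{
    $pattern = '~
        ( <a\b[^>]*>.*?</a\s*>      # a whole <a>...</a> element
        | <[^>]*>                   # any other tag
        )
      | ( (?:https?|ftp)://[-a-z0-9+&@\#/%?=\~_|!:,.;]*[-a-z0-9+&@\#/%=\~_|] )  # bare URL
    ~isx';

    return preg_replace_callback($pattern, function (array $m): string {
        // Group 2 is only present (and non-empty) when a bare URL matched.
        if (isset($m[2]) && $m[2] !== '') {
            return '<a href="' . $m[2] . '">' . $m[2] . '</a>';
        }
        return $m[0]; // tag or existing anchor: keep as-is
    }, $html);
}

echo wrap_urls_callback("Some text\nhttp://www.somelink.html\n<a href=\"http://anotherlink.html\">http://anotherlink.html</a>\n");
```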