I need to use a regular expression to wrap every link in a text in an "a" tag, except the ones that are already wrapped.
So I have this text:
Some text with html here
http://www.somelink.html
http://www.somelink.com/view/?id=95
<a href="http://anotherlink.html">http://anotherlink.html</a>
<a href="http://anotherlink.html">Title</a>
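What I need to get:
Some text with html here
<a href="http://www.somelink.html">http://www.somelink.html</a>
<a href="http://www.somelink.com/view/?id=95">http://www.somelink.com/view/?id=95</a>
<a href="http://anotherlink.html">http://anotherlink.html</a>
<a href="http://anotherlink.html">Title</a>
I can match the links with a regular expression, but it also matches the ones that are already inside "a" tags.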
Answer 0 (score: 3)
Answer 1 (score: 2)
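For reliability, I would split out <a> tags (together with their child content) separately from all other tags (without their child content), like this: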
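// Capture whole <a>...</a> elements (with their content) and every other
// opening or closing tag as delimiters, so the plain text between them is split out.
$bits = preg_split('/(<a(?:\s+[^>]*)?>.*?<\/a>|<\/?[a-z][^>]*>)/is', $content, -1, PREG_SPLIT_DELIM_CAPTURE);
$reconstructed = '';
foreach ($bits as $bit) {
    if (strpos($bit, '<') !== 0) { // not inside an <a> or within < and >, so check for urls
        $bit = link_urls($bit); // link_urls() is not shown here; it wraps bare URLs in <a> tags
    }
    $reconstructed .= $bit;
}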
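Since link_urls() is left undefined, here is a minimal, self-contained sketch of the whole approach. The helper's body is an assumption, not part of the answer: it treats a URL as http://, https:// or ftp:// followed by anything up to whitespace, <, > or a quote. The wrapper name link_unwrapped_urls() is likewise hypothetical. The split passes -1 as the limit (passing null to that int parameter is deprecated as of PHP 8.1) and also captures closing tags, so text that directly follows a tag such as </b> is still scanned.

```php
<?php
// Hypothetical helper (assumption, not from the answer): wrap every bare URL
// in a plain-text chunk in an <a> tag. A URL is assumed to start with
// http://, https:// or ftp:// and to run until whitespace, '<', '>' or a quote.
function link_urls(string $text): string
{
    return preg_replace_callback(
        '~\b(?:https?|ftp)://[^\s<>"\']+~i',
        function (array $m): string {
            // The input is already HTML, so the matched text is reused as-is.
            return '<a href="' . $m[0] . '">' . $m[0] . '</a>';
        },
        $text
    );
}

// Hypothetical wrapper: split the HTML into whole <a>...</a> elements,
// other opening/closing tags, and plain text; linkify only the plain text.
function link_unwrapped_urls(string $content): string
{
    $bits = preg_split(
        '/(<a(?:\s+[^>]*)?>.*?<\/a>|<\/?[a-z][^>]*>)/is',
        $content,
        -1,
        PREG_SPLIT_DELIM_CAPTURE
    );
    $reconstructed = '';
    foreach ($bits as $bit) {
        // Captured delimiters (tags and whole <a> elements) start with '<'.
        if (strpos($bit, '<') !== 0) {
            $bit = link_urls($bit);
        }
        $reconstructed .= $bit;
    }
    return $reconstructed;
}

// The sample text from the question.
$input = "Some text with html here\n"
    . "http://www.somelink.html\n"
    . "http://www.somelink.com/view/?id=95\n"
    . '<a href="http://anotherlink.html">http://anotherlink.html</a>' . "\n"
    . '<a href="http://anotherlink.html">Title</a>';

echo link_unwrapped_urls($input), "\n";
```

On the question's sample this wraps only the two bare URLs and leaves both existing "a" elements untouched.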
Answer 2 (score: 0)
Below is the gist of the regex and the replacement, written in Perl. It should be enough to go on.
use strict;
use warnings;
my $html = '
http://Top.html
Some text with more html here
<a href="http://www.somelink.html">
http://www.somelink.html
</a>
<a href="http://www.somelink.com/view/?id=2495">
http://www.somelink.com/view/?id=95
</a>
<a href="http://anotherlink.html">
http://anotherlink.html
</a>
http://andone.html
http://andtwo.html
<a href="http://anthisisotherlink.html"><mn>
Title
http://this <br>
<b href="http://erlink.html">
asdf
</a>
';
{
    no warnings;    # suppress "uninitialized" warnings: in "$1$2" at least one group is unset
    $html =~
        # Regex (global replace) ..
        s{(?is)
            # Group 1: markup copied through unchanged
            (< (?:!DOCTYPE.*?|!--.*?--)
             | script\s[^>]*>.*?</script\s*
             | style\s[^>]*>.*?</style\s*
             | a\s[^>]*>.*?</a\s*
             | (?:/?\w+\s*/?|(?:\w+\s+(?:".*?"|'.*?'|[^>]*?)+\s*/?))
             >
            )
            # Group 2: a run of text that does not start a URL
          | ( (?:
                (?!(?:(?:https?|ftp)://|www.)[-a-z0-9+&@#\/%?=~_|!:,.;]*[-a-z0-9+&@#\/%=~_|])
                [^<]
              )*?
            )
            # Group 3: a bare URL, to be wrapped in an a tag
          | ( (?:(?:https?|ftp)://|www.)[-a-z0-9+&@#\/%?=~_|!:,.;]*[-a-z0-9+&@#\/%=~_|] )
        }
        # Replacement (would be a callback function in PHP) ..
        {
            defined $3 ? "<a href=\"$3\">$3</a>" : "$1$2"
        }xeg;
}
print $html,"\n";
Output
<a href="http://Top.html">http://Top.html</a>
Some text with more html here
<a href="http://www.somelink.html">
http://www.somelink.html
</a>
<a href="http://www.somelink.com/view/?id=2495">
http://www.somelink.com/view/?id=95
</a>
<a href="http://anotherlink.html">
http://anotherlink.html
</a>
<a href="http://andone.html">http://andone.html</a>
<a href="http://andtwo.html">http://andtwo.html</a>
<a href="http://anthisisotherlink.html"><mn>
Title
http://this <br>
<b href="http://erlink.html">
asdf
</a>
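The replacement comment mentions a callback in PHP. A hedged sketch of that port using preg_replace_callback follows; it is a simplification rather than a line-by-line translation. Because preg_replace_callback leaves unmatched text in place, the text alternative (group 2 in the Perl version) is not needed. The www. form is dropped so every href is absolute, and the tag alternatives are condensed. The function name link_bare_urls() is hypothetical.

```php
<?php
// Hypothetical port of the Perl answer: markup alternatives come first so
// complete a/script/style elements and other tags are consumed whole
// (group 1); a bare http/https/ftp URL anywhere else is group 2.
function link_bare_urls(string $html): string
{
    $pattern = '{
        (   <!--.*?-->                      # HTML comments
          | <script\b[^>]*>.*?</script\s*>  # script elements with their content
          | <style\b[^>]*>.*?</style\s*>    # style elements with their content
          | <a\b[^>]*>.*?</a\s*>            # complete a elements, content included
          | </?[a-z][^>]*>                  # any other opening or closing tag
        )
      | ( (?:https?|ftp)://[-a-z0-9+&@#/%?=~_|!:,.;]*[-a-z0-9+&@#/%=~_|] )  # bare URL
    }isx';

    return preg_replace_callback($pattern, function (array $m): string {
        // Group 2 is present only when a bare URL matched.
        if (isset($m[2]) && $m[2] !== '') {
            return '<a href="' . $m[2] . '">' . $m[2] . '</a>';
        }
        // Markup (group 1) is returned unchanged.
        return $m[0];
    }, $html);
}

$html = "http://Top.html\n"
    . "<a href=\"http://anotherlink.html\">\nhttp://anotherlink.html\n</a>\n"
    . "http://andone.html <b>bold</b>\n";

echo link_bare_urls($html);
```

Since a URL inside a complete a element, script or style block is consumed as part of group 1, it is never wrapped a second time.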