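I have a scenario where I need to remove all anchor tags from some HTML content, but I don't want to lose the href part of each anchor tag when I do so.
Currently I am using these regular expressions with preg_replace() to strip the anchors.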
<a [^>]*> - strips all the anchor tags
<a.+href\=[\"|\'](.+)[\"|\'].*\>.*\<\/a\> - matches the href
Example string: "<a href="mailto:xyz@gmail.com">namemail</a>"
After running preg_replace(), I should get "mailto:xyz@gmail.com" as plain text; everything else should be removed.
Answer 0: (score: 1)
Try this regex:
~<a.+?href=(["'])(.+?)\1.*?>.*?</a>~is
<a # matches the characters <a literally (case insensitive, because of the i modifier)
.+? # matches one or more characters, as few as possible
href= # matches the characters href= literally (case insensitive)
1st capturing group (["'])
["'] # matches a single character, either " or '
2nd capturing group (.+?)
.+? # matches one or more characters, as few as possible; this is the href value
\1 # matches the same quote character that the first capturing group matched
.*? # matches zero or more characters, as few as possible
> # matches the character > literally
.*? # matches zero or more characters, as few as possible
</a> # matches the characters </a> literally (case insensitive)
i modifier: ignore case
s modifier: single line. The dot also matches newline characters
Note: the ~ characters around the regex are its delimiters, which means / does not have to be escaped.
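To get the href as the result, the pattern above can be passed to preg_replace() with the second capturing group as the replacement. The following is a minimal sketch, assuming the sample string from the question; the function name stripAnchorsKeepHref is a hypothetical helper, not something from the answer.

```php
<?php
// Hypothetical helper name; the pattern is the one suggested in this answer.
function stripAnchorsKeepHref(string $html): string
{
    // Group 1 is the opening quote, group 2 is the href value.
    $pattern = '~<a.+?href=(["\'])(.+?)\1.*?>.*?</a>~is';

    // preg_replace() returns null on error, so fall back to the input.
    return preg_replace($pattern, '$2', $html) ?? $html;
}

// Sample input taken from the question.
echo stripAnchorsKeepHref('<a href="mailto:xyz@gmail.com">namemail</a>'), "\n";
```

For the sample string this prints mailto:xyz@gmail.com. Text outside the anchor is left in place, because only the matched anchor is replaced.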
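[\"|\']
Don't over-escape. Only escape a metacharacter when you want to match it literally. Use ["|'] instead.
["|']
Don't use | inside a character class unless you actually want to match it. The characters in a character class are already ORed together. See the following explanation:
When you type ["|'], the regex sees: " or | or '
When you type ["'], the regex sees: " or '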
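The difference between the two character classes can be checked directly. This is a small sketch, with variable names chosen here for illustration, that tests each class against a double quote, a single quote, and a pipe.

```php
<?php
// Character class that (unintentionally) contains a literal pipe.
$withPipe = '~^["|\']$~';
// Character class with only the two quote characters.
$withoutPipe = '~^["\']$~';

foreach (['"', "'", '|'] as $char) {
    printf(
        "%s  with pipe: %d  without pipe: %d\n",
        $char,
        preg_match($withPipe, $char),
        preg_match($withoutPipe, $char)
    );
}
```

Both classes match the two quote characters; only the class containing | also matches a pipe.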
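Answer 1: (score: 1)
$html = '<a href="http://www..." x=asdasda?></a>';
// Replace each anchor with the value of its href attribute (capturing group 2).
// [^<]* means the anchor body must not contain any nested tags.
$html = preg_replace("|<a[^>]*href\s*=\s*([\"'])([^\"']*)\\1[^>]*>[^<]*</a>|si", "$2", $html);
echo $html;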
Output:
http://www...
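One thing worth checking with this pattern is that [^<]* only accepts anchor bodies without nested markup. The sketch below applies the same pattern to two inputs; the second input and its example.com URL are made up here for illustration.

```php
<?php
// Same pattern as in the answer above.
$pattern = "|<a[^>]*href\s*=\s*([\"'])([^\"']*)\\1[^>]*>[^<]*</a>|si";

// Plain-text anchor body, based on the string from the question.
$plain = "<a href='mailto:xyz@gmail.com'>namemail</a>";
// Illustrative input (an assumption): the anchor body contains a nested tag.
$nested = '<a href="http://example.com"><b>bold</b></a>';

$plainResult = preg_replace($pattern, '$2', $plain);
$nestedResult = preg_replace($pattern, '$2', $nested);

echo $plainResult, "\n";
echo $nestedResult, "\n";
```

The first anchor is reduced to its href, while the second is returned unchanged because the pattern does not match it. If nested markup can occur, a lazy .*? body or a DOM parser may be a better fit.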
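Answer 2: (score: 1)
You will have more success parsing the HTML with DOMDocument than trying to do this with regular expressions.
Here is a proof of concept of what could be done: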
function replaceAnchorTags($html) {
    // Initialise document using provided HTML
    $doc = new DOMDocument();
    @$doc->loadHTML($html); // suppress invalid HTML warnings
    $doc_elem = $doc->documentElement;
    traverse($doc, $doc_elem);
    return $doc->saveHTML();
}
function traverse($doc, $elem) {
    if ($elem->nodeType === XML_ELEMENT_NODE and $elem->tagName == "a") {
        $href = $elem->getAttribute("href");
        // Obviously here you might want to keep the anchor's inner HTML as
        // well as the URL...
        $text_replacement = $doc->createTextNode($href);
        $elem->parentNode->replaceChild($text_replacement, $elem);
        return; // the anchor has been replaced, so there is nothing left to traverse
    }
    if ($elem->hasChildNodes()) {
        $children = $elem->childNodes;
        // replaceChild() swaps one node for one node, so the list length stays the same
        for ($i = 0, $max = $children->length; $i < $max; $i++) {
            traverse($doc, $children->item($i));
        }
    }
}
$html = "<p>Hello <a href='http://twitter.com'>Brave New</a> World</p>";
echo replaceAnchorTags($html);
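Note that saveHTML() without arguments serialises the whole document, including the doctype and the html and body wrappers that loadHTML() adds. If only the original fragment is wanted, one possible variation is sketched below. It is not part of the answer above: the function name replaceAnchorsWithHref and the wrapper div are assumptions made for illustration, and it uses DOMXPath rather than a recursive traversal.

```php
<?php
// Hypothetical helper; a variation on the proof of concept, not the answer's own code.
function replaceAnchorsWithHref(string $html): string
{
    $doc = new DOMDocument();
    // Wrap the fragment in a known container so it can be extracted afterwards.
    @$doc->loadHTML('<div id="wrapper">' . $html . '</div>'); // suppress invalid HTML warnings

    $xpath = new DOMXPath($doc);

    // Copy the matches into an array first, then replace each anchor with its href.
    foreach (iterator_to_array($xpath->query('//a[@href]')) as $anchor) {
        $text = $doc->createTextNode($anchor->getAttribute('href'));
        $anchor->parentNode->replaceChild($text, $anchor);
    }

    // Serialise only the children of the wrapper, not the whole document.
    $wrapper = $xpath->query('//div[@id="wrapper"]')->item(0);
    $out = '';
    foreach ($wrapper->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

$result = replaceAnchorsWithHref("<p>Hello <a href='http://twitter.com'>Brave New</a> World</p>");
echo $result, "\n";
```

With the sample input from the answer, the anchor should be replaced by the text http://twitter.com inside the paragraph. loadHTML() makes its own assumptions about character encoding, so non-ASCII input may need extra handling.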