Question

<a href="http://corporate.mattel.com/privacy-policy.aspx" class="privacy">
    <b>Primary Text</b> Secondary Text
</a>

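I need the text of the anchor tag; the output should be "Primary Text Secondary Text".

Please help me build a regular expression to achieve this.

Currently, I am using the following regex: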

$regex = "/<a[\s]+[^>]*?href[\s]?=[\s\"\']+"."(.*?)[\"\']+.*?>"."([^<]+|.*?)?<\/a>/";

This regex gives me the correct output for input like this:

<a href="http://corporate.mattel.com/privacy-policy.aspx" class="privacy">
    Primary Text
</a>

Answer 1

You should not parse HTML with regular expressions; use PHP's DOM parser instead.
To remove the b tag, use strip_tags, i.e.:

$html = file_get_contents("http://www.website.php");
/* 
OR 
$html = '<a href="http://corporate.mattel.com/privacy-policy.aspx" class="privacy">
    <b>Primary Text</b> Secondary Text
</a>';
*/
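# Create a DOM parser object
$dom = new DOMDocument();
# The @ suppresses the warnings loadHTML emits for malformed markup
@$dom->loadHTML($html);
# Collect every <a> element in the document
$urls = $dom->getElementsByTagName('a');

foreach ($urls as $url) {
    # Assigning nodeValue replaces the child nodes (including <b>)
    # with the anchor's plain text
    $url->nodeValue = strip_tags($url->nodeValue);
}
echo $dom->saveHTML();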
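
If only the anchor text is wanted (the "Primary Text Secondary Text" string from the question) rather than the rewritten HTML, something along these lines may work. This is a minimal sketch, not part of the original answer: the helper name anchorTexts is hypothetical, and it assumes that collapsing runs of whitespace into a single space is acceptable.

```php
<?php
// Hypothetical helper (not from the original answer): returns the visible
// text of every <a> element, with nested tags such as <b> flattened.
function anchorTexts(string $html): array
{
    $dom = new DOMDocument();
    // Collect libxml parse warnings internally instead of emitting them.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $texts = [];
    foreach ($dom->getElementsByTagName('a') as $anchor) {
        // textContent concatenates the text of all descendant nodes.
        // Collapse newlines and indentation to single spaces, then trim.
        $texts[] = trim(preg_replace('/\s+/', ' ', $anchor->textContent));
    }
    return $texts;
}

$html = '<a href="http://corporate.mattel.com/privacy-policy.aspx" class="privacy">
    <b>Primary Text</b> Secondary Text
</a>';

print_r(anchorTexts($html));
```

For the sample markup, the returned array holds the single string "Primary Text Secondary Text". Using libxml_use_internal_errors instead of the @ operator keeps parse warnings quiet without hiding unrelated errors.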

php: get text of <a> tag having another tag inside it
