I have the following HTML on my web page:
<p>This is a <a href="http://www.google.com/">hyperlink</a> and this is another <a href="http://www.bing.com/">hyperlink</a>. There are many like it, but <a href="http://en.wikipedia.org/wiki/Full_Metal_Jacket">this one is mine</a>.</p>
Now, I'm wondering...
Is there a way to use PHP functions to split this block of text into an array like the following?
$html[0] = "<p>This is a and this is another . There are many like it, but .</p>";
$html[1] = "http://www.google.com/";
$html[2] = "http://www.bing.com/";
$html[3] = "http://en.wikipedia.org/wiki/Full_Metal_Jacket";
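So, basically, strip all the hyperlinks out of the initial block of text and store each of them in its own array element.
Any help with this would be greatly appreciated.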
Answer 0 (score: 1)
Use this regex to get the URLs from the HTML:
<?php
$url = "http://www.example.net/somepage.html";
// Fetch the page; stop with a message if it cannot be read.
$input = @file_get_contents($url) or die("Could not access file: $url");
// Group 2 captures the href value, group 3 the content between <a> and </a>.
$regexp = "<a\s[^>]*href=(\"??)([^\" >]*?)\\1[^>]*>(.*)<\/a>";
if (preg_match_all("/$regexp/siU", $input, $matches)) {
    // $matches[2] = array of link addresses
    // $matches[3] = array of link text - including HTML code
}
?>
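The snippet above only collects the URLs. To also get the text with the links stripped out at index 0, as the question asks, one possible approach is to reuse a similar pattern with preg_replace. This is a minimal sketch rather than a definitive implementation: the function name splitLinks is hypothetical, the pattern is a simplified variant of the one above (it also accepts single-quoted href values), and it assumes reasonably well-formed anchors. A regex will not cope with every HTML edge case.

```php
<?php
// Hypothetical helper (not part of the original answer): returns the markup
// with every <a>...</a> element removed at index 0, followed by each href.
function splitLinks(string $input): array
{
    // Group 1: optional quote character, group 2: the href value.
    $pattern = '/<a\s[^>]*href=(["\']?)([^"\' >]*)\1[^>]*>.*?<\/a>/is';

    // Collect the href of every anchor; $matches[2] holds the URLs.
    preg_match_all($pattern, $input, $matches);

    // Remove the anchors (tags and link text) from the original markup.
    $stripped = preg_replace($pattern, '', $input);

    return array_merge([$stripped], $matches[2]);
}

$input = '<p>This is a <a href="http://www.google.com/">hyperlink</a> and this is another '
       . '<a href="http://www.bing.com/">hyperlink</a>. There are many like it, but '
       . '<a href="http://en.wikipedia.org/wiki/Full_Metal_Jacket">this one is mine</a>.</p>';

$html = splitLinks($input);
print_r($html);
```

Note that the whitespace around each removed link is left untouched, so $html[0] can contain doubled spaces. If the input is arbitrary HTML from the web, parsing it with PHP's DOMDocument and calling getElementsByTagName('a') is likely to be more robust than a regex.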