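I'm new to this. I want to use PHP to extract a table from a page and return its HTML after modifying the href values of all its anchors. Here is the page containing the table: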
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=windows-1255">
<link rel="stylesheet" type="text/css" href="../CssGraduateE.css">
<title></title>
</head>
<body>
<div>
<br>
<table class="main" cellspacing="0" cellpadding="0">
<tbody>
<tr>
<td>
<br><span class="MainHeader">Subjects in Faculty - Electrical Engineering</span><br><br>
<table cellpadding="2" cellspacing="0" border="1" width="100%">
<tbody>
<tr>
<td><span class="SecondHeader"> Subject Number</span></td>
<td><span class="SecondHeader">Subject Name</span></td>
<td><span class="SecondHeader">Points</span></td>
<td><span class="SecondHeader">Semesters</span></td>
<td>Subject Site</td>
</tr>
<tr>
<td><a href="../Subjects/?SUB=46001">46001</a> </td>
<td nowrap="">Engineering of Distributed Software Sys</td>
<td>3</td>
<td><br></td>
<td><a target="_newtab" href="http://www.thislinkisok.com/courses/046001">www</a></td>
</tr>
<tr>
<td><a href="../Subjects/?SUB=46002">46002</a> </td>
<td nowrap="">Design and Analysis of Algorithms</td>
<td>3</td>
<td>B<br></td>
<td> <br></td>
</tr>
</tbody>
</table>
</td>
</tr>
</tbody>
</table>
<br>
<table border="0">
<tbody>
<tr>
<td>Last Update on :</td>
<td>Wednesday ,9 April 2014</td>
<td></td>
</tr>
</tbody>
</table>
</div>
</body>
</html>
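I know how to grab the table I want:
$query = $xpath->query('//table[@class="main"]//table[1]');
But how do I loop over all the links that start with "../xxx" and change them to something like "www.mynewlink.com/xxx"? Finally, I want to return the extracted table as HTML. How can I do this with the native DOMDocument and DOMXPath?
Thanks, everyone!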
Answer 0 (score: 1)
If $html is the string holding the HTML you fetched from the external site, you can do the following:
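$dom = new DOMDocument();
// Suppress the warnings libxml raises for malformed markup on the external page
@$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
// Rewrite every link inside the main table whose href starts with "../"
foreach ($xpath->query('//table[@class="main"]//a[starts-with(@href, "../")]') as $link) {
    // The dots are escaped so that only a literal leading ".." is replaced
    $link->setAttribute('href', preg_replace('#^\.\.#', 'http://www.mynewlink.com', $link->getAttribute('href')));
}
// Copy the table into a fresh document so that only its HTML is output
$container = new DOMDocument();
$container->appendChild($container->importNode($xpath->query('//table[@class="main"]')->item(0), true));
echo $container->saveHTML();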
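
The question asked for the nested table rather than the whole "main" table. Below is a minimal sketch of that variant, assuming PHP 7 or later (DOMDocument::saveHTML() has accepted a node argument since PHP 5.3.6, so no second document is needed). The function name extractSubjectTable and the $baseUrl parameter are made up for illustration and are not part of the original answer.

```php
<?php
// Hypothetical helper: returns the HTML of the first table nested inside
// table.main, with every href that starts with "../" rewritten to $baseUrl.
function extractSubjectTable(string $html, string $baseUrl): string
{
    $dom = new DOMDocument();

    // Collect libxml parse warnings instead of printing them
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);

    // The parentheses pick the first nested table in document order
    $table = $xpath->query('(//table[@class="main"]//table)[1]')->item(0);
    if ($table === null) {
        return '';
    }

    // Search relative to the extracted table only
    foreach ($xpath->query('.//a[starts-with(@href, "../")]', $table) as $link) {
        // Drop the leading ".." and prepend the new base URL
        $link->setAttribute('href', $baseUrl . substr($link->getAttribute('href'), 2));
    }

    // Serialize just this node rather than the whole document
    return (string) $dom->saveHTML($table);
}

// Usage with a cut-down version of the page from the question
$html = '<table class="main"><tr><td>'
      . '<table border="1"><tr><td><a href="../Subjects/?SUB=46001">46001</a></td></tr></table>'
      . '</td></tr></table>';
echo extractSubjectTable($html, 'http://www.mynewlink.com');
```

Using substr() instead of a regular expression avoids the escaping question entirely, since the XPath starts-with() test has already guaranteed that the href begins with "../".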