I have HTML content like the following:
<table>
<tr>
<td>xyx...</td>
<td>abc....</td>
<td><span><h3>Downloads</h3></span><br>blah blah blah...</td>
</tr>
<tr>
<td><h3>Downloads</h3>again some content.</td>
<td>dddd</td>
<td>kkkl...</td>
</tr>
</table>
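Now I'm trying to remove every `td` that contains the word "Downloads" anywhere in its content. After some research on the internet, I got something that runs; the code is below: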
$res_text = 'MY HTML';
# Create a DOM parser object
$dom = new DOMDocument();
# Parse the HTML from Google.
# The @ before the method call suppresses any warnings that
# loadHTML might throw because of invalid HTML in the page.
@$dom->loadHTML($res_text);
$selector = new DOMXPath($dom);
$results = $selector->query('//*[text()[contains(.,"Downloads")]]');
if ($results->length) {
    foreach ($results as $res) {
        $res->parentNode->removeChild($res);
    }
}
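This removes the word "Downloads" together with its immediate parent node (<span> or <p>), but I want the whole <td> to be removed along with its content.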
I tried:
$results = $selector->query('//td[text()[contains(.,"Downloads")]]');
But it doesn't work. Can someone tell me how to do this?
Answer 0: (score: 2)
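You don't need text() in your query. text() tests only the text nodes that are direct children of the <td>, while "." tests the cell's whole string value, including text inside nested elements such as <h3>. It should be: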
$results = $selector->query('//td[contains(.,"Downloads")]');
Full code:
$dom = new DOMDocument();
$dom->loadHTML($res_text);
$selector = new DOMXPath($dom);
$results = $selector->query('//td[contains(.,"Downloads")]');
if ($results->length) {
    foreach ($results as $res) {
        $res->parentNode->removeChild($res);
    }
}
echo htmlentities($dom->saveHTML());
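To see why the text() version matches nothing, here is a minimal sketch that runs both queries against markup modeled on the question's table. The sample string and the variable names ($html, $directText, $stringValue) are illustrative assumptions, not part of the original code, and libxml_use_internal_errors() is used here in place of the @ operator to keep parser warnings quiet.

```php
<?php
// Sample markup modeled on the question's table (cell text shortened; an assumption).
$html = '<table>
<tr><td>xyx</td><td>abc</td><td><span><h3>Downloads</h3></span><br>blah</td></tr>
<tr><td><h3>Downloads</h3>again some content.</td><td>dddd</td><td>kkkl</td></tr>
</table>';

// Collect parser warnings internally instead of printing them.
libxml_use_internal_errors(true);

$dom = new DOMDocument();
$dom->loadHTML($html);
$selector = new DOMXPath($dom);

// text() looks only at text nodes that are direct children of the <td>.
// Here "Downloads" sits inside <h3>, so no direct text node contains it.
$directText = $selector->query('//td[text()[contains(.,"Downloads")]]');

// "." is the string value of the <td>: all descendant text concatenated.
$stringValue = $selector->query('//td[contains(.,"Downloads")]');

echo 'text() query matched: ', $directText->length, PHP_EOL;
echo '"." query matched: ', $stringValue->length, PHP_EOL;

// Remove the matched cells, then count what is left.
foreach ($stringValue as $cell) {
    $cell->parentNode->removeChild($cell);
}
$remaining = $selector->query('//td')->length;
echo 'cells remaining: ', $remaining, PHP_EOL;
```

Note that contains() is case-sensitive, so a cell containing only "downloads" would not be matched by either query.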