Parse HTML and remove specific td

Time: 2014-05-19 04:55:31

Tags: php dom xpath

I have HTML content like the following...

<table>
  <tr>
    <td>xyx...</td>
    <td>abc....</td>
    <td><span><h3>Downloads</h3></span><br>blah blah blah...</td>
  </tr>
  <tr>
    <td><h3>Downloads</h3>again some content.</td>
    <td>dddd</td>
    <td>kkkl...</td>
  </tr>
</table>

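Now I am trying to remove the 'td's if the word 'Downloads' appears anywhere in their content. After some research on the internet, I got something that runs, with the code below...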

$res_text = 'MY HTML';

# Create a DOM parser object
$dom = new DOMDocument();

# Parse the HTML from Google.
# The @ before the method call suppresses any warnings that
# loadHTML might throw because of invalid HTML in the page.
@$dom->loadHTML($res_text);         

$selector = new DOMXPath($dom);


$results = $selector->query('//*[text()[contains(.,"Downloads")]]');

if($results->length){
    foreach($results as $res){
        $res->parentNode->removeChild($res);
    }
}

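This removes the word "Downloads" and its immediate parent node, such as <span><p>, but I want the entire <td> to be removed along with its content.

I tried...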

$results = $selector->query('//td[text()[contains(.,"Downloads")]]');

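But it does not work. Can someone tell me how I can get this?

1 Answer:

Answer 0: (score: 2)

You don't need text() in your query. text() selects only the text nodes that are direct children of the td, whereas "." is the td's string value, which includes the text of all its descendants. It should be: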

$results = $selector->query('//td[contains(.,"Downloads")]');

The whole code:

$dom = new DOMDocument();
$dom->loadHTML($res_text);
$selector = new DOMXPath($dom);
$results = $selector->query('//td[contains(.,"Downloads")]');
if ($results->length) {
    foreach ($results as $res) {
        $res->parentNode->removeChild($res);
    }
}

echo htmlentities($dom->saveHTML());
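
To see why dropping text() matters, here is a minimal, self-contained sketch that runs both queries against the sample markup from the question. The variable names ($directText, $anyDepth, $directCount, $anyDepthCount, $remaining, $output) are my own, and the use of libxml_use_internal_errors() in place of @ is just one way to quiet parser warnings, not something the question requires.

```php
<?php
// Sample markup copied from the question (cell contents are abbreviated there too).
$res_text = <<<HTML
<table>
  <tr>
    <td>xyx...</td>
    <td>abc....</td>
    <td><span><h3>Downloads</h3></span><br>blah blah blah...</td>
  </tr>
  <tr>
    <td><h3>Downloads</h3>again some content.</td>
    <td>dddd</td>
    <td>kkkl...</td>
  </tr>
</table>
HTML;

$dom = new DOMDocument();
// Collect libxml parse warnings internally instead of silencing them with @.
libxml_use_internal_errors(true);
$dom->loadHTML($res_text);
libxml_clear_errors();

$selector = new DOMXPath($dom);

// text() only looks at text nodes that are direct children of the td.
$directText = $selector->query('//td[text()[contains(.,"Downloads")]]');

// "." is the string value of the td: all descendant text concatenated.
$anyDepth = $selector->query('//td[contains(.,"Downloads")]');

$directCount = $directText->length;
$anyDepthCount = $anyDepth->length;

// Remove every td whose content mentions "Downloads" at any depth.
foreach ($anyDepth as $td) {
    $td->parentNode->removeChild($td);
}

$remaining = $selector->query('//td')->length;
$output = $dom->saveHTML();

echo "direct text matches: $directCount\n";
echo "any-depth matches: $anyDepthCount\n";
echo "td elements left: $remaining\n";
```

With this markup the text() query matches no td, because "Downloads" sits inside the nested <h3> rather than directly in the td, while the query without text() matches both cells, leaving four td elements after removal.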
