Text search using DOMXPath in PHP

Time: 2014-04-05 23:36:04

Tags: php dom curl xpath

HTML

<td class="one">
  <div>
    <b>
      <span>item</span>
    </b>
    <div>
      <c>text</c>
    </div>
  </div>
</td>


How can I select the <c> element by searching for its text, and then echo it?

I'm having trouble with the XPath line in my PHP code.

$c = $xpath->query("*/c");


PHP

<?php
$keyword = "String";
$search = strtolower($keyword);

$target_url = "http://www.example.com/";

//USER AGENT
//$userAgent = 'spider';
$userAgent = 'Googlebot/2.1 (http://www.googlebot.com/bot.html)';

$ch = curl_init();
$options = array(CURLOPT_USERAGENT   => $userAgent,
                CURLOPT_URL             => $target_url,
                CURLOPT_HEADER          => false,
                CURLOPT_FAILONERROR     => true,
                CURLOPT_FOLLOWLOCATION  => true,
                CURLOPT_AUTOREFERER     => true,
                CURLOPT_RETURNTRANSFER  => true,
                CURLOPT_TIMEOUT         => 20
                );

curl_setopt_array($ch, $options);
$html= curl_exec($ch);

if (!$html)
{
    echo "ERROR NUMBER: ".curl_errno($ch);
    echo "ERROR: ".curl_error($ch);
    exit;
}
curl_close($ch);


$dom = new DOMDocument();
@$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
$c = $xpath->query("*/c");


foreach($c as $a) { 
    $text = $a->nodeValue;
    echo($text . '<br />');
}


//echo '<pre>';
//print_r($c);
//echo '</pre>';    
?>

1 answer:

Answer 0: (score: 1)

Since HTML defines no c element, you won't be able to use DOMDocument::loadHTML unless you also pass the LIBXML_HTML_NOIMPLIED constant:

$dom->loadHTML($html, LIBXML_HTML_NOIMPLIED);

This sets an appropriate libxml flag that lets you traverse the document without the elements being checked.
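Independently of the flag, note that the relative path "*/c" evaluated at the document node only matches a c element that is a direct child of the document's root element, while the c in the question is nested several levels deeper. The following is a minimal sketch, not a definitive implementation, of selecting that element by its text. It inlines the question's markup instead of fetching it with cURL, and the keyword 'TEXT' is a hypothetical value assumed to contain no double-quote character.

```php
<?php
// Markup from the question, inlined so the sketch runs without a network call.
$html = '<td class="one"><div><b><span>item</span></b><div><c>text</c></div></div></td>';

// Hypothetical keyword; assumed to contain no double-quote character,
// because it is embedded in a double-quoted XPath string literal below.
$keyword = 'TEXT';
$search  = strtolower($keyword);

// Collect parser warnings (such as the unknown <c> tag) instead of using @.
libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html, LIBXML_HTML_NOIMPLIED);
libxml_clear_errors();

$xpath = new DOMXPath($dom);

// "//c" selects <c> at any depth. XPath 1.0 has no lower-case() function,
// so translate() folds ASCII letters to lower case before contains() compares.
$upper = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ';
$lower = 'abcdefghijklmnopqrstuvwxyz';
$query = '//c[contains(translate(., "' . $upper . '", "' . $lower . '"), "' . $search . '")]';

$matches = [];
foreach ($xpath->query($query) as $node) {
    $matches[] = $node->nodeValue;
    echo $node->nodeValue . '<br />';
}
```

The case folding here covers ASCII letters only. If the keyword can contain quotes or non-ASCII text, one alternative is to query "//c" and filter the resulting nodes in PHP with stripos() or mb_stripos().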