SimpleXML->xpath problem

Asked: 2011-05-08 14:01:16

Tags: php xml xpath curl simplexml

I am trying to access each table row on this page:

http://www.alliedelec.com/search/searchresults.aspx?N=0&Ntt=PIC16F648&Ntk=Primary&i=0&sw=n

using SimpleXML->xpath. I have already worked out the XPath for the table:

'//*[@id="tblParts"]'

Now I take the string returned by cURL, $string, and do the following:

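// $tidy is assumed to be a tidy instance created earlier (not shown here)
$tidy->parseString($string);
$output = (string) $tidy;
$xml = new SimpleXMLElement($output);
$result = $xml->xpath('//*[@id="tblParts"]');
// foreach replaces the while(list(...) = each(...)) idiom; each() was removed in PHP 8
foreach ($result as $node) {
    echo 'NODE:' . $node . "\n";
}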

What I get is errors like these, hundreds of them:

Warning: SimpleXMLElement::__construct() [simplexmlelement.--construct]: Entity: line 60: parser error : Opening and ending tag mismatch: meta line 22 and head in C:\xampp\htdocs\elexess\api\driver\driver_alliedelectronics.php on line 119

Warning: SimpleXMLElement::__construct() [simplexmlelement.--construct]: </head> in C:\xampp\htdocs\elexess\api\driver\driver_alliedelectronics.php on line 119

Warning: SimpleXMLElement::__construct() [simplexmlelement.--construct]: ^ in C:\xampp\htdocs\elexess\api\driver\driver_alliedelectronics.php on line 119

Warning: SimpleXMLElement::__construct() [simplexmlelement.--construct]: Entity: line 108: parser error : Opening and ending tag mismatch: img line 106 and td in C:\xampp\htdocs\elexess\api\driver\driver_alliedelectronics.php on line 119

And finally this one:

Fatal error: Uncaught exception 'Exception' with message 'String could not be parsed as XML' in C:\xampp\htdocs\app\com\get\get_alliedelectronics.php:119 Stack trace: #0 C:\xampp\htdocs\app\com\get\get_alliedelectronics.php(119): SimpleXMLElement->__construct('<!DOCTYPE html ...') #1 C:\xampp\htdocs\app\com\get\get_alliedelectronics.php(95): get_Alliedelectronics->extractData('<!DOCTYPE html ...') #2 C:\xampp\htdocs\app\com\get\get_alliedelectronics.php(138): get_Alliedelectronics->query('PIC16F648') #3 {main} thrown in C:\xampp\htdocs\app\com\get\get_alliedelectronics.php on line 119

2 answers:

Answer 0 (score: 2)

It looks like the HTML of the page you are scraping and trying to parse is malformed (mismatched tags and so on).

You can try to work around the errors with simplexml_import_dom, as I explained in this SO post.
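
A minimal sketch of that idea, not taken from the linked post: DOMDocument::loadHTML parses the page with libxml's lenient HTML parser, which accepts the unclosed meta and img tags that the XML parser rejects, and simplexml_import_dom then converts the resulting tree so that xpath() can be used as in the question. The `$string` below is a small hypothetical stand-in for the cURL response, and the `//tr` step appended to the XPath is an assumption about how the rows are to be reached.

```php
<?php
// Hypothetical stand-in for the cURL response; the real page is much larger.
// The unclosed <meta> and <img> tags are valid HTML but not well-formed XML.
$string = '<html><head><meta name="x" content="y"><title>t</title></head><body>'
        . '<table id="tblParts">'
        . '<tr><td><img src="a.gif">PIC16F648A</td><td>1.23</td></tr>'
        . '<tr><td><img src="b.gif">PIC16F648B</td><td>4.56</td></tr>'
        . '</table></body></html>';

// Collect libxml parse errors internally instead of emitting PHP warnings.
libxml_use_internal_errors(true);

$doc = new DOMDocument();
$doc->loadHTML($string);   // HTML parser, tolerant of tags that are not closed
libxml_clear_errors();

// Hand the DOM tree over to SimpleXML.
$xml = simplexml_import_dom($doc);

// Select the rows inside the table rather than the table element itself.
$rows = $xml->xpath('//*[@id="tblParts"]//tr');
foreach ($rows as $row) {
    $cells = array();
    foreach ($row->td as $td) {
        $cells[] = trim((string) $td);   // text directly inside each cell
    }
    echo 'ROW: ' . implode(' | ', $cells) . "\n";
}
```

Because the string never passes through the XML parser, the "Opening and ending tag mismatch" warnings from the question should not appear on this path.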

Answer 1 (score: 1)

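I would suggest not using SimpleXML (@Nev Stokes and @Nicholas Wilson are right: this is HTML, not XML, and you have no guarantee that it will validate as XML) and using something like DOM instead. You could do something like this: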

$doc = new DOMDocument();
$doc->loadHTML($string);
$xpath = new DOMXPath($doc);
$entries = $xpath->query('//*[@id="tblParts"]');
foreach ($entries as $entry) {
  // do something
}

See if that helps.
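
Since the question asks for each table row rather than the table itself, the loop above could be filled in roughly as follows. This is only a sketch under assumptions: `$string` is a hypothetical stand-in for the cURL response, the rows are assumed to be `tr` elements with `td` cells, and libxml_use_internal_errors(true) is added so that malformed markup does not flood the output with warnings.

```php
<?php
// Hypothetical stand-in for the cURL response.
$string = '<html><head><meta name="x" content="y"><title>t</title></head><body>'
        . '<table id="tblParts">'
        . '<tr><td><img src="a.gif">PIC16F648A</td><td>1.23</td></tr>'
        . '<tr><td><img src="b.gif">PIC16F648B</td><td>4.56</td></tr>'
        . '</table></body></html>';

// Keep libxml parse errors out of the output.
libxml_use_internal_errors(true);

$doc = new DOMDocument();
$doc->loadHTML($string);
libxml_clear_errors();

$xpath = new DOMXPath($doc);

// One node per table row.
$rows = $xpath->query('//*[@id="tblParts"]//tr');

$parts = array();
foreach ($rows as $row) {
    $cells = array();
    // Second argument makes the query relative to the current row.
    foreach ($xpath->query('./td', $row) as $cell) {
        $cells[] = trim($cell->textContent);
    }
    $parts[] = $cells;
}

print_r($parts);
```

Passing the row as the context node to query() keeps each cell lookup scoped to its own row, so no second absolute XPath is needed.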