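I am trying to load a simple HTML string (with or without running it through HTML Tidy), but DOMDocument will not give me access to its contents.
Here is the instantiation: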
$doc = new DOMDocument(/*'1.0', 'utf-8'*/);
$doc->recover = true;
$doc->strictErrorChecking = false;
$doc->formatOutput = true;
$doc->load($content);
$node_array = $doc->getElementsByTagName("body");
print_r($node_array);
...or $node_array->items(0);
This is what I get:
DOMNodeList Object
(
)
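DOMDocument's save function returns a string, so it is not a resource. Could this be a missing dependency, or some other PHP configuration......?
Update: the DOMDocument objects do not implement any to-string conversion at all: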
print_r( (string)$node_array );
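Object of class DOMNodeList could not be converted to string in ....
The HTML code is here: http://pastebin.com/11V92Dup (intentionally malformed - this is to prove in code that 'tidy' closes the tags correctly)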
I simply want to walk the nodes and output their contents:
$node_array = $doc->getElementsByTagName("html");//parent_node();
$x = $doc->documentElement;
foreach ($x->childNodes AS $item)
{
print $item->nodeName . " = " . $item->nodeValue . "<br />";
}
Update 2: I got this result! It makes no sense. (Where does all the whitespace come from?)
body =
COMPOUND: C05441
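Answer 0 (score: 0)
I'm not quite sure what you expect from an answer, but I'll give it a try anyway. Here is some code that recursively iterates over the HTML tree and outputs the textContent value of each element.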
<?php
$contents = <<<HTML
<html><head>
<title>KEGG COMPOUND: C05441</title>
<link type="text/css" rel="stylesheet" href="/css/gn2.css">
<link rel="stylesheet" href="/css/bget.css" type="text/css">
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<meta http-equiv="Content-Style-Type" content="text/css">
<meta http-equiv="Content-Script-Type" content="text/javascript">
</head>
<body onload="window.focus();init();" bgcolor="#ffffff">
<table border=0 cellpadding=0 cellspacing=0><tr><td>
<table border="0" cellspacing="0" cellpadding="0" width="100%"><tr><td width="70"><a href="/kegg/kegg2.html"><img align="middle" border="0" src="/Fig/bget/kegg2.gif" alt="KEGG"></a></td><td> </td><td><a name="compound:C05441"></a><font class="title2">COMPOUND: C05441</font></td><td align="right" valign="bottom"><a href="javascript:void(window.open('/kegg/document/help_bget_compound.html','KEGG_Help','toolbar=no,location=no,directories=no,width=720,height=640,resizable=yes,scrollbars=yes'))"><img onmouseup="btn(this,'Hb')" align="middle" onmouseout="btn(this,'Hb')" onmousedown="btn(this,'Hbd')" onmouseover="btn(this,'Hbh')" alt="Help" name="help" border="0" src="/Fig/bget/button_Hb.gif"></a></td></tr></table>
<form method="post" action="/dbget-bin/www_bget" enctype="application/x-www-form-urlencoded" name="form1">
<table border=0 cellpadding=1 cellspacing=0>
<tr>
<td class="fr2">
<table border=0 cellpadding=2 cellspacing=0 style="border-bottom:#000 1px solid">
</table>
</body></html>
HTML;
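$doc = new DOMDocument("1.0", "UTF-8");
// Collect libxml warnings about the malformed markup instead of printing them,
// so that no warning text is emitted before the header() call below.
libxml_use_internal_errors(true);
$doc->loadHTML($contents);
libxml_clear_errors();
header("Content-Type: text/plain; charset=utf-8");
// Print every element below $parent, one per line, prefixed by its depth.
function recursivelyEchoChildNodes (DOMElement $parent, $depth = 1) {
    foreach ($parent->childNodes as $node) {
        // Skip text and comment nodes; only elements are reported.
        if ($node instanceof DOMElement) {
            echo str_repeat("-", $depth) . " " . $node->localName . " = " . $node->textContent . "\n";
            if ($node->hasChildNodes()) {
                recursivelyEchoChildNodes($node, $depth + 1);
            }
        }
    }
}
$html = $doc->getElementsByTagName("html")->item(0);
recursivelyEchoChildNodes($html);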
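As for the empty DOMNodeList in the question: my guess is that it comes from calling load(), which expects a file path or URL, whereas loadHTML() parses markup passed as a string. The stray whitespace is most likely the whitespace-only text nodes between tags, since nodeValue and textContent on an element concatenate every descendant text node. Here is a small sketch of both points. The markup is a shortened stand-in for the pastebin page, and the variable names ($fromPath, $fromString, $text, $markup) are mine, not anything from the original code.

```php
<?php
// Shortened, hypothetical stand-in for the page; the unclosed tags are deliberate.
$content = <<<HTML
<html><body>
<table><tr><td>
<font class="title2">COMPOUND: C05441</font>
</body></html>
HTML;

// Collect libxml warnings instead of printing them.
libxml_use_internal_errors(true);

// load() treats its argument as a file path or URL, so passing markup
// fails and the document stays empty.
$fromPath = new DOMDocument();
$loaded = $fromPath->load($content);
$emptyCount = $fromPath->getElementsByTagName('body')->length;

// loadHTML() parses the string itself and closes the open tags.
$fromString = new DOMDocument();
$fromString->loadHTML($content);
libxml_clear_errors();
$body = $fromString->getElementsByTagName('body')->item(0);

// textContent includes the newlines between tags; collapse whitespace
// runs and trim before printing.
$text = trim(preg_replace('/\s+/', ' ', $body->textContent));

// A DOMNodeList or DOMNode has no string conversion; saveHTML($node)
// serializes a single node to a string instead.
$markup = $fromString->saveHTML($body);

echo "body elements after load(): ", $emptyCount, "\n";
echo $body->nodeName, " = ", $text, "\n";
```

If the goal is just readable text per node, trimming as above is usually enough; skipping whitespace-only text nodes during iteration would be another option.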