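Can anyone show me how to use DOMDocument() to extract content from a form? I am able to extract all the links (../index.html, descriptions/page001, etc.) and save the extracted data to a MySQL database, but I am stuck on how to get the text content (Accounting, Adult Continuing Education, etc.) and save that to the database as well.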
<HTML>
<HEAD></HEAD>
<BODY>
<FORM ACTION="#">
<SELECT ONCHANGE="MM_jumpMenu('parent',this,0)" NAME="menu1">
<OPTION VALUE="../index.html" SELECTED="SELECTED"></OPTION>
<OPTION VALUE="descriptions/page001.html">Accounting</OPTION>
<OPTION VALUE="descriptions/page122.html">Adult Continuing Education</OPTION>
<OPTION VALUE="descriptions/page115.html">Energy Engineering</OPTION>
</SELECT>
</P></FORM>
</BODY>
</HTML>
My cURL script:
// parse the html into a DOMDocument
$dom = new DOMDocument();
@$dom->loadHTML($html);
// grab all option elements on the page
$xpath = new DOMXPath($dom);
// GET AND LOOP THROUGH LINKS
$values = $xpath->evaluate("/html/body//option");
for ($cnt = 0; $cnt < $values->length; $cnt++) {
    $value = $values->item($cnt);
    $url = $value->getAttribute('value');
    // store the extracted link and its source in the database
    storeLink($url, $target_url);
    echo "Link stored: $url";
}
Any help would be greatly appreciated. Thanks.
Answer 0 (score: 0)
Here is a solution:
$html = '<HTML>
<HEAD></HEAD>
<BODY>
<FORM ACTION="#">
<SELECT ONCHANGE="MM_jumpMenu(\'parent\',this,0)" NAME="menu1">
<OPTION VALUE="../index.html" SELECTED="SELECTED"></OPTION>
<OPTION VALUE="descriptions/page001.html">Accounting</OPTION>
<OPTION VALUE="descriptions/page122.html">Adult Continuing Education</OPTION>
<OPTION VALUE="descriptions/page115.html">Energy Engineering</OPTION>
</SELECT>
</P></FORM>
</BODY>
</HTML>';
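// collect parser warnings internally (the markup contains a stray </P>)
libxml_use_internal_errors(true);
$document = new DOMDocument();
$document->loadHTML($html);
// loadHTML() lowercases tag names, so 'option' matches <OPTION>
$options = $document->getElementsByTagName('option');
foreach ($options as $option) {
    // print the value attribute of each option, one per line
    echo $option->getAttribute('value');
    echo "\n";
}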
Answer 1 (score: 0)
For the value between the tags, e.g. Accounting in:
<OPTION VALUE="descriptions/page001.html">Accounting</OPTION>
you need ->nodeValue:
...
$options = $document->getElementsByTagName('option');
foreach ($options as $option) {
    // first argument: the value attribute; second: the text between the tags
    storeLink($option->getAttribute('value'), $option->nodeValue);
}
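To make the ->nodeValue suggestion concrete, here is a minimal self-contained sketch that pairs each option's value attribute with its visible text. The helper name extractOptions() and the 'url'/'label' array keys are assumptions made for illustration and do not come from the original post. Skipping the blank placeholder option is also an assumption about what you want stored. The database call is left out, so each returned row would still need to be passed to your own storeLink().

```php
<?php
// Sketch only: extractOptions() is a hypothetical helper, not from the original post.
function extractOptions(string $html): array
{
    $document = new DOMDocument();

    // Collect parser warnings internally instead of printing them,
    // then restore the previous libxml error-handling setting.
    $previous = libxml_use_internal_errors(true);
    $document->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $rows = [];
    foreach ($document->getElementsByTagName('option') as $option) {
        // nodeValue of an element is the text between its tags
        $label = trim((string) $option->nodeValue);
        if ($label === '') {
            continue; // assumption: the blank placeholder option is not wanted
        }
        $rows[] = [
            'url'   => $option->getAttribute('value'),
            'label' => $label,
        ];
    }
    return $rows;
}

$html = '<html><body><form action="#"><select name="menu1">
<option value="../index.html" selected="selected"></option>
<option value="descriptions/page001.html">Accounting</option>
<option value="descriptions/page122.html">Adult Continuing Education</option>
</select></form></body></html>';

foreach (extractOptions($html) as $row) {
    echo $row['url'], ' => ', $row['label'], "\n";
}
```

Returning plain arrays keeps the parsing separate from the storage step, so the same rows can be echoed for debugging or handed to whatever insert function you already have.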