Extracting data from a form fetched with cURL

Asked: 2012-06-12 18:34:45

Tags: php mysql curl

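Could someone show me how to extract content from a form using DOMDocument()? I am able to extract all of the links (../index.html, descriptions/page001, etc.) and save the extracted data to a MySQL database, but I am stuck on how to get the text content (Accounting, Adult Continuing Education, etc.) and save that information to the database as well.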

<HTML>
<HEAD></HEAD>
<BODY>
<FORM ACTION="#">
<SELECT ONCHANGE="MM_jumpMenu('parent',this,0)" NAME="menu1"> 
<OPTION VALUE="../index.html" SELECTED="SELECTED"></OPTION> 
<OPTION VALUE="descriptions/page001.html">Accounting</OPTION> 
<OPTION VALUE="descriptions/page122.html">Adult Continuing Education</OPTION>
<OPTION VALUE="descriptions/page115.html">Energy Engineering</OPTION> 
</SELECT>
</P></FORM> 
</BODY>
</HTML>


MY CURL SCRIPT
// parse the html into a DOMDocument
$dom = new DOMDocument();
@$dom->loadHTML($html);

// grab all on the page
$xpath = new DOMXPath($dom);


// Get and loop through the links
$values = $xpath->evaluate("/html/body//option");
for ($cnt = 0; $cnt < $values->length; $cnt++) {
    $value = $values->item($cnt);
    $url = $value->getAttribute('value');
    // store the extracted link and the link's source in the database
    storeLink($url, $target_url);
    echo "Link stored: $url";
}

Any help would be greatly appreciated. Thanks.

2 answers:

Answer 0 (score: 0)

Here is a solution:

$html = '<HTML>
  <HEAD></HEAD>
  <BODY>
  <FORM ACTION="#">
  <SELECT ONCHANGE="MM_jumpMenu(\'parent\',this,0)" NAME="menu1"> 
  <OPTION VALUE="../index.html" SELECTED="SELECTED"></OPTION> 
  <OPTION VALUE="descriptions/page001.html">Accounting</OPTION> 
  <OPTION VALUE="descriptions/page122.html">Adult Continuing Education</OPTION>
  <OPTION VALUE="descriptions/page115.html">Energy Engineering</OPTION> 
  </SELECT>
  </P></FORM> 
  </BODY>
  </HTML>';

$document = new DOMDocument();
$document->loadHTML($html);
$options = $document->getElementsByTagName('option');

foreach ($options as $option) {
  echo $option->getAttribute('value');
  echo "\n";
}

Answer 1 (score: 0)

For the value between the tags, such as Accounting in:

<OPTION VALUE="descriptions/page001.html">Accounting</OPTION>

you need ->nodeValue, which returns the text content of the element:

...
$options = $document->getElementsByTagName('option');

foreach ($options as $option) {
  storeLink($option->getAttribute('value'), $option->nodeValue);
}
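
Putting the two answers together, a minimal self-contained sketch might look like the following. The helper name extractOptions() and the shortened sample markup are assumptions made for illustration; they do not come from the original post. The asker's storeLink() is not shown because its signature is unknown, so the loop at the end only prints each pair.

```php
<?php
// Hypothetical helper (not from the original post): returns one
// ['url' => ..., 'label' => ...] pair per <option> element.
function extractOptions(string $html): array
{
    // Collect libxml parse warnings (for example, from a stray </P>)
    // instead of printing them; this replaces the @ suppression operator.
    $previous = libxml_use_internal_errors(true);

    $document = new DOMDocument();
    $document->loadHTML($html);

    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $rows = [];
    foreach ($document->getElementsByTagName('option') as $option) {
        $rows[] = [
            'url'   => $option->getAttribute('value'), // e.g. descriptions/page001.html
            'label' => trim($option->textContent),     // e.g. Accounting
        ];
    }
    return $rows;
}

// Shortened sample markup, assumed for this sketch.
$html = '<html><body><form action="#"><select name="menu1">
<option value="../index.html" selected="selected"></option>
<option value="descriptions/page001.html">Accounting</option>
<option value="descriptions/page122.html">Adult Continuing Education</option>
</select></form></body></html>';

foreach (extractOptions($html) as $row) {
    echo $row['url'], ' => ', $row['label'], "\n";
}
```

The first option has an empty label, so a caller that only wants named entries would probably skip rows where 'label' is an empty string before writing them to the database.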