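I'm trying to prototype a PHP script that extracts data from an HTML page. So far it works for HTML pages that don't require authentication. But how do I retrieve content from a page that requires the user to log in first?
Here is my code so far: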
<?php
$url = "http://anandtech.com";
$html = file_get_contents($url);
$doc = new DOMDocument();
@$doc->loadHTML($html);
$xml = simplexml_import_dom($doc);
$items = $xml->xpath("/html/body/section[@class='content']/section[@class='main_cont']/div[@class='pipeline']/div[@class='pipeline_cont']/ul[1]/li[@class='hide_resp']/a[1]/span[text()]");
echo '<ul>';
foreach ($items as $item) {
    echo '<li>' . $item . '</li>';
}
echo '</ul>';
?>
Answer 0 (score: 0)
If you mean HTTP authentication, you can use cURL (starting with curl_init()) and pass the credentials via CURLOPT_USERPWD:
$ch = curl_init(); // initialize the cURL handle
curl_setopt($ch, CURLOPT_URL, $url); // URL of the protected page
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the response as a string instead of printing it
curl_setopt($ch, CURLOPT_TIMEOUT, 4); // time out after 4 seconds
curl_setopt($ch, CURLOPT_PORT, $port); // $port must be defined first; drop this line to use the default port
curl_setopt($ch, CURLOPT_USERPWD, 'username:password'); // credentials for HTTP authentication
$result = curl_exec($ch); // response body, or false on failure
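To tie this back to the scraping code in the question, here is a minimal sketch that wraps the HTTP authentication request in a function and checks for failures. The helper names `basic_auth_options` and `fetch_with_basic_auth` are hypothetical, and the sketch assumes the server uses HTTP Basic authentication; a site with a login form needs a different approach.

```php
<?php
// Hypothetical helper: builds the cURL options for an HTTP Basic auth request.
function basic_auth_options(string $url, string $user, string $password): array
{
    return [
        CURLOPT_URL            => $url,
        CURLOPT_RETURNTRANSFER => true,            // return the body as a string
        CURLOPT_TIMEOUT        => 4,               // give up after 4 seconds
        CURLOPT_HTTPAUTH       => CURLAUTH_BASIC,  // assumption: the server expects Basic auth
        CURLOPT_USERPWD        => $user . ':' . $password,
        CURLOPT_FAILONERROR    => true,            // treat HTTP status >= 400 (e.g. 401) as a failure
    ];
}

// Hypothetical helper: fetches the page and returns its HTML, or throws on failure.
function fetch_with_basic_auth(string $url, string $user, string $password): string
{
    $ch = curl_init();
    curl_setopt_array($ch, basic_auth_options($url, $user, $password));
    $html = curl_exec($ch);
    if ($html === false) {
        throw new RuntimeException('Request failed: ' . curl_error($ch));
    }
    return $html;
}
```

With this in place, `$html = fetch_with_basic_auth($url, 'username', 'password');` could replace the `file_get_contents($url)` call in the question, and the rest of the DOM/XPath code would stay the same.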
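Or you can send the login values via GET/POST:
$ch = curl_init(); // initialize the cURL handle
curl_setopt($ch, CURLOPT_URL, $url); // URL to post to
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the response as a string instead of printing it
curl_setopt($ch, CURLOPT_TIMEOUT, 4); // time out after 4 seconds
curl_setopt($ch, CURLOPT_PORT, $port); // $port must be defined first; drop this line to use the default port
curl_setopt($ch, CURLOPT_POSTFIELDS, 'login=' . urlencode($username)); // add POST fields (this also switches the request to POST)
$result = curl_exec($ch); // response body, or false on failure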
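A form login usually only helps if the session cookie from the login response is sent along with the next request. The following is a rough sketch of that two-step flow, under several assumptions: the site uses a plain HTML form with a session cookie and no CSRF token, the `password` field name is a guess (only `login` appears above), and the helper names and both URLs are hypothetical.

```php
<?php
// Hypothetical helper: encodes the login form fields for CURLOPT_POSTFIELDS.
function build_login_body(string $username, string $password): string
{
    // 'login' comes from the snippet above; 'password' is an assumed field name.
    return http_build_query(['login' => $username, 'password' => $password]);
}

// Hypothetical helper: logs in, then fetches a protected page with the same session.
function fetch_after_login(string $loginUrl, string $pageUrl, string $username, string $password): string
{
    $ch = curl_init();
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_TIMEOUT        => 4,
        CURLOPT_FOLLOWLOCATION => true, // many login forms redirect on success
        CURLOPT_COOKIEFILE     => '',   // empty string enables the in-memory cookie engine
    ]);

    // Step 1: submit the login form (setting POSTFIELDS makes this a POST).
    curl_setopt($ch, CURLOPT_URL, $loginUrl);
    curl_setopt($ch, CURLOPT_POSTFIELDS, build_login_body($username, $password));
    if (curl_exec($ch) === false) {
        throw new RuntimeException('Login request failed: ' . curl_error($ch));
    }

    // Step 2: reuse the same handle, and therefore its cookies, for the protected page.
    curl_setopt($ch, CURLOPT_URL, $pageUrl);
    curl_setopt($ch, CURLOPT_HTTPGET, true); // switch back from POST to GET
    $html = curl_exec($ch);
    if ($html === false) {
        throw new RuntimeException('Page request failed: ' . curl_error($ch));
    }
    return $html;
}
```

The returned HTML can then be passed to `$doc->loadHTML()` as in the question. Reusing one handle keeps the cookies in memory; if the session has to survive across script runs, `CURLOPT_COOKIEJAR` with a file path would be the option to look at.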