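I am trying to extract data from a web page for use in a further project, but the page requires a login before the next page can be accessed.
I have tried several different scripts, since there are many related questions dealing with the same problem.
Source code of the login page: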
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1" />
<title>HPM400</title>
<script type="text/javascript" src="system/js/jquery.js"></script>
<script type="text/javascript" src="system/js/md5.js"></script>
<style type="text/css" media="screen">@import url(style/css/login.css);</style>
<link rel="icon" type="image/png" href="favicon.ico" />
</head>
<body>
<div id="i_header">
<div id="i_hdtext">
High Power Modem 400 MHz
</div>
</div>
<div id="i_body">
<div id="frame_login">
<form>
<fieldset>
<legend>Login</legend>
<table>
<tr>
<td>Username</td><td><input type="textarea" id="id_user"/></td>
</tr>
<td colspan="2"> <br/> </td>
<tr>
<td>Password</td><td><input type="password"id="id_pass"/></td>
</tr>
</table>
<div style="text-align: center" id="submitbutton"><input type="submit" value=Send /></div>
</fieldset>
</form>
</div>
</div>
<div id="i_foot">
</div>
</body>
</html>
<script>
$(document).ready(function() {
$("#submitbutton").click(function() {
var val_user = $("#id_user").val();
var val_pass = $("#id_pass").val();
if(val_user == "" || val_pass == "") {
alert("Please fill all required fields");
} else {
var val_pass_md5 = $.md5(val_pass);
var param = "type=loginreq&user="+val_user+"&pass="+val_pass_md5;
$.ajax({
type : 'POST',
url : 'index.php',
data : param,
success : function(data) {
var tab = data.split(':');
if ( tab[0] == "OK" ) {
window.location.href = 'index.php?page='+tab[1];
if(tab[2].length > 0) {
alert(tab[2]);
}
} else {
loginFailed(tab[1]);
}
},
error : function() {
loginFailed()
}
});
}
return false;
});
});
function loginFailed(p_data) {
alert(p_data);
$("#id_user").val("");
$("#id_user").focus();
$("#id_pass").val("");
}
This is the code I use to log in to the page above and print the next page to the console.
<?php
//The username or email address of the account.
define('USERNAME', 'admin');
//The password of the account.
define('PASSWORD', 'admin');
//Set a user agent. This basically tells the server that we are using Firefox ;)
define('USER_AGENT', 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:49.0) Gecko/20100101 Firefox/49.0');
//Where our cookie information will be stored (needed for authentication).
define('COOKIE_FILE', 'cookie.txt');
//URL of the login form.
define('LOGIN_FORM_URL', '192.168.200.1');
//Login action URL. Sometimes, this is the same URL as the login form.
define('LOGIN_ACTION_URL', '192.168.200.1/index.php');
//An associative array that represents the required form fields.
//You will need to change the keys / index names to match the name of the form
//fields.
$postValues = array(
'Username' => USERNAME,
'Password' => PASSWORD
);
//Initiate cURL.
$curl = curl_init();
//Set the URL that we want to send our POST request to. In this
//case, it's the action URL of the login form.
curl_setopt($curl, CURLOPT_URL, LOGIN_ACTION_URL);
//Tell cURL that we want to carry out a POST request.
curl_setopt($curl, CURLOPT_POST, true);
//Set our post fields / data (from the array above).
curl_setopt($curl, CURLOPT_POSTFIELDS, http_build_query($postValues));
//We don't want any HTTPS errors.
curl_setopt($curl, CURLOPT_SSL_VERIFYHOST, false);
curl_setopt($curl, CURLOPT_SSL_VERIFYPEER, false);
//Where our cookie details are saved. This is typically required
//for authentication, as the session ID is usually saved in the cookie file.
curl_setopt($curl, CURLOPT_COOKIEJAR, COOKIE_FILE);
//Sets the user agent. Some websites will attempt to block bot user agents.
//Hence the reason I gave it a browser (Firefox) user agent.
curl_setopt($curl, CURLOPT_USERAGENT, USER_AGENT);
//Tells cURL to return the output once the request has been executed.
curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);
//Allows us to set the referer header. In this particular case, we are
//fooling the server into thinking that we were referred by the login form.
curl_setopt($curl, CURLOPT_REFERER, LOGIN_FORM_URL);
//Do we want to follow any redirects?
curl_setopt($curl, CURLOPT_FOLLOWLOCATION, false);
//Execute the login request.
curl_exec($curl);
//Check for errors!
if(curl_errno($curl)){
throw new Exception(curl_error($curl));
}
//We should be logged in by now. Let's attempt to access a password protected page
curl_setopt($curl, CURLOPT_URL, '192.168.200.1/index.php?page=lte');
//Use the same cookie file.
curl_setopt($curl, CURLOPT_COOKIEJAR, COOKIE_FILE);
//Use the same user agent, just in case it is used by the server for session validation.
curl_setopt($curl, CURLOPT_USERAGENT, USER_AGENT);
//We don't want any HTTPS / SSL errors.
curl_setopt($curl, CURLOPT_SSL_VERIFYHOST, false);
curl_setopt($curl, CURLOPT_SSL_VERIFYPEER, false);
//Execute the GET request and print out the result.
echo curl_exec($curl);
?>
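But when I run the code to fetch the data behind the login page, I do not get the data from the next page. Instead, the HTML of the login page is printed to the console. Can anyone suggest what I am doing wrong? Thanks.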
Answer 0 (score: 0)
You are not sending the credentials the same way the website does.
Change the following (to match its code):
$postValues = array(
'user' => USERNAME,
'pass' => md5(PASSWORD),
'type' => 'loginreq', // You also forgot this one
);
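To check what the PHP script has to send, it can help to rebuild the request body that the login page's own script produces. The following is a minimal Node.js sketch of that, not the device's code: `buildLoginBody` is a hypothetical helper name, and it assumes that `$.md5` returns the usual lowercase hexadecimal MD5 digest. Unlike the page script, it URL-encodes the values, which makes no difference for plain credentials such as `admin`.

```javascript
// Sketch: rebuild the body that the login page's script posts to index.php.
const { createHash } = require("node:crypto");

// Hypothetical helper mirroring "type=loginreq&user=<user>&pass=<md5 of password>".
// Assumption: $.md5 on the page yields a lowercase hex MD5 digest.
function buildLoginBody(user, pass) {
  const passMd5 = createHash("md5").update(pass, "utf8").digest("hex");
  const params = new URLSearchParams();
  params.append("type", "loginreq");
  params.append("user", user);
  params.append("pass", passMd5);
  return params.toString();
}

// Print the body for the credentials used in the question.
console.log(buildLoginBody("admin", "admin"));
```

The printed string should match what `http_build_query($postValues)` produces for the corrected array, apart from the order of the fields.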
You may also want to change CURLOPT_FOLLOWLOCATION to true, since they will most likely redirect on a successful login.
curl_setopt($curl, CURLOPT_FOLLOWLOCATION, true);
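Judging from the login page's own script, the reply to the login request appears to be a plain-text string split on `:` (`OK`, then the page name, then an optional message), and the browser then navigates on the client side via `window.location.href`. There may therefore be no HTTP redirect to follow, so it can be worth inspecting the reply body as well. A minimal sketch of that check follows; `parseLoginReply` is a hypothetical helper, the sample replies are made up, and trimming whitespace is an added assumption that the page script does not make.

```javascript
// Sketch: interpret the login reply the way the page script does (split on ':').
// Hypothetical helper; trimming the body is an added assumption.
function parseLoginReply(body) {
  const tab = body.trim().split(":");
  if (tab[0] === "OK") {
    // tab[1] is the page to open next, tab[2] an optional message.
    return { ok: true, page: tab[1], message: tab[2] || "" };
  }
  // On failure the page script passes tab[1] to loginFailed().
  return { ok: false, page: null, message: tab[1] || "" };
}

// Made-up sample replies, for illustration only.
console.log(parseLoginReply("OK:lte:"));
console.log(parseLoginReply("ERR:Bad password"));
```

If the reply starts with `OK`, the page name in the second field is what the browser would request next as `index.php?page=<page>`.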