I'm trying to log in to a website with a PHP cURL script, but I can't because of the __VIEWSTATEGENERATOR and __EVENTVALIDATION fields. Is there any way around them?

Time: 2019-06-07 02:45:58

Tags: php curl viewstate eventvalidation

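I'm trying to log in to a website with cURL and scrape some data from it. This is a homework project. However, the site's form has three fields whose values change every time I log in.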

Is it possible to get around this and log in, or is it simply impossible? If it is possible, could someone point me in the right direction?

The cURL code I tried is:

<?php
include("simple_html_dom.php");

$cofile = dirname(__FILE__).'/cookie.txt';
$postfield= array(

 "SM"=>"UpPnlLogin|btnLogin",

  "__LASTFOCUS"=>"",
  "__EVENTTARGET"=>"btnLogin",

  "__EVENTARGUMENT"=>"",

  "__VIEWSTATE"=>"hly8ipIDyvfEpBj01vjkB/HmrA
  yIw+UuyvBkGc5NHMexWF+PvAVQZYkSrcwJM4rO9aaz
  93ogQuFxowVMDPueJz5DU3obstDtyl7KuLvZXQ+GJ1
  JKRGEtTTRl5vM2RIi7mwL+j3LRqHgl+ZW1wftsnt2q
  nUy7rrxSC6j0eoqabUM/hpS1hveORvLcEbo+5o1J+r
  W0+UYYnZ/cFQcUNhx5538uRaD8PIxq6GxTrT/qI2ef
  DDLJB5qmmANILYPxsVg++dXFmQFD59MvETq+R3Om0g
  ==",

  "__VIEWSTATEGENERATOR"=>"CADA6983",

  "__EVENTVALIDATION"=>"y2iWoj4pBfE6Ij55U/Hf
  Sq/mWPNVk4Hv4Nvg7IDxuN6KElLeNsq4iUIbHMfGQS
  8s6oProuk3wXUrqQWG6VleouPj+M3LLkKYR8XhLzmw
  e4Cck3tqa/YpGmNLZiNOLkbN4/RhPFq+onAiQ2GDc4
  gHlU5aU94WwONQ9ItyzsH4V111bPhKX3gjr9YXhpPg
  9UiyWwkNXohLJSWRM9jGfHrgMg==",

  "txtCustNo"=>"username",

  "txtPassword"=>"password",

  "__ASYNCPOST"=>"true",

  "btnLogin"=>"Нэвтрэх"

  );

$ch = curl_init();
curl_setopt($ch, CURLOPT_COOKIEJAR, $cofile);
curl_setopt($ch, CURLOPT_URL, "https://e.khanbank.com/"); // URL that is requested when logging in
curl_setopt($ch, CURLOPT_REFERER, "https://e.khanbank.com/"); // CURLOPT_REFERER
curl_setopt($ch,CURLOPT_FOLLOWLOCATION,1);
curl_setopt($ch,CURLOPT_RETURNTRANSFER,1);
curl_setopt($ch, CURLOPT_POST, 1);
curl_setopt($ch, CURLOPT_POSTFIELDS, http_build_query($postfield));


ob_start();      // prevent any output
curl_exec ($ch); // execute the curl command
ob_end_clean();  // stop preventing output

curl_close ($ch);
unset($ch);

$ch = curl_init();
curl_setopt ($ch, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($ch, CURLOPT_RETURNTRANSFER,1);
curl_setopt($ch, CURLOPT_COOKIEFILE, $cofile);
curl_setopt($ch, CURLOPT_URL, "https://e.khanbank.com/pageMain?content=ucMain_Welcome");

$result = curl_exec ($ch);

curl_close ($ch);

echo $result;

?>

1 Answer:

Answer 0 (score: 1)

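You cannot hard-code these values. They change on every login and are tied to the cookie session, which means the __EVENTVALIDATION you copied from your browser is tied to the browser's cookie session and will not work from curl.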

I'll write an example using the hhb_curl library.

First, add this function somewhere; you will need it. It makes DOMDocument load the HTML as UTF-8, which is not DOMDocument's default, but khanbank uses UTF-8:

function my_dom_loader(string $html): \DOMDocument
{
    $html = trim($html);
    if (empty($html)) {
        //....
    }
    if (false === stripos($html, '<?xml encoding=')) {
        $html = '<?xml encoding="UTF-8">' . $html;
    }
    $ret = new DOMDocument('', 'UTF-8');
    $ret->preserveWhiteSpace = false;
    $ret->formatOutput = true;
    if (!(@$ret->loadHTML($html, LIBXML_NOBLANKS | LIBXML_NONET | LIBXML_BIGLINES))) {
        throw new \Exception("failed to create DOMDocument from input html!");
    }
    $ret->preserveWhiteSpace = false;
    $ret->formatOutput = true;
    return $ret;
}

Next, create the hhb_curl handle:

<?php
declare (strict_types = 1);
require_once('hhb_.inc.php');
$hc = new hhb_curl('', true);

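Now, khanbank.com uses a browser whitelist: if you are not using a whitelisted browser, you cannot log in. One whitelisted browser is Google Chrome 75 x64, so impersonate it by setting the user agent: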

$hc->setopt(CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.80 Safari/537.36');

Next, fetch the login page to get the cookies and the __EVENTVALIDATION data:

$html = $hc->exec('https://e.khanbank.com/')->getStdOut();
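If hhb_curl is not available, roughly the same session handling can be sketched with PHP's bundled cURL extension. This is only a sketch under assumptions: the helper names make_session_handle and fetch are hypothetical, and it relies on the fact that an empty CURLOPT_COOKIEFILE turns on cURL's in-memory cookie engine, so cookies persist for as long as the same handle is reused.

```php
<?php
declare(strict_types=1);

// Hypothetical helper: one cURL handle that keeps cookies between requests.
function make_session_handle(string $userAgent)
{
    $ch = curl_init();
    curl_setopt_array($ch, [
        CURLOPT_COOKIEFILE     => '',    // empty string: enable the in-memory cookie engine
        CURLOPT_RETURNTRANSFER => true,  // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,
        CURLOPT_USERAGENT      => $userAgent,
    ]);
    return $ch;
}

// Hypothetical helper: GET when $postFields is null, otherwise a form POST.
function fetch($ch, string $url, ?array $postFields = null): string
{
    curl_setopt($ch, CURLOPT_URL, $url);
    if ($postFields === null) {
        curl_setopt($ch, CURLOPT_HTTPGET, true);
    } else {
        curl_setopt($ch, CURLOPT_POST, true);
        curl_setopt($ch, CURLOPT_POSTFIELDS, http_build_query($postFields));
    }
    $body = curl_exec($ch);
    if ($body === false) {
        throw new RuntimeException('cURL error: ' . curl_error($ch));
    }
    return $body;
}
```

The key point is to call fetch() for the login page and for the login POST on the same handle, so both requests share one cookie session. The code in the question closes its first handle before the second request and never fetches the login page first.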

Now that the __EVENTVALIDATION data is in the HTML, we need to parse it out:

$domd = my_dom_loader($html);
$xp = new DOMXPath($domd);
$form = $domd->getElementById("Form1");
$post_data = array();
foreach ($form->getElementsByTagName("input") as $input) {
    $post_data[$input->getAttribute("name")] = $input->getAttribute("value");
}
assert(isset($post_data['txtCustNo']), "ERROR: COULD NOT FIND USERNAME INPUT!");
assert(isset($post_data['txtPassword']), "ERROR: COULD NOT FIND PASSWORD INPUT!");

$post_data now contains:

array (
  '__VIEWSTATE' => '9GT5O4HrKQJrWbF7PRSXu9RiMlpkqY5hO+sN9H0OXxmwYjWMfr2uf4yIgpHtk9sp56RWot30dvKeuGF3+eoOhpNu5nsuGBjtrpb8g8AGMaDbQ0nxpEKS3HILkqccMwFfn7y0LThLfjm0Ow84RGosJa+/5iM9YfP/HFM5HnyHKGJkM84nGEh7QZfoGYwMOU9SSb5dKmxfnmrIo/xXUUh4DT8+LOFGCQ2H5+nPFudTonwfgX6AKBNhkRijlfrUY+ns7HMq699AU38bsaxgD67KEw==',
  '__VIEWSTATEGENERATOR' => 'CADA6983',
  '__EVENTVALIDATION' => '4FZipDfTouUXBNMfIqlf/SXhPNyW5SBkcH/JIZB/j8kdaJUlMAQzvodpEq2n6WBRvxs6IBGVASOFouDQbqjygKK8+01KbRa9CpEGRiYGdxSIlt0wbZ2wJZeN6kB2ncn2DSd3C3nymCcz1kGHIdR3Dy5l2OlS6JngVCVoXuhpDzsjDQbrRwHST85XOlXdF6jl8/aQPYkSlZkSRQ5BFzdbnw==',
  'txtCustNo' => '',
  'txtPassword' => '',
  'chkRemUser' => '',
)

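These values are bound to this particular cookie session, so you have to parse them out of the HTML every time; you cannot hard-code them. A few variables are still missing, though, because they are set by JavaScript rather than in the HTML, so add them: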

$post_data['SM'] = 'UpPnlLogin|btnLogin';
$post_data['__LASTFOCUS'] = '';
$post_data['__EVENTARGUMENT'] = '';
$post_data['__EVENTTARGET'] = 'btnLogin';
$post_data['__ASYNCPOST'] = 'true';

Now set the username and password:

$post_data['txtCustNo'] = "username";
$post_data['txtPassword'] = "password";

Then send the actual login request:

$html = $hc->setopt_array(array(
    CURLOPT_POST => 1,
    CURLOPT_POSTFIELDS => http_build_query($post_data),
    CURLOPT_URL => 'https://e.khanbank.com/'
))->exec()->getStdOut();
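The http_build_query() call matters because the view state values are Base64 and contain +, / and =. A small sketch with made-up values (not a real view state) shows what the encoding does, and what happens if the body is concatenated by hand instead:

```php
<?php
declare(strict_types=1);

$post_data = [
    'SM'          => 'UpPnlLogin|btnLogin',
    '__VIEWSTATE' => 'ab+cd/ef==',   // made-up value for illustration
];

// http_build_query() percent-encodes every value.
$encoded = http_build_query($post_data);
echo $encoded, "\n";
// SM=UpPnlLogin%7CbtnLogin&__VIEWSTATE=ab%2Bcd%2Fef%3D%3D

// Decoding the encoded body gives back exactly the original values.
parse_str($encoded, $decoded);
var_dump($decoded === $post_data);   // bool(true)

// A hand-built body leaves "+" unescaped, and form decoding turns it into a space.
$naive = '__VIEWSTATE=' . $post_data['__VIEWSTATE'];
parse_str($naive, $broken);
echo $broken['__VIEWSTATE'], "\n";   // ab cd/ef==
```

A view state corrupted this way would no longer match what the server issued, so it is safer to always let http_build_query() build the body.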

Finally, check for login errors:

$domd = my_dom_loader($html);
$xp = new DOMXPath($domd);
$login_errors = array();
//uk-alert uk-alert-warning

foreach ($xp->query("//*[contains(@class,'alert')]") as $login_error) {
    $login_error = trim($login_error->textContent);
    if (!empty($login_error)) {
        $login_errors[] = $login_error;
    }
}
if (!empty($login_errors)) {
    var_dump($login_errors);
    throw new \RuntimeException("login errors: " . json_encode($login_errors, JSON_PRETTY_PRINT));
}
echo "logged in successfully! :)";
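One caveat with contains(@class,'alert') is that it matches any class name containing that substring. A stricter variant can be sketched as follows. It assumes the error box carries the uk-alert class (taken from the comment in the code above); the function name find_alert_messages and the sample markup are hypothetical.

```php
<?php
declare(strict_types=1);

// Return the trimmed text of every element whose class list contains the token "uk-alert".
function find_alert_messages(string $html): array
{
    $dom = new DOMDocument();
    @$dom->loadHTML('<?xml encoding="UTF-8">' . $html);
    $xp = new DOMXPath($dom);
    // Pad the class attribute with spaces so only whole class tokens match.
    $query = "//*[contains(concat(' ', normalize-space(@class), ' '), ' uk-alert ')]";
    $messages = [];
    foreach ($xp->query($query) as $node) {
        $text = trim($node->textContent);
        if ($text !== '') {
            $messages[] = $text;
        }
    }
    return $messages;
}

// Hypothetical response markup; the real page will differ.
$sample = '<html><body>'
    . '<div class="uk-alert uk-alert-warning"> Wrong user name or password! </div>'
    . '<div class="alert-settings">not an error</div>'
    . '</body></html>';

$messages = find_alert_messages($sample);
var_export($messages);
```

With the substring test, the second div would also be reported as a login error; with the token test, only the first one is.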

which produces:

$ php wtf4.php
array(1) {
  [0]=>
  string(69) "Нэвтрэх нэр эсвэл нууц үг буруу байна!"
}
PHP Fatal error:  Uncaught RuntimeException: login errors: [
    "\u041d\u044d\u0432\u0442\u0440\u044d\u0445 \u043d\u044d\u0440 \u044d\u0441\u0432\u044d\u043b \u043d\u0443\u0443\u0446 \u04af\u0433 \u0431\u0443\u0440\u0443\u0443 \u0431\u0430\u0439\u043d\u0430!"
] in /cygdrive/c/projects/misc/wtf4.php:63
Stack trace:
#0 {main}
  thrown in /cygdrive/c/projects/misc/wtf4.php on line 63
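The login fails because "username" and "password" are not valid credentials. The \u0431\u0430\u0439\u043d\u0430 escapes look odd, but they do not come from the exception itself: json_encode() escapes non-ASCII characters as \uXXXX by default, and the site's error message is written in Cyrillic script (it appears to be Mongolian rather than Russian).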
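The \uXXXX sequences in the exception message can be avoided. A minimal sketch, reusing the error string from the output above, compares json_encode() with and without the JSON_UNESCAPED_UNICODE flag:

```php
<?php
declare(strict_types=1);

$errors = ['Нэвтрэх нэр эсвэл нууц үг буруу байна!'];

// Default behaviour: every non-ASCII character becomes a \uXXXX escape.
$escaped = json_encode($errors);

// With JSON_UNESCAPED_UNICODE the UTF-8 text is kept as-is.
$readable = json_encode($errors, JSON_UNESCAPED_UNICODE);

echo $escaped, "\n";
echo $readable, "\n";   // ["Нэвтрэх нэр эсвэл нууц үг буруу байна!"]
```

Both strings decode to the same array; only the readability of the encoded form differs. In the answer's code the flag could be combined with the existing one as JSON_PRETTY_PRINT | JSON_UNESCAPED_UNICODE.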