How do I scrape an ASP.NET site with PHP cURL and __EVENTVALIDATION?

Posted: 2018-10-01 18:36:02

Tags: php asp.net curl web-applications screen-scraping

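I want to build a web application for my school. It needs to fill in the form on the following website and then retrieve the results (to use them later): https://bonhoeffer.cupweb6.nl/(S(b3x5qrnwhrsuzih0zew3yycl))/default.aspx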
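The `(S(...))` segment in that URL is an ASP.NET cookieless session ID. Session IDs expire, so a script that hard-codes one may be talking to a dead session. A minimal sketch, assuming this site issues a fresh session (typically via a redirect) when the segment is missing: strip the segment and start from the plain URL. The helper name `stripCookielessSession` is hypothetical.

```php
<?php
/**
 * Remove an ASP.NET cookieless-session segment such as "/(S(abc123))"
 * from a URL, so the scraper can ask the server for a fresh session
 * instead of reusing an ID that may already have expired.
 * Hypothetical helper, not part of any library.
 */
function stripCookielessSession(string $url): string
{
    // preg_replace returns null on a regex error; fall back to the input.
    return preg_replace('~/\(S\([^)]*\)\)~', '', $url) ?? $url;
}

$start = stripCookielessSession(
    'https://bonhoeffer.cupweb6.nl/(S(b3x5qrnwhrsuzih0zew3yycl))/default.aspx'
);
echo $start, PHP_EOL;
```

Fetch `$start` with `CURLOPT_FOLLOWLOCATION` enabled and read `curl_getinfo($ch, CURLINFO_EFFECTIVE_URL)` afterwards. If the site behaves as assumed, that URL carries the new session segment and is the one the form should be posted back to.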

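As far as I can tell, the best way to do this in PHP is with cURL. However, when I run the script below, it returns the same login page instead of the results.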

    <?php
    $url = "https://bonhoeffer.cupweb6.nl/(S(b3x5qrnwhrsuzih0zew3yycl))/default.aspx";
    $ckfile = tempnam("/tmp", "CURLCOOKIE");
    $useragent = 'Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US) AppleWebKit/533.2 (KHTML, like Gecko) Chrome/5.0.342.3 Safari/533.2';

    $username = "Groot";

    $f = fopen('log.txt', 'w'); // file that receives the verbose request log, for debugging

    /**
     * Get __VIEWSTATE & __EVENTVALIDATION
     */
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_COOKIEJAR, $ckfile);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_USERAGENT, $useragent);

    $html = curl_exec($ch);

    curl_close($ch);

    preg_match('~<input type="hidden" name="__VIEWSTATE" id="__VIEWSTATE" value="(.*?)" />~', $html, $viewstate);
    preg_match('~<input type="hidden" name="__EVENTVALIDATION" id="__EVENTVALIDATION" value="(.*?)" />~', $html, $eventValidation);

    // Fall back to '' if a pattern did not match (avoids an undefined-index warning)
    $viewstate = $viewstate[1] ?? '';
    $eventValidation = $eventValidation[1] ?? '';

    /**
     * Start login process
     */
    $ch = curl_init();

    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false); // disables certificate checks; debugging only
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);  // return the page so $ret holds the HTML
    curl_setopt($ch, CURLOPT_COOKIEJAR, $ckfile);
    curl_setopt($ch, CURLOPT_COOKIEFILE, $ckfile);
    curl_setopt($ch, CURLOPT_HEADER, false);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($ch, CURLOPT_REFERER, $url);
    curl_setopt($ch, CURLOPT_VERBOSE, 1);
    curl_setopt($ch, CURLOPT_STDERR, $f);
    curl_setopt($ch, CURLOPT_USERAGENT, $useragent);

    // Collect all POST fields
    $postfields = array();
    $postfields['__EVENTTARGET'] = "";
    $postfields['__EVENTARGUMENT'] = "";
    $postfields['__VIEWSTATE'] = $viewstate;
    $postfields['__EVENTVALIDATION'] = $eventValidation;
    $postfields['_nameTextBox'] = $username;

    curl_setopt($ch, CURLOPT_POST, 1);
    curl_setopt($ch, CURLOPT_POSTFIELDS, $postfields);
    $ret = curl_exec($ch); // result page after login

    curl_close($ch);
    fclose($f);

    print $ret;

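I know this has something to do with __VIEWSTATE and __EVENTVALIDATION, but nothing I have tried works (I took part of this script from Stack Overflow), and every other solution I found seems to be outdated.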
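The regular expressions in the script only match if the attributes appear in exactly that order and the tag ends in ` />`, and they ignore other hidden fields such as `__VIEWSTATEGENERATOR`, which newer ASP.NET versions commonly emit next to `__VIEWSTATE`. A more forgiving sketch, assuming the page contains a single WebForms `<form>` and PHP's DOM extension is available, parses the HTML and collects every hidden input. `getHiddenFields` and the sample markup (with made-up values) are illustrative only.

```php
<?php
/**
 * Collect every hidden <input> on a page as name => value.
 * Assumes one WebForms <form>; hidden fields of several forms would be merged.
 */
function getHiddenFields(string $html): array
{
    if ($html === '') {
        return []; // loadHTML() rejects empty input
    }
    $doc = new DOMDocument();
    // Real-world HTML is rarely valid; keep libxml warnings out of the output.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $fields = [];
    foreach ($doc->getElementsByTagName('input') as $input) {
        if (strtolower($input->getAttribute('type')) !== 'hidden') {
            continue;
        }
        $name = $input->getAttribute('name');
        if ($name !== '') {
            $fields[$name] = $input->getAttribute('value');
        }
    }
    return $fields;
}

// Attribute order differs from what the original regex expects.
$sample = '<form method="post">'
    . '<input type="hidden" value="dDwtMTA4" id="__VIEWSTATE" name="__VIEWSTATE">'
    . '<input name="__VIEWSTATEGENERATOR" type="hidden" value="CA0B0334" />'
    . '<input type="text" name="txtName" value="" />'
    . '</form>';
print_r(getHiddenFields($sample));
```

Posting back all of these fields, rather than only the two picked out by hand, keeps the request closer to what a browser sends.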

My question

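How can I make this script work so that it returns the result of the submitted form instead of the login page again? Or is there a better way?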
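One possible shape for the postback, as a sketch under several assumptions: the form is posted back to the URL the page was actually served from (read from the same cURL handle, so cookies and any `(S(...))` segment stay consistent); the body is sent form-urlencoded via `http_build_query()` (when `CURLOPT_POSTFIELDS` gets an array, PHP's cURL sends `multipart/form-data` instead); and the submit button's own name/value pair is included, since ASP.NET WebForms usually uses it to decide which button's click event to raise. `_nameTextBox` comes from the question; `_searchButton` => `Search` is a hypothetical placeholder, and the real `name` attributes must be copied from the page's HTML. `buildPostback` and `submitForm` are illustrative names.

```php
<?php
/**
 * Build an application/x-www-form-urlencoded body for an ASP.NET postback.
 * $hidden:     every hidden field scraped from the GET response
 *              (__VIEWSTATE, __VIEWSTATEGENERATOR, __EVENTVALIDATION, ...).
 * $userFields: visible inputs plus the submit button, keyed by their exact
 *              HTML "name" attributes; these override hidden fields.
 */
function buildPostback(array $hidden, array $userFields): string
{
    $fields = array_replace(
        ['__EVENTTARGET' => '', '__EVENTARGUMENT' => ''],
        $hidden,
        $userFields
    );
    // A string body (not an array) makes cURL send form-urlencoded data.
    return http_build_query($fields);
}

/**
 * POST the body on the same handle that fetched the form, so the session
 * cookies and the effective URL (with any (S(...)) segment) stay in sync.
 * Not executed in this sketch; requires the cURL extension.
 */
function submitForm($ch, string $body): string
{
    $formUrl = curl_getinfo($ch, CURLINFO_EFFECTIVE_URL);
    curl_setopt_array($ch, [
        CURLOPT_URL => $formUrl,
        CURLOPT_REFERER => $formUrl,
        CURLOPT_POST => true,
        CURLOPT_POSTFIELDS => $body,
        CURLOPT_RETURNTRANSFER => true,
    ]);
    $result = curl_exec($ch);
    if ($result === false) {
        throw new RuntimeException(curl_error($ch));
    }
    return $result;
}

// Offline example; '_searchButton' => 'Search' is a hypothetical button.
$body = buildPostback(
    ['__VIEWSTATE' => 'dDw+/=', '__EVENTVALIDATION' => 'abc123'],
    ['_nameTextBox' => 'Groot', '_searchButton' => 'Search']
);
echo $body, PHP_EOL;
```

The intended flow would be: fetch the form with a cookie-enabled handle and `CURLOPT_FOLLOWLOCATION`, scrape its hidden inputs, call `buildPostback()`, then `submitForm()` on that same handle.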

Thanks

0 answers