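I am trying to fetch data from here, and I am using the code below. But it returns a different result from the one we see in the browser, and I don't know why. Please help. Also, the log file and the cookie file are both empty. My code: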
<?php
function curl($url ,$binary=false,$post=false,$cookie =false ){
touch($cookie);
$ch = curl_init();
curl_setopt ($ch, CURLOPT_URL, $url );
curl_setopt ($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLINFO_HEADER_OUT, true);
curl_setopt($ch, CURLOPT_REFERER, $url);
curl_setopt($ch, CURLOPT_AUTOREFERER, true);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLOPT_TIMEOUT, 60);
curl_setopt ($ch, CURLOPT_RETURNTRANSFER, 1);
if($cookie){
$agent = "Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.4) Gecko/20030624 Netscape/7.1 (ax)";
curl_setopt($ch, CURLOPT_USERAGENT, $agent);
curl_setopt($ch, CURLOPT_COOKIEFILE, $cookie);
curl_setopt($ch, CURLOPT_COOKIEJAR, $cookie);
}
if($binary)
curl_setopt($ch, CURLOPT_BINARYTRANSFER, 1);
if($post){
curl_setopt($ch, CURLOPT_POST, true);
curl_setopt($ch, CURLOPT_POSTFIELDS, $post);
}
return curl_exec ($ch);
echo curl_getinfo($ch, CURLINFO_HEADER_OUT);
}
$dist=01;
$assem=98;
$ok="Proceed";
$url="http://164.100.153.3/e-registration/booth_entry_report.aspx";
$cookie="cookie.txt";
$f = fopen('log.txt', 'w');
touch($cookie);
$useragent = 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Ubuntu Chromium/36.0.1985.125 Chrome/36.0.1985.125 Safari/537.36';
$ch = curl_init($url);
curl_setopt($ch, CURLOPT_COOKIEJAR, $cookie);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_USERAGENT, $useragent);
$html = curl_exec($ch);
curl_close($ch);
preg_match('~<input type="hidden" name="__VIEWSTATE" id="__VIEWSTATE" value="(.*?)" />~', $html, $viewstate);
preg_match('~<input type="hidden" name="__EVENTVALIDATION" id="__EVENTVALIDATION" value="(.*?)" />~', $html, $eventValidation);
$viewstate = $viewstate[1];
$eventValidation = $eventValidation[1];
$postdata = "ddldistrict=".$dist."&ddlassembly=".$assem."&btnproceed=".$ok."&VIEWSTATE=".$viewstate."&EVENTVALIDATION".$eventValidation;
// function
$ch = curl($url,false,$postdata,$cookie);
//$url ='http://164.100.153.3/e-registration/booth_level_officer_report.aspx';
//$cookie="cookie.txt";
//$ch =curl($url,false,false,$cookie);
echo $ch;
?>
Actual result in the browser: Different result returned by cURL:
Answer 0 (score: 0)
The page you are scraping needs more POST parameters (Firebug tells me so). The form also posts "__VIEWSTATE" and "__EVENTVALIDATION" (and a few more, so check for those too).
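One way to avoid missing any of those hidden parameters is to collect every hidden input on the page instead of matching them one by one with a regex. The following is only a sketch: extractHiddenFields() is a hypothetical helper name, the sample markup is made up, and it assumes PHP's DOM extension is available.

```php
<?php
// Hypothetical helper (not part of the original code): returns all
// <input type="hidden"> fields of a page as a name => value array.
function extractHiddenFields(string $html): array
{
    if ($html === '') {
        return [];
    }
    $doc = new DOMDocument();
    // Silence warnings caused by sloppy real-world markup.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $fields = [];
    foreach ($doc->getElementsByTagName('input') as $input) {
        $name = $input->getAttribute('name');
        if ($name !== '' && strtolower($input->getAttribute('type')) === 'hidden') {
            $fields[$name] = $input->getAttribute('value');
        }
    }
    return $fields;
}

// Made-up sample markup, only to show the shape of the result.
$sample = '<form>'
        . '<input type="hidden" name="__VIEWSTATE" id="__VIEWSTATE" value="abc+/=" />'
        . '<input type="hidden" name="__EVENTVALIDATION" value="xyz" />'
        . '<input type="text" name="other" value="1" />'
        . '</form>';
$hidden = extractHiddenFields($sample);
```

The returned array can be merged with the visible form values before posting, so any extra '__' field the page adds later is picked up automatically.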
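Also, the form is submitted to the same page, "booth_entry_report.aspx", which then redirects (code 302) to "booth_level_officer_report.aspx". So you don't need the second curl() call, because the first one already sets CURLOPT_FOLLOWLOCATION.
There may also be a problem with the headers being sent. I suggest you use Firebug, or better, Fiddler, to look at the request the browser sends and compare it with what PHP cURL sends.
To see the headers PHP cURL sends, call curl_setopt($ch, CURLINFO_HEADER_OUT, true), and then echo curl_getinfo($ch, CURLINFO_HEADER_OUT) after curl_exec().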
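A minimal sketch of that debugging step follows. debugRequest() is a hypothetical helper name and the timeout values are arbitrary assumptions. Note that the diagnostic data is read before the function returns; in the question's curl() function the echo sits after the return statement, so it never runs.

```php
<?php
// Hypothetical helper (requires the cURL extension): performs a request and
// returns the response body together with diagnostic information.
function debugRequest(string $url, ?string $postBody = null): array
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($ch, CURLINFO_HEADER_OUT, true); // record the request headers
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 10);
    curl_setopt($ch, CURLOPT_TIMEOUT, 60);
    if ($postBody !== null) {
        curl_setopt($ch, CURLOPT_POST, true);
        curl_setopt($ch, CURLOPT_POSTFIELDS, $postBody);
    }
    $body = curl_exec($ch);

    // Collect everything before returning.
    return [
        'body'          => $body,
        'error'         => curl_error($ch),
        'status'        => curl_getinfo($ch, CURLINFO_HTTP_CODE),
        'effective_url' => curl_getinfo($ch, CURLINFO_EFFECTIVE_URL),
        'redirects'     => curl_getinfo($ch, CURLINFO_REDIRECT_COUNT),
        'sent_headers'  => curl_getinfo($ch, CURLINFO_HEADER_OUT),
    ];
}

// Usage (commented out so that nothing is requested when the file is loaded):
// $info = debugRequest('http://164.100.153.3/e-registration/booth_entry_report.aspx');
// echo $info['sent_headers'];
```

Comparing 'sent_headers' with the request shown in Firebug or Fiddler, and 'effective_url' with the page the browser ends up on, should show where the two requests differ.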
UPDATE
Make sure the '__' prefix comes before those POST params.
Set ddldistrict and ddlassembly as strings ('01' and '98').
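The reason the string form matters can be shown in a few lines. In PHP source code an integer literal with a leading zero is read as octal, so 01 is simply the integer 1 and the leading zero is lost when it is concatenated into the POST body. A small illustration (the variable names are made up):

```php
<?php
// 01 is an octal integer literal with the value 1.
$asInteger = "ddldistrict=" . 01;
// '01' is a two-character string and is sent unchanged.
$asString  = "ddldistrict=" . '01';

echo $asInteger, "\n"; // ddldistrict=1
echo $asString, "\n";  // ddldistrict=01
```

If the drop-down on the page uses '01' as the option value, the server may not recognise '1' as a valid selection.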
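I suggest you reproduce every page load that the page performs in the browser.
Update 2, code:
Here is the code for what I described above. But I'm afraid I can't help you any further. The page looks really tricky. Good luck!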
$dist='01';
$assem='98';
$ok="Proceed";
$url="http://164.100.153.3/e-registration/booth_entry_report.aspx";
$cookie="cookie.txt";
/////// get the first page
$ch = curl_init($url);
// here curl_setopt() for url, cookie, useragent, followlocation, etc
$html = curl_exec($ch);
curl_close($ch);
// get those variables with preg_match
preg_match('... __VIEWSTATE ...', $html, $viewstate);
$viewstate = $viewstate[1];
preg_match('... __EVENTVALIDATION ...', $html, $eventValidation);
$eventValidation = $eventValidation[1];
// do the preg_match for '__EVENTTARGET', '__EVENTARGUMENT', '__LASTFOCUS'
//////// do the next request
// the first post data, with 'ddlassembly=0' and without 'btnproceed'. Like when you selected the "District name" in the browser.
$postdata = "ddldistrict=".$dist."&ddlassembly=0&__VIEWSTATE=".urlencode($viewstate)."&__EVENTVALIDATION=".urlencode($eventValidation); // add the rest of the __ fields too
// post it with referer
$ch = curl_init($url);
curl_setopt($ch, CURLOPT_REFERER, $url); // this with referer
// here curl_setopt() for post data, url, cookie, useragent, followlocation, etc
$html = curl_exec($ch);
curl_close($ch);
// here get all the '__' fields again with preg_match(), just like last time
preg_match('... __VIEWSTATE ...', $html, $viewstate);
$viewstate = $viewstate[1];
preg_match('... __EVENTVALIDATION ...', $html, $eventValidation);
$eventValidation = $eventValidation[1];
// do the preg_match for '__EVENTTARGET', '__EVENTARGUMENT', '__LASTFOCUS'
//////// do the last request
// the second post data, with 'ddlassembly=98' and also 'btnproceed'. Like when you clicked "Proceed" in the browser.
$postdata = "ddldistrict=".$dist."&ddlassembly=".$assem."&btnproceed=".$ok."&__VIEWSTATE=".urlencode($viewstate)."&__EVENTVALIDATION=".urlencode($eventValidation); // add the rest of the __ fields too
// and finally post it with referer
$ch = curl_init($url);
curl_setopt($ch, CURLOPT_REFERER, $url); // this with referer
// here curl_setopt() for post data, url, cookie, useragent, followlocation, etc
$html = curl_exec($ch);
curl_close($ch);
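The "here curl_setopt() for ..." placeholders above stand for the same set of options on each of the three requests. One possible way to fill them in is sketched below. buildCurlOptions() and request() are hypothetical helper names, and the option values are assumptions based on the ones used in the question.

```php
<?php
// Hypothetical helper: builds the option set shared by all three requests.
// Passing $postFields turns the request into a POST with a Referer header.
function buildCurlOptions(string $url, string $cookieFile, string $userAgent, ?array $postFields = null): array
{
    $options = [
        CURLOPT_URL            => $url,
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_FOLLOWLOCATION => true,
        CURLOPT_USERAGENT      => $userAgent,
        CURLOPT_COOKIEFILE     => $cookieFile, // send cookies saved earlier
        CURLOPT_COOKIEJAR      => $cookieFile, // save cookies for the next request
        CURLOPT_TIMEOUT        => 60,
    ];
    if ($postFields !== null) {
        $options[CURLOPT_REFERER]    = $url;
        $options[CURLOPT_POST]       = true;
        $options[CURLOPT_POSTFIELDS] = http_build_query($postFields, '', '&');
    }
    return $options;
}

// Hypothetical helper: performs one request and returns the body (or false).
function request(string $url, string $cookieFile, string $userAgent, ?array $postFields = null)
{
    $ch = curl_init();
    curl_setopt_array($ch, buildCurlOptions($url, $cookieFile, $userAgent, $postFields));
    $html = curl_exec($ch);
    // Release the handle; libcurl writes the cookie jar when the handle is cleaned up.
    unset($ch);
    return $html;
}
```

Each step of the flow then becomes a single call, for example request($url, $cookie, $useragent) for the first page and request($url, $cookie, $useragent, $fields) for the two posts, with $fields rebuilt from the freshly extracted '__' values each time.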
A note on building the POST parameters: a better way is to create an associative array and use http_build_query(). Like this:
$post_data = array(
'ddldistrict' => '01',
'__EVENTVALIDATION' => $eventValidation,
'....' => '....'
);
curl_setopt($ch, CURLOPT_POSTFIELDS, http_build_query($post_data));
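To illustrate why http_build_query() is the safer choice: values such as __VIEWSTATE are Base64 strings that can contain '+', '/' and '='. In a form-encoded body a raw '+' is decoded by the server as a space, so a hand-concatenated string can corrupt the value. The sketch below uses made-up field values.

```php
<?php
// Made-up values; a real __VIEWSTATE is much longer.
$fields = [
    'ddldistrict' => '01',
    '__VIEWSTATE' => 'dDw+/w==',
];

// Manual concatenation leaves the special characters unescaped.
$manual = 'ddldistrict=' . $fields['ddldistrict'] . '&__VIEWSTATE=' . $fields['__VIEWSTATE'];

// http_build_query() URL-encodes every value.
$encoded = http_build_query($fields, '', '&');

echo $manual, "\n";  // ddldistrict=01&__VIEWSTATE=dDw+/w==
echo $encoded, "\n"; // ddldistrict=01&__VIEWSTATE=dDw%2B%2Fw%3D%3D
```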