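I'm trying to request password-protected pages from something called "CM/ECF" (PACER) so that I can view court documents using PHP/cURL.
I'm using a Firefox extension called Tamper Data, which lets me view the headers and POST data, and I'm then trying to replicate that request in PHP with cURL.
For some reason it isn't working, and I keep getting the login prompt. I can log in fine, save the cookies to a cookie jar, and fetch the "Main" page, but when I make another curl call to the search page (sending the same cookies), the host redirects me to the login page.
Two-part question: Part 1 - When I use Tamper Data to view the cookies sent with the page request, Tamper Data shows me: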
PacerUser="xxxxxxxxxxx xxxxxxx";
PacerSession="xxxxxSW8+F/BCzRxxxxxxhYtWpfO4ZR8WTEYbnaeeoVixAp5YnKMWxxxxxx0U8MoEPt2FOxxxxxxx/5B9ujb";
PacerPref="receipt=Y";
PacerClientCode="";
__utma=20643455934534311.139983455.139934505.13998383455.1;
__utmb=206345345.10.13453405;
__utmc=2053453433351;
__utmz=20653453351.1399345345.1.utmcsr=pacer.gov|utmccn=(referral)|utmcmd=referral|utmcct=/cmecf/developer/
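But the cookie file generated by libcurl doesn't contain any of the entries that start with underscores. What are they?
Here is the request my browser makes, copied from Tamper Data: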
Host=ecf.almb.uscourts.gov
User-Agent=Mozilla/5.0 (Windows NT 6.3; WOW64; rv:29.0) Gecko/20100101 Firefox/29.0
Accept=text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language=en-US,en;q=0.5
Accept-Encoding=gzip, deflate
DNT=1
Cookie=PacerUser="wmasdfasdf ZFBgasdfasdfsdff PacerSession="7rkPasdfasdfasdfasdfasdfsdadfnaeeoVixAp5YnKMW9lokKeq4ss4m0U8MoEPt2FOj2P/51RLh/5B9ujb"; PacerPref="receipt=Y"; PacerClientCode=""; __utma=203145253483351.15234521.13998234523405.139234505.139982345305.1; __utmc=2034533351; __utmz=206453453351.14538105.1.1.utmcsr=pacer.gov|utmccn=(referral)|utmcmd=referral|utmcct=/cmecf/developer/
Connection=keep-alive
Cache-Control=max-age=0
Here is my PHP:
$Headers = array(
"Host: ".$this->CaseFiled_endpoints[$district],
"Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
"Accept-Language: en-US,en;q=0.5",
"Accept-Encoding: gzip, deflate",
"Connection: keep-alive"
);
$url = "https://".$this->CaseFiled_endpoints[$district]."/cgi-bin/CaseFiled-Rpt.pl";
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_USERAGENT,'Mozilla/5.0 (Windows NT 6.3; WOW64; rv:29.0) Gecko/20100101 Firefox/29.0');
curl_setopt($ch, CURLOPT_HTTPHEADER, $Headers);
curl_setopt($ch, CURLOPT_REFERER, $url);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, FALSE);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_COOKIESESSION, true);
curl_setopt($ch, CURLOPT_COOKIEJAR, realpath($this->cookiefile));
curl_setopt($ch, CURLOPT_COOKIEFILE, realpath($this->cookiefile));
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
$answer2 = curl_exec($ch);
return curl_getinfo($ch);
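Is there anything obviously wrong with my code? Are there other tools that would make this easier, such as a browser plugin that spits out curl code?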
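Answer 0 (score: 4)
In Chrome's Network tab you can find a "Copy as cURL" feature. It puts a command line on your clipboard that replicates the request using cURL. From there, converting it to PHP code should be trivial.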
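As a rough sketch of that conversion, assuming the copied command has the form `curl 'URL' -H 'Name: value' ... --compressed`: each `-H` header goes into `CURLOPT_HTTPHEADER`, and the cookie jar replaces any `Cookie:` header copied from the browser. The helper name `buildCurlOptions` and the cookie file path are made up for illustration.

```php
<?php
// Hypothetical helper: maps the pieces of a "Copy as cURL" command onto
// PHP cURL options. Requires the PHP curl extension for the constants.
function buildCurlOptions(string $url, array $headers, string $cookieFile): array
{
    // Drop any Cookie header copied from the browser; the cookie jar
    // supplies cookies instead, so the two do not conflict.
    $headers = array_values(array_filter($headers, function (string $h): bool {
        return stripos($h, 'Cookie:') !== 0;
    }));

    return [
        CURLOPT_URL            => $url,        // the quoted URL argument
        CURLOPT_HTTPHEADER     => $headers,    // each -H 'Name: value'
        CURLOPT_ENCODING       => '',          // --compressed
        CURLOPT_COOKIEFILE     => $cookieFile, // -b file (read cookies)
        CURLOPT_COOKIEJAR      => $cookieFile, // -c file (write cookies)
        CURLOPT_RETURNTRANSFER => true,        // return the body as a string
    ];
}

$options = buildCurlOptions(
    'https://ecf.almb.uscourts.gov/cgi-bin/CaseFiled-Rpt.pl',
    ['Accept-Language: en-US,en;q=0.5', 'Cookie: PacerPref="receipt=Y"'],
    '/tmp/pacer_cookies.txt' // assumed path, adjust to your setup
);
// To send it: $ch = curl_init(); curl_setopt_array($ch, $options); curl_exec($ch);
```

Returning a plain options array keeps the mapping easy to inspect before any request is made.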
Answer 1 (score: 1)
Here's the magic sauce you're missing: the $cookie file in curl_setopt.
curl_setopt ($ch, CURLOPT_COOKIEJAR, $cookie);
curl_setopt ($ch, CURLOPT_COOKIEFILE, $cookie);
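You would then POST to the login form with curl, which saves the cookie file. After that, check the file time on the cookie file to see whether it is stale, and either create a new cookie by logging in again or send the $cookie file with subsequent requests.
Note that I don't have this line: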
curl_setopt($ch, CURLOPT_COOKIESESSION, true);
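Also note what http://curl.haxx.se/libcurl/c/CURLOPT_COOKIESESSION.html says:
Pass a long set to 1 to mark this as a new cookie "session". It will force libcurl to ignore all cookies it is about to load that are "session cookies" from the previous session. By default, libcurl always stores and loads all cookies, independent if they are session cookies or not. Session cookies are cookies without expiry date and they are meant to be alive and existing for this "session" only.
I think you're starting a new session every time, which throws away the session cookies you just saved.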
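To see which cookies that option would discard, you can inspect the cookie jar. This is a minimal sketch, assuming the jar is in the Netscape format libcurl writes (seven tab-separated fields, with an expiry of 0 for session cookies). `listSessionCookies` is a hypothetical helper name, and the sample jar contents are invented for the example.

```php
<?php
// Hypothetical helper: returns the names of session cookies (expiry 0)
// found in a Netscape-format cookie jar such as the one libcurl writes.
function listSessionCookies(string $jarContents): array
{
    $names = [];
    foreach (preg_split('/\r?\n/', $jarContents) as $line) {
        // libcurl marks HttpOnly cookies with a "#HttpOnly_" prefix;
        // strip it so the line is not mistaken for a comment.
        if (strpos($line, '#HttpOnly_') === 0) {
            $line = substr($line, strlen('#HttpOnly_'));
        }
        if ($line === '' || $line[0] === '#') {
            continue; // blank line or comment
        }
        $fields = explode("\t", $line);
        if (count($fields) !== 7) {
            continue; // not a cookie line
        }
        // Fields: domain, subdomain flag, path, secure, expiry, name, value
        if ($fields[4] === '0') {
            $names[] = $fields[5];
        }
    }
    return $names;
}

// Invented sample jar: one session cookie, one cookie with an expiry date.
$jar = "# Netscape HTTP Cookie File\n"
     . ".uscourts.gov\tTRUE\t/\tFALSE\t0\tPacerSession\tabc\n"
     . ".uscourts.gov\tTRUE\t/\tFALSE\t1999999999\tPacerPref\treceipt=Y\n";
$sessionCookies = listSessionCookies($jar);
```

Any name this returns is a cookie that a request with CURLOPT_COOKIESESSION set to true would ignore when loading the jar.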
P.S. - I use PACER too.
public function Login(){
    $cookie_file = __DIR__."/cookie.txt";
    $cookie_file = str_replace("\\", "/", $cookie_file);
    $this->_cookie_file = $cookie_file;
    $new_file = false;
    if(!is_file($cookie_file)){
        $h = fopen($cookie_file, "w");
        fclose($h);
        $file_time = time();
        $new_file = true;
    }else{
        $file_time = filemtime($cookie_file);
    }
    //login
    if($file_time < (time() - 1800) || $new_file){
        $url = "https://pacer.login.uscourts.gov/cgi-bin/check-pacer-passwd.pl";
        $post = array(
            "loginid"=>"loginID",
            "passwd"=>"password",
            "client"=> "client",
            "faction"=>"Login",
            "appurl"=>"https://pcl.uscourts.gov/search"
        );
        $res = $this->_cUrl->cPost($url, $post, $cookie_file);
        $this->Log("LOGGING IN AT ".date("Y-m-d H:i:s"));
        sleep(2);
        $this->Log("SLEEPING 2 ..",E_USER_DEPRECATED);
    }
}
From my curl library class:
public function cPost($url, $post, $cookie_file="cookie.txt"){
    if(is_array($post)){
        $post_string = $this->encodePost($post);
    }else{
        $post_string = $post;
    }
    $cookie = str_replace("\\", "/", $cookie_file);
    $fc = fopen($cookie, "r");
    fclose($fc);
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_VERBOSE, 1);
    curl_setopt($ch, CURLOPT_STDERR, $this->_error_handle);
    fwrite($this->_error_handle,"Starting debug file ".date('Y-m-d H:i:s')."\n");
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, FALSE);
    curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.7; rv:7.0.1) Gecko/20100101 Firefox/7.0.1");
    curl_setopt($ch, CURLOPT_TIMEOUT, 60);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_COOKIEJAR, $cookie);
    curl_setopt($ch, CURLOPT_COOKIEFILE, $cookie);
    curl_setopt($ch, CURLOPT_REFERER, $url);
    curl_setopt($ch, CURLINFO_HEADER_OUT, true); // enable tracking
    curl_setopt($ch, CURLOPT_POSTFIELDS, $post_string);
    curl_setopt($ch, CURLOPT_POST, 1);
    $result = curl_exec($ch);
    if ( curl_errno($ch) ) {
        $response = 'ERROR -> ' . curl_errno($ch) . ': ' . curl_error($ch);
        throw new CurlException($response);
    } else {
        $returnCode = (int)curl_getinfo($ch, CURLINFO_HTTP_CODE);
        switch($returnCode){
            case 404:
                $response = 'ERROR -> 404 Not Found';
                throw new CurlException($response, CurlException::ER_RETURN_CODE);
                break;
            default:
                break;
        }
    }
    curl_close($ch);
    return $result;
}
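`cPost` relies on an `encodePost` method that is not shown. One possible sketch follows, assuming the form expects list values such as `nos` as repeated keys (`nos=370&nos=371`). That is a guess about the server, not something the answer confirms. PHP's built-in `http_build_query` would instead produce indexed keys such as `nos[0]=370&nos[1]=371`, with the brackets percent-encoded.

```php
<?php
// Hypothetical stand-in for the encodePost method used by cPost.
// Assumption: list values are sent as repeated keys (nos=370&nos=371).
function encodePost(array $post): string
{
    $pairs = [];
    foreach ($post as $key => $value) {
        // Treat scalars as one-element lists so both cases share a loop.
        foreach ((array) $value as $item) {
            $pairs[] = urlencode((string) $key) . '=' . urlencode((string) $item);
        }
    }
    return implode('&', $pairs);
}

$encoded = encodePost([
    'case_no'    => '',
    'nos'        => ['370', '371'],
    'court_type' => 'cv',
]);
// $encoded is "case_no=&nos=370&nos=371&court_type=cv"
```

If the real form expects bracketed keys, `http_build_query($post)` can be used in place of this function.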
Accessing the search form:
$url = "https://pcl.uscourts.gov/dquery";
$post = array(
    "case_no"=>$case_no,
    "mdl_id"=>"",
    "stitle"=>"",
    "nos"=> array(
        "370",
        "371",
        "440",
        "470",
        "480",
        "890"
    ),
    "date_filed_start"=>$date_filed_start,
    "date_filed_end"=>$date_filed_end,
    "date_term_start"=>"",
    "date_term_end"=>"",
    "date_dismiss_start"=>"",
    "date_dismiss_end"=>"",
    "date_discharge_start"=>"",
    "date_discharge_end"=>"",
    "party"=>$party,
    "ssn4"=>"",
    "ssn"=>"",
    "court_type"=>"cv",
    "default_form"=>"cvb"
);
print_r($post);
$html = $this->_cUrl->cPost($url, $post, $this->_cookie_file);
I've been using this code in production for over a year - it's the keys to the kingdom, lol.