我使用的网站会在您访问时存储两个Cookie(ASP.NET_SessionID
和__RequestVerificationToken_XXXXXXXXX
)。
该页面包含一个div,其中包含指向pdf的链接以及一个带有“pdf viewer”源的iframe。
我正在尝试使用cURL检索这两个cookie然后下载pdf。我发现我必须在cURL中设置几个选项。但是,我仍然无法下载pdf。
我现在的设置是:
ASP.NET_SessionID
Cookie,(b)从iframe中找到“pdf viewer”网址,(c)找到pdf下载网址__RequestVerificationToken_XXXXXXXXX
Cookie 但是,我的文件结果只是一个登录页面。
首先是cURL:
$agent= 'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:36.0) Gecko/20100101 Firefox/36.0';
$report_url = "[my_main_url_here]";
$ch1 = curl_init($report_url);
curl_setopt($ch1, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch1, CURLOPT_SSL_VERIFYPEER, false);
curl_setopt($ch1, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch1, CURLOPT_HEADER, true);
curl_setopt($ch1, CURLOPT_SSLVERSION, 4);
curl_setopt($ch1, CURLOPT_USERAGENT, $agent);
curl_setopt($ch1, CURLOPT_SSL_CIPHER_LIST, 'AES128-SHA:RC2-CBC-MD5');
curl_setopt($ch1, CURLOPT_COOKIEJAR, "cookie.txt");
curl_setopt($ch1, CURLOPT_HEADER, 1);
curl_setopt($ch1, CURLOPT_VERBOSE, true);
curl_setopt($ch1, CURLOPT_NOBODY, false);
$output1 = curl_exec($ch1);
curl_close($ch1);
我使用preg_match
查找pdf下载链接:
preg_match("/\/ReportID=.{30}/", $output1, $pdf_link);
$pdf_viewer_full = "https://gate.aon.com" . $pdf_link[0];
然后我点击pdf查看器URL获取第二个cookie:
$ch2 = curl_init($viewer_url_full);
curl_setopt($ch2, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch2, CURLOPT_SSL_VERIFYPEER, false);
curl_setopt($ch2, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch2, CURLOPT_HEADER, true);
curl_setopt($ch2, CURLOPT_SSLVERSION, 4);
curl_setopt($ch2, CURLOPT_USERAGENT, $agent);
curl_setopt($ch2, CURLOPT_SSL_CIPHER_LIST, 'AES128-SHA:RC2-CBC-MD5');
curl_setopt($ch2, CURLOPT_HEADER, 1);
curl_setopt($ch2, CURLOPT_VERBOSE, true);
curl_setopt($ch2, CURLOPT_COOKIEJAR, "cookie.txt");
curl_setopt($ch2, CURLOPT_NOBODY, false);
$output2 = curl_exec($ch2);
curl_close($ch2);
然后我从这两个标题中删除了Cookie:
preg_match("/ASP.NET_SessionId=......................../", $output1, $cookie1);
preg_match("/__RequestVerificationToken_.{145}/", $output2, $cookie2);
$cookies = 'Cookie: ' . $cookie1[0] . '; ' . $cookie2[0];
然后尝试下载文件:
$headers = array ($cookies);
$file = fopen ('Report.pdf', 'w+');
$ch3 = curl_init($pdf_link_full);
curl_setopt($ch3, CURLOPT_SSL_CIPHER_LIST, 'AES128-SHA:RC2-CBC-MD5');
curl_setopt($ch3, CURLOPT_HTTPHEADER, $headers);
curl_setopt($ch3, CURLOPT_FILE, $file);
curl_setopt($ch3, CURLOPT_SSL_VERIFYPEER, false);
curl_setopt($ch3, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch3, CURLOPT_SSLVERSION, 4);
curl_setopt($ch3, CURLOPT_USERAGENT, $agent);
curl_setopt($ch3, CURLOPT_COOKIEFILE, "cookie.txt");
$output3 = curl_exec($ch3);
curl_close($ch3);
编辑:如果我手动设置$pdf_link_full
,则可以。但是,如果我发现preg_match
(如上所述),则会失败。
但是,如果我打印$pdf_link_full
和$pdf_link_full_2
,它们就会显得相同。我在这里缺少编码或其他东西吗?谢谢!
答案 0 :(得分:0)
问题出在我的preg_match
上。它返回了一个&
的网址,当我手动设置它时,我只使用&符号(&
)。
用&
替换&
解决了问题。