Question

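I have a script that logs in to another site and then downloads PDF files. So far it has worked well on every site, but the new site I'm scraping behaves strangely: some of the downloaded files are 1 KB (i.e. the download didn't work), while others are fine. Opening the same download link in a browser brings up the "Do you want to save this file" dialog, and the file saved that way is correct.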

Here is my code (I've included the general curl options used throughout the scrape, plus the last part where I try to download the files):

//Initial connection to login page
$header[] = 'Host: www.domain.com';
$header[] = 'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8';
$header[] = 'Accept-Language: en-US,en;q=0.5';
$header[] = 'Connection: keep-alive';
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, 'https://www.domain.com/login');
curl_setopt($ch, CURLOPT_REFERER, 'https://www.domain.com');
curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:24.0) Gecko/20100101 Firefox/24.0');
curl_setopt($ch, CURLOPT_HTTPHEADER, $header);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLOPT_MAXREDIRS, 10);
curl_setopt($ch, CURLOPT_COOKIEFILE, $cookieLocation);
curl_setopt($ch, CURLOPT_COOKIEJAR, $cookieLocation);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, false);
$webpage = curl_exec($ch);

//Then several operations to login, grab the list of links to PDF download files (...)

//Loop through the array containing the url of the file to download and save it to a folder (writable)
curl_setopt($ch, CURLOPT_POST, false);
foreach($foundBills as $key => $bill)
{
    curl_setopt($ch, CURLOPT_URL, $bill['url']);
    curl_setopt($ch, CURLOPT_BINARYTRANSFER, 1);
    $pdfFile = curl_exec($ch);
    $randomFileName = rand_string(20); //generates a 20 char long random string
    $newPDF = $userBillsRoot.$randomFileName.'.pdf';
    write_file($newPDF, $pdfFile, 'wb'); //using a Codeigniter function to save the file
}

Each file is under 1 MB. Any ideas? How can I see more detail about why a download fails (a timeout, for example)? Thanks!
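One way to get at the "why" is to inspect what curl reports after each `curl_exec()` call instead of writing the body straight to disk. The sketch below is only an illustration under assumptions: `diagnose_pdf_download()` is a hypothetical helper, not part of the original script, and it assumes PHP 7.1 or later. It checks for a transport error (such as a timeout), then the HTTP status, then whether the body starts with the `%PDF-` signature. Because `CURLOPT_FOLLOWLOCATION` is on, a 1 KB "PDF" may well be an HTML login or error page returned with status 200, which the last check would catch.

```php
<?php
// Hypothetical helper (not in the original script): returns null when the
// response looks like a real PDF, otherwise a short reason string.
// $body  - return value of curl_exec() with CURLOPT_RETURNTRANSFER enabled
// $info  - return value of curl_getinfo($ch)
// $errno - return value of curl_errno($ch)
// $error - return value of curl_error($ch)
function diagnose_pdf_download($body, array $info, int $errno = 0, string $error = ''): ?string
{
    if ($body === false || $errno !== 0) {
        // Transport-level failure: timeout, DNS, TLS, too many redirects, ...
        return "curl error $errno: $error";
    }

    $status = $info['http_code'] ?? 0;
    if ($status !== 200) {
        return "unexpected HTTP status $status for " . ($info['url'] ?? '(unknown URL)');
    }

    if (strncmp($body, '%PDF-', 5) !== 0) {
        // Status 200 but not a PDF: often an HTML login or error page.
        $type = !empty($info['content_type']) ? $info['content_type'] : 'unknown';
        return "body is not a PDF (Content-Type: $type, " . strlen($body) . " bytes)";
    }

    return null;
}

// Possible use inside the foreach loop, right after curl_exec($ch):
//
//   $reason = diagnose_pdf_download($pdfFile, curl_getinfo($ch), curl_errno($ch), curl_error($ch));
//   if ($reason !== null) {
//       error_log($bill['url'] . ': ' . $reason);
//       continue; // skip writing a broken file
//   }
```

Opening one of the 1 KB files in a text editor is a quick manual version of the same check, since it shows whatever the server actually sent back.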

Problem downloading some files with PHP curl

0 answers: