Question

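I am setting up a library page that will display the latest books, movies, and other items the library has added to its collection.

A friend and I (we are both new to PHP) have been trying to do this with cURL. We have code that grabs the parts we want and formats them into the results page.

The problem is that the URL we feed to cURL is generated automatically in some way, so it expires every few hours and breaks the page.

Here is the PHP we are using: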

<?php    
//function storeLink($url,$gathered_from) {
//   $query = "INSERT INTO links (url, gathered_from) VALUES ('$url', '$gathered_from')";
//    mysql_query($query) or die('Error, insert query failed');
//}



// make the cURL request to $target_url
$ch = curl_init();

curl_setopt($ch, CURLOPT_URL, "http://catalog.yourppl.org/limitedsearch.asp"); 
curl_setopt($ch, CURLOPT_FAILONERROR, true);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLOPT_AUTOREFERER, true);
curl_setopt($ch, CURLOPT_RETURNTRANSFER,true);
curl_setopt($ch, CURLOPT_TIMEOUT, 10);
$refreshlink= curl_exec($ch);


$endlink = strpos($refreshlink,'Hot New Items')-2;//end
$startlink = $endlink -249;
$startlink = strpos($refreshlink,'http',$startlink);//start
$endlink = $endlink - $startlink;
$linkurl =  substr("$refreshlink",$startlink, $endlink);
//echo $linkurl;

//this is the link that expires
$linkurl = "http://www.catalog.portsmouth.lib.oh.us/TLCScripts/interpac.dll?NewestSearch&Config=pac&FormId=0&LimitsId=-168&StartIndex=0&SearchField=119&Searchtype=1&SearchAvailableOnly=0&Branch=,0,&PeriodLimit=30&ItemsPerPage=10&SearchData=&autohide=true";


$userAgent = "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.1) Gecko/20061204 Firefox/2.0.0.1"; // variable names are case-sensitive; this must match $userAgent below

curl_setopt($ch, CURLOPT_USERAGENT, $userAgent);
curl_setopt($ch, CURLOPT_URL, $linkurl); 
curl_setopt($ch, CURLOPT_FAILONERROR, true);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLOPT_AUTOREFERER, true);
curl_setopt($ch, CURLOPT_RETURNTRANSFER,true);
curl_setopt($ch, CURLOPT_TIMEOUT, 50);
$html= curl_exec($ch);
if (!$html) {
echo "<br />cURL error number:" .curl_errno($ch);
echo "<br />cURL error:" . curl_error($ch);
exit;
}

$content = $html;

$PHolder  = 0;
$x = 0;
$y = 0;
$max = strlen($content);
$isbn =  array(300=>0);
$stitle =  array(300=>0);
$sbookcover =  array(300=>0);


while ($x < 200 )
{
$x++;


$start = strpos($content,'isbn',$PHolder+5);//beginning

$start2 = strpos($content,'Branch=,0,"',$start+5);//beginning

$start2 = $start2 -400;

if ($start2 < 0)break;
$start2 = strpos($content,'<a href',$start2);
if ($start2 === false) break; // strpos() returns false when the text is not found



$start2 = $start2 - 12;

$end2 = strpos($content,'</a>',$start);


$end = strpos($content,'"',$start);
$offset = 13;
$offset2 = $end2 - $start2;

if (substr("$content", $start+5, $offset) != $isbn)
{

if(array_search(substr("$content", $start+5, $offset), $isbn) == 0 )
{
    $y++;
    $isbn[$y] =  substr("$content", $start+5, $offset);

    $sbookcover[$y]="
        <img border=\"0\" width = \"170\" alt=\"Book Jacket\" src=\"http://ls2content.tlcdelivers.com/content.html?customerid=7977&amp;requesttype=bookjacket-lg&amp;isbn=$isbn[$y]&amp;isbn=$isbn[$y]\">
        ";


    $stitle[$y]=   substr("$content", $start2+12, $offset2);

    $bookcover = $sbookcover[$y];

    $title = $stitle[$y]."</a>";
    $stitle[$y] = str_replace("<a href=\"","<a href=\"http://catalog.yourppl.org",$title);

    $stitle[$y] = str_replace("\">","\" rel=\"shadowbox\">",$stitle[$y]);

    $booklinkend = strpos($stitle[$y],"\">");
    $booklink = substr($stitle[$y], 0, $booklinkend+2);


   $sbookcover[$y] = "$booklink".$sbookcover[$y]."</a>";

}

}


$PHolder = $start;


}  



echo"

<table class=\"twocolorformat\" width=\"95%\">



";

$xx = 1;
$xy = 0; // row counter for the loop below (it was never initialized)
while ($xy <= 6)
{
$xy++;

echo "

<tr>
<td width=\"33%\" align=\"center\"><div class=\"bookcover\">$sbookcover[$xx]</div></td>
";
$xx++;
echo"
<td width=\"33%\" align=\"center\"><div class=\"bookcover\">$sbookcover[$xx]</div></td>
";
$xx++;
echo"
<td width=\"33%\" align=\"center\"><div class=\"bookcover\">$sbookcover[$xx]</div></td>
";
$xx = $xx -2;

echo"
</tr>
<tr>
<td width=\"33%\">$stitle[$xx]</td>
";
$xx++;
echo"
<td width=\"33%\">$stitle[$xx]</td>
";
$xx++;
echo"
<td width=\"33%\">$stitle[$xx]</td>
";
$xx = $xx -2;
echo"
</tr>

";//this is the table row and table data definition. covers and titles are fed to table here.



$xx = $xx +3;
if ($sbookcover[$xx] == "")break;
}


echo"

</table>

";//close your table here


?>

The page that contains the link is at:

http://www.catalog.portsmouth.lib.oh.us/limitedsearch.asp

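We want to pull the books and cover images from "Hot New Items" on that page, and move on to the rest once we have this working.

If you click the Hot New Items link, the initial URL is: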

http://www.catalog.portsmouth.lib.oh.us/TLCScripts/interpac.dll?Limits&LimitsId=0&FormId=0&StartIndex=0&Config=pac&ReturnForm=22&Branch=,0,&periodlimit=30&LimitCollection=1&Collection=Adult%20New%20Book&autosubmit=true

But once the page has loaded, it changes to:

http://www.catalog.portsmouth.lib.oh.us/TLCScripts/interpac.dll?NewestSearch&Config=pac&FormId=0&LimitsId=-178&StartIndex=0&SearchField=119&Searchtype=1&SearchAvailableOnly=0&Branch=,0,&PeriodLimit=30&ItemsPerPage=10&SearchData=&autohide=true

What can we do to get around the expiring link? I can provide more code and explanation if needed.

Many thanks to anyone who can help, Terry

Answer 1

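What can we do to get around the expiring link?

You are interfacing with a system that was not designed to be (ab)used the way you are using it. Like many search systems, it looks like they build the results and store them somewhere. Also like many search systems, those results expire after a while.

You will have to design your code on the assumption that search results vanish into the ether very quickly.

It looks like there is a parameter in the URL that sets how many results appear per page. Try changing it to a higher number, a much higher number. They do not seem to bounds-check it at the code level. I was able to enter 1000 without complaint, though it only returned 341 links.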
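
As a minimal sketch of that change, the snippet below rewrites the ItemsPerPage value in the results URL from the question. The helper name setNumericParam is made up for illustration, and a regular expression is used instead of parse_str()/http_build_query() on the assumption that the rest of the query string (the bare NewestSearch token, the commas in Branch) should stay byte-for-byte as it is.

```php
<?php
// Hypothetical helper (not from the original post): replace the value of one
// numeric query parameter and leave the rest of the URL untouched.
function setNumericParam(string $url, string $name, int $value): string
{
    // Match "?name=123" or "&name=-123"; only the first occurrence is replaced.
    $pattern = '/([?&]' . preg_quote($name, '/') . '=)-?\d+/i';
    return preg_replace($pattern, '${1}' . $value, $url, 1);
}

// Results URL copied from the question.
$linkurl = "http://www.catalog.portsmouth.lib.oh.us/TLCScripts/interpac.dll?NewestSearch&Config=pac&FormId=0&LimitsId=-168&StartIndex=0&SearchField=119&Searchtype=1&SearchAvailableOnly=0&Branch=,0,&PeriodLimit=30&ItemsPerPage=10&SearchData=&autohide=true";

// 1000 is the value tried above; whether the server keeps accepting it is not guaranteed.
$bigPageUrl = setNumericParam($linkurl, 'ItemsPerPage', 1000);
```

$bigPageUrl can then be passed to CURLOPT_URL in place of $linkurl.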

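Keep in mind that this will most likely put a very noticeable load on their machine, so be careful about how often you make the request. You do not want to draw attention to yourself by making it look like you are attacking their service.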

Answer 2

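The original link generates the results and then sends you a page that uses JavaScript to insert a value into a URL and redirect you to that URL, which serves the stored results page. The server identifies the results page by its LimitsId (you can see it in the URL of the results page). They must be using this number to control how long the page lives, and each request must generate a new LimitsId, because not every ID works for this results page.

All that said, you can use cURL to fetch the first page (the original link, which generates the results and stores them on the server), search the response for the text 'LimitsId=-' (for some reason they all have a dash in front, though I am not sure they are meant to be negative, since the numbers go up), and paste the number that follows it into the same spot in the URL your script uses. That will take you to the freshly generated results.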
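
The steps above could be sketched roughly as follows. The function names (fetchPage, extractLimitsId, withLimitsId, freshResultsUrl) are invented for this example, and the pattern assumes the ID appears in the response as LimitsId= followed by a dash and digits, as in the URLs from the question; the real response page may need a different pattern.

```php
<?php
// Fetch a page with cURL and return its body, or throw on failure.
function fetchPage(string $url): string
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($ch, CURLOPT_FAILONERROR, true);
    curl_setopt($ch, CURLOPT_TIMEOUT, 50);
    $body = curl_exec($ch);
    if ($body === false) {
        throw new RuntimeException('cURL error: ' . curl_error($ch));
    }
    return $body;
}

// Return the first "-123"-style ID that follows "LimitsId=", or null if none.
// Assumption: valid IDs always carry the leading dash, as observed above.
function extractLimitsId(string $html): ?string
{
    return preg_match('/LimitsId=(-\d+)/i', $html, $m) ? $m[1] : null;
}

// Swap the LimitsId value inside an existing results URL.
function withLimitsId(string $resultsUrl, string $limitsId): string
{
    return preg_replace('/([?&]LimitsId=)-?\d+/i', '${1}' . $limitsId, $resultsUrl, 1);
}

// Request the initial link, read the new ID, and build a working results URL.
function freshResultsUrl(string $initialUrl, string $resultsTemplate): string
{
    $id = extractLimitsId(fetchPage($initialUrl));
    if ($id === null) {
        throw new RuntimeException('No LimitsId found in the response');
    }
    return withLimitsId($resultsTemplate, $id);
}
```

In the script from the question, the hard-coded $linkurl would serve as $resultsTemplate and the "initial URL" of the Hot New Items link as $initialUrl.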

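However, as Charles pointed out, these requests put a lot of load on the server, so perhaps you could generate a new request only when the old one has expired.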
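
One way to do that, shown here only as a sketch, is to keep the last result in a local file and regenerate it once it is older than a chosen lifetime. The helper name cachedOrRefresh, the cache file path, and the lifetime are all assumptions; the question only says the link expires "every few hours", so the right value would have to be found by observation.

```php
<?php
// Return the cached text if the cache file is younger than $ttlSeconds;
// otherwise call $refresh, store what it returns, and return that.
function cachedOrRefresh(string $cacheFile, int $ttlSeconds, callable $refresh): string
{
    clearstatcache(true, $cacheFile); // make sure filemtime() is not stale
    if (is_file($cacheFile) && (time() - filemtime($cacheFile)) < $ttlSeconds) {
        return file_get_contents($cacheFile);
    }
    $fresh = $refresh();
    file_put_contents($cacheFile, $fresh, LOCK_EX);
    return $fresh;
}
```

Here $refresh can be any callable that returns a string, for example one that requests the initial link and returns the new results URL or the fetched HTML. With a lifetime of 3600 seconds the remote server would be contacted at most about once an hour, however often the library page itself is viewed.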

Trouble with cURL and expiring links
