Question

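I have been trying to write a simple PHP script that pulls data from an ISBN database site. For some reason I have had nothing but trouble with file_get_contents. I have managed to work around it for now, but I would like to know whether anyone can tell me why this does not work.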

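The code below does not populate $page with any content, so the preg_match calls that follow find nothing. If anyone knows what is preventing this, that would be great.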

$links = array ('
    http://www.isbndb.com/book/2009_cfa_exam_level_2_schweser_practice_exams_volume_2','
    http://www.isbndb.com/book/uniform_investment_adviser_law_exam_series_65','
    http://www.isbndb.com/book/waterworks_a02','
    http://www.isbndb.com/book/winning_the_toughest_customer_the_essential_guide_to_selling','
    http://www.isbndb.com/book/yale_daily_news_guide_to_fellowships_and_grants'

    ); // array of URLs

foreach ($links as $link)
{
    $page = file_get_contents($link);
    #print $page;

    preg_match("@<h1 itemprop='name'>(.*?)</h1>@is",$page,$title);
    preg_match("@<a itemprop='publisher' href='http://isbndb.com/publisher/(.*?)'>(.*?)</a>@is",$page,$publisher);
    preg_match("@<span>ISBN10: <span itemprop='isbn'>(.*?)</span>@is",$page,$isbn10);
    preg_match("@<span>ISBN13: <span itemprop='isbn'>(.*?)</span>@is",$page,$isbn13);
    echo '<tr>
        <td>'.$title[1].'</td>
        <td>'.$publisher[2].'</td>
        <td>'.$isbn10[1].'</td>
        <td>'.$isbn13[1].'</td>
        </tr>';
    #exit();
}

Answer 1

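My guess is that you have the wrong (non-direct) URLs. The correct ones should omit the www. part: if you request any of them and inspect the returned headers, you will see that you are redirected (HTTP 301) to another URL.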
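One way to inspect the returned headers is PHP's get_headers(). The following is only a rough sketch: the helper names statusCodeFromLine and firstStatusCode are made up for illustration, and the second one performs a real request, so it needs network access.

```php
<?php
// Extract the numeric status code from an HTTP status line such as
// "HTTP/1.1 301 Moved Permanently". Returns null if the line is not a status line.
function statusCodeFromLine(string $line): ?int
{
    if (preg_match('@^HTTP/\d+(?:\.\d+)?\s+(\d{3})@', $line, $m)) {
        return (int) $m[1];
    }
    return null;
}

// Hypothetical helper: request the URL and report the status code of the
// first response. get_headers() returns false on failure.
function firstStatusCode(string $url): ?int
{
    $headers = @get_headers($url);
    if ($headers === false || !isset($headers[0])) {
        return null;
    }
    return statusCodeFromLine($headers[0]);
}
```

If firstStatusCode() reports 301 or 302 for one of the links, that link is being redirected; a null result suggests the request itself failed.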

In my opinion, the best approach is to use cURL and set the CURLOPT_FOLLOWLOCATION and CURLOPT_MAXREDIRS options with curl_setopt.

Of course, you should also trim your URLs beforehand, just to be sure that is not the problem.
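Trimming may matter here, because each string in the question's array opens its quote before a line break, so every URL begins with a newline and some spaces. A minimal sketch of the cleanup, where cleanUrls is a hypothetical helper name:

```php
<?php
// Hypothetical helper: strip surrounding whitespace from each URL
// and drop entries that end up empty.
function cleanUrls(array $urls): array
{
    $trimmed = array_map('trim', $urls);
    return array_values(array_filter($trimmed, static function (string $u): bool {
        return $u !== '';
    }));
}

// Two entries written the same way as in the question: the opening quote
// sits on the previous line, so each string starts with "\n" plus spaces.
$links = array('
    http://www.isbndb.com/book/waterworks_a02','
    http://www.isbndb.com/book/uniform_investment_adviser_law_exam_series_65'
);

$clean = cleanUrls($links);
```

Passing $clean instead of $links to the loop means each request receives a URL with no leading whitespace.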

An example using cURL:

$curl = curl_init();
foreach ($links as $link) {

   curl_setopt($curl, CURLOPT_URL, $link);
   curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);
   curl_setopt($curl, CURLOPT_FOLLOWLOCATION, TRUE);
   curl_setopt($curl, CURLOPT_MAXREDIRS, 5); // max 5 redirects

   $result = curl_exec($curl);
   if (! $result) {
      continue; // if $result is empty or false - ignore and continue;
   }

   // do what you need to do here
}
curl_close($curl);
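For the "do what you need to do here" step, the patterns from the question could be applied to $result. The sketch below guards against a failed match instead of indexing the match array directly; firstMatch is a hypothetical helper, and the sample markup is made up for illustration, so the real page structure may differ.

```php
<?php
// Hypothetical helper: return the requested capture group,
// or null when the pattern does not match.
function firstMatch(string $pattern, string $html, int $group = 1): ?string
{
    if (preg_match($pattern, $html, $m) === 1 && isset($m[$group])) {
        return trim($m[$group]);
    }
    return null;
}

// Sample markup invented for this example; it stands in for a fetched page.
$result = "<h1 itemprop='name'>Waterworks</h1>";

$title = firstMatch("@<h1 itemprop='name'>(.*?)</h1>@is", $result);

// Fall back to a placeholder when nothing was found, and escape the output.
$row = '<tr><td>' . htmlspecialchars($title ?? 'n/a') . '</td></tr>';
```

With this guard, an empty or unexpected page produces a placeholder cell rather than an undefined-index notice.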

PHP file_get_contents error, won't populate from an array?
