Question

Using PHP, I am trying to fetch a website page and then automatically extract the images from it.

I have tried the following:

<?php
$url = "http://www.domain.co.uk/news/local-news";

$str = file_get_contents($url);
?>

and

<?php
    $opts = array('http'=>array('header' => "User-Agent:Mozilla/5.0 (Windows NT 6.2) AppleWebKit/537.1 (KHTML, like Gecko) Chrome/21.0.1180.75 Safari/537.1\r\n"));
    $context = stream_context_create($opts);
    $header = file_get_contents('http://www.domain.co.uk/news/local-news',false,$context);
?>

and also

<?php
include('simple_html_dom.php');

$html = file_get_html('http://www.domain.co.uk/news/local-news');

$result = $html->find('section article img', 0)->outertext;
?>

But all of these come back with an Internal Server Error. I can view the site perfectly well in a browser, but when I try to fetch the page from PHP, it fails.

Is there anything else I can try?

Answer 1

Try the following code. It saves the page content to a local file.

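<?php
// Fetch the page with cURL and stream the response body into a local file.
$ch = curl_init("http://www.domain.co.uk/news/local-news");
$fp = fopen("localfile.html", "w");
curl_setopt($ch, CURLOPT_FILE, $fp);  // write the body to $fp instead of printing it
curl_setopt($ch, CURLOPT_HEADER, 0);  // leave the response headers out of the file
curl_exec($ch);
curl_close($ch);
fclose($fp);
?>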

Now you can read localfile.html.
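Since the goal in the question is to pull images out of the page, here is a minimal sketch of how the saved file could be read, using PHP's bundled DOM extension instead of simple_html_dom. The function name extractImageSources is hypothetical, not something from the question or this answer, and the sketch assumes the page's images are ordinary <img> elements with a src attribute.

```php
<?php
// Hypothetical helper (not from the original answer): collect the src
// attribute of every <img> element found in an HTML string.
function extractImageSources(string $html): array
{
    // DOMDocument::loadHTML() rejects an empty string, so bail out early.
    if (trim($html) === '') {
        return [];
    }

    // Real-world pages are rarely valid HTML; collect parser warnings
    // internally instead of printing them.
    $previous = libxml_use_internal_errors(true);

    $dom = new DOMDocument();
    $dom->loadHTML($html);

    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $sources = [];
    foreach ($dom->getElementsByTagName('img') as $img) {
        $src = $img->getAttribute('src');
        if ($src !== '') {
            $sources[] = $src;
        }
    }

    return $sources;
}

// Only runs if the cURL snippet above has already created the file.
if (is_readable('localfile.html')) {
    foreach (extractImageSources(file_get_contents('localfile.html')) as $src) {
        echo $src, PHP_EOL;
    }
}
```

The returned values are whatever the page put in src, so relative paths would still need to be resolved against the page URL before the images can be downloaded.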

Answer 2

Sometimes you can get an error when opening an http URL with file_get_contents, even if you have set allow_url_fopen = On in php.ini.

For me, the solution was to also set "user_agent" to something.
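As a rough sketch of what that could look like: user_agent can be set globally with ini_set, or per request through the http stream context options. The helper names buildHttpOptions and fetchPage and the User-Agent string are placeholders I made up for illustration, and the timeout and ignore_errors values are just assumptions about what is convenient while debugging.

```php
<?php
// Hypothetical helper: HTTP stream context options that carry a User-Agent.
function buildHttpOptions(string $userAgent): array
{
    return [
        'http' => [
            'user_agent'    => $userAgent, // sent as the User-Agent request header
            'timeout'       => 10,         // seconds; assumed value
            'ignore_errors' => true,       // still return the body on 4xx/5xx responses
        ],
    ];
}

// Hypothetical helper: fetch a URL with the given User-Agent.
// Returns the response body as a string, or false if the request failed.
function fetchPage(string $url, string $userAgent)
{
    $context = stream_context_create(buildHttpOptions($userAgent));

    // @ suppresses the warning; the caller checks for false instead.
    return @file_get_contents($url, false, $context);
}

// Global alternative: applies to later HTTP requests made through PHP's
// stream wrappers (for example plain file_get_contents($url)) when no
// User-Agent is supplied through a context.
ini_set('user_agent', 'Mozilla/5.0 (compatible; ExampleFetcher/1.0)');
```

With this in place, something like fetchPage('http://www.domain.co.uk/news/local-news', 'Mozilla/5.0 (compatible; ExampleFetcher/1.0)') should return the page body, or false if the request could not be made. Because ignore_errors is on, a 500 response still returns whatever body the server sent, which can help show why the server is rejecting the request.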
