Question

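I'm using PHP Simple HTML DOM Parser (http://simplehtmldom.sourceforge.net). I recently ran into a situation where the external web page I regularly scrape stopped responding (their server was down). As a result, my own site would not load; instead, it showed an error after a long wait.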

What is the best way to add a failsafe to this parser for when a fetch attempt fails?

I tried the following, without success.

include('./inc/simple_html_dom.php');  

$html = file_get_html('http://client0.example.com/dcnum.php?count=1');
$str = $html->find('body',0);
$num = $str->innertext;

if(!$html)
{
 error('No response.');
}

$html->clear(); 
unset($html);

Edit: I haven't had time to try this yet, but maybe I could put my 'if' statement directly after the first line (before the $html->find('body',0) part).

Answer 1

If I understand correctly, you want to keep your site from going down when theirs is offline...

If you are using PHP's curl bindings, you can check the HTTP status code with curl_getinfo:

$handle = curl_init($url);
curl_setopt($handle,  CURLOPT_RETURNTRANSFER, TRUE);

/* Get the HTML or whatever is linked in $url. */
$response = curl_exec($handle);

/* Check for 404 (file not found). */
$httpCode = curl_getinfo($handle, CURLINFO_HTTP_CODE);
if($httpCode == 404) {
    /* Handle 404 here. */
}

curl_close($handle);

/* Handle $response here. */

You can also check for other error codes such as 500, 503, and so on.
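A server that is down usually returns no HTTP status at all, so it may help to set timeouts and to check whether curl_exec itself failed. A minimal sketch, assuming a hypothetical helper name check_url and illustrative timeout values (neither comes from the answer above):

```php
<?php
// Hypothetical helper: report whether $url answered with a 2xx status in time.
// The default timeout values are illustrative assumptions.
function check_url($url, $connectTimeout = 3, $totalTimeout = 5)
{
    $handle = curl_init($url);
    curl_setopt($handle, CURLOPT_RETURNTRANSFER, true);
    // Give up quickly instead of hanging while the remote server is down.
    curl_setopt($handle, CURLOPT_CONNECTTIMEOUT, $connectTimeout);
    curl_setopt($handle, CURLOPT_TIMEOUT, $totalTimeout);

    $response = curl_exec($handle);

    if ($response === false) {
        // No HTTP response at all: DNS failure, refused connection, timeout, ...
        return array('ok' => false, 'code' => 0, 'error' => curl_error($handle));
    }

    $httpCode = curl_getinfo($handle, CURLINFO_HTTP_CODE);
    $ok = ($httpCode >= 200 && $httpCode < 300);

    return array(
        'ok'    => $ok,
        'code'  => $httpCode,
        'error' => $ok ? '' : 'HTTP ' . $httpCode,
    );
}
```

With this, a caller can branch on $result['ok'] and show a fallback message instead of waiting on a dead server.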

Answer 2

I spent a few hours figuring this out; surprisingly, there are very few pointers on how to handle errors with simple_html_dom.

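Basically, all you have to do is stop using file_get_html, ->load_file, or whichever simple_html_dom-specific method you use to load content; fetch it with curl instead and pass the result to str_get_html.

I used the code from the other answer; here is how to use it: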
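A minimal sketch of that approach, assuming simple_html_dom.php has already been included as in the question. The helper names fetch_dom and read_body_text, the timeout values, and the fallback string are hypothetical choices for illustration:

```php
<?php
// Assumes simple_html_dom.php has already been included, as in the question.
// fetch_dom() is a hypothetical helper name; the timeouts are illustrative.
function fetch_dom($url, $connectTimeout = 3, $totalTimeout = 5)
{
    $handle = curl_init($url);
    curl_setopt($handle, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($handle, CURLOPT_CONNECTTIMEOUT, $connectTimeout);
    curl_setopt($handle, CURLOPT_TIMEOUT, $totalTimeout);

    $response = curl_exec($handle);
    $httpCode = curl_getinfo($handle, CURLINFO_HTTP_CODE);

    // Treat transport errors, empty bodies and non-2xx statuses as "no response".
    if ($response === false || $response === '' || $httpCode < 200 || $httpCode >= 300) {
        return null;
    }

    // Parse the already-downloaded string; str_get_html() returns false on failure.
    $html = str_get_html($response);
    return $html ? $html : null;
}

// Hypothetical wrapper: return the <body> inner text, or $fallback if the
// remote page could not be fetched or parsed.
function read_body_text($url, $fallback = 'No response.')
{
    $html = fetch_dom($url);
    if ($html === null) {
        return $fallback;
    }

    $body = $html->find('body', 0);
    $text = $body ? $body->innertext : $fallback;

    $html->clear(); // free the parser's memory, as in the question
    return $text;
}
```

Calling read_body_text('http://client0.example.com/dcnum.php?count=1') then yields either the page's body text or the fallback string, so the calling page keeps loading when the remote server is down.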

It also seems to be more stable on large websites.

Give it a try and let me know whether this is the behavior you were looking for.

Failsafe for PHP Simple HTML DOM Parser
