Question

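I have no idea what the solution could be. I just can't get the HTML of this Charizard page: there is no response at all, even though the link is correct. Bulbasaur works fine, but I want this cute Charizard...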

include("simple_html_dom.php");
$html = file_get_html('https://bulbapedia.bulbagarden.net/wiki/Charizard_(Pok%C3%A9mon)');
$html2 = file_get_html('https://bulbapedia.bulbagarden.net/wiki/Bulbasaur_(Pok%C3%A9mon)');
echo $html;
echo $html2;

Does this page have some kind of protection, or is Charizard just hard to catch? I'd really appreciate any help.

Jonas :)

Answer 1

I'd suggest using an alternative library, because I don't think you will get this information with simple_html_dom:

include 'advanced_html_dom.php';
$html = file_get_html('https://bulbapedia.bulbagarden.net/wiki/Charizard_(Pok%C3%A9mon)');

echo $html->find('h1', 0)->text() . PHP_EOL;
echo $html->find('big a[title*="Pokédex number"]', 0)->text() . PHP_EOL;

This outputs:

Charizard (Pokémon)
#006

Answer 2

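There are two problems here:

1. The content fetched from that URL is longer than MAX_FILE_SIZE (defined in simple_html_dom.php).
2. The bug pointed out in the comments (https://github.com/sunra/php-simple-html-dom-parser/issues/37). It appears to be fixed in forks maintained on GitHub, but it still exists in the original version (which no longer seems to be maintained).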

To fix the first problem, edit simple_html_dom.php and change define('MAX_FILE_SIZE', 600000); to use a larger number.
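To confirm that size is really the culprit before editing the library, something like the following might help. It is only a sketch: size_check() is a hypothetical helper (not part of simple_html_dom), and its default limit of 600000 is simply the MAX_FILE_SIZE value quoted above.

```php
<?php
// Hypothetical diagnostic helper, not part of simple_html_dom.
// Reports whether a fetched document is empty or exceeds a size limit.
// The default limit is the MAX_FILE_SIZE value quoted in this answer.
function size_check(string $contents, int $limit = 600000): string
{
    $length = strlen($contents); // length in bytes, not characters

    if ($length === 0) {
        return 'empty';
    }

    return $length > $limit
        ? "too large ($length > $limit bytes)"
        : "ok ($length bytes)";
}

// Usage (needs network access), with the URL from the question:
// $raw = file_get_contents('https://bulbapedia.bulbagarden.net/wiki/Charizard_(Pok%C3%A9mon)');
// echo size_check($raw === false ? '' : $raw), PHP_EOL;
```

If this reports "too large" for Charizard and "ok" for Bulbasaur, that would explain why only one of the two pages loads.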

As a workaround for the second problem, pass the correct arguments to file_get_html; that is, pass 0 for $offset:

$html = file_get_html('https://bulbapedia.bulbagarden.net/wiki/Charizard_(Pok%C3%A9mon)',
false,
null,
0); // this last one is the offset

var_dump($html);

Alternatively, you can use a forked version of the library.

Answer 3

Since I couldn't find file_get_html() in the PHP documentation, maybe you would prefer to use file_get_contents(url).
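If you go the file_get_contents() route, you still need something to parse the result. A minimal sketch, assuming PHP's built-in DOM extension is an acceptable substitute for simple_html_dom here; first_heading() is a hypothetical helper name:

```php
<?php
// Hypothetical helper: returns the text of the first <h1> in an HTML string,
// using PHP's built-in DOM extension instead of simple_html_dom.
function first_heading(string $html): ?string
{
    if ($html === '') {
        return null; // DOMDocument::loadHTML() rejects an empty string
    }

    $doc = new DOMDocument();

    // Real-world pages are rarely valid HTML; collect parser warnings
    // internally instead of printing them, then restore the old setting.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $nodes = $doc->getElementsByTagName('h1');

    return $nodes->length > 0 ? trim($nodes->item(0)->textContent) : null;
}

// Usage (needs network access), with the URL from the question:
// $raw = file_get_contents('https://bulbapedia.bulbagarden.net/wiki/Charizard_(Pok%C3%A9mon)');
// if ($raw !== false) {
//     echo first_heading($raw), PHP_EOL;
// }
```

Note that loadHTML() may misdecode non-ASCII characters if the page does not declare its charset, and that this approach has no MAX_FILE_SIZE limit of its own.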

Simple HTML DOM can't get the file
