Question

I receive an HTML string using curl:

curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$html_string = curl_exec($ch);

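When I echo it, I see well-formed HTML that suits my parsing needs. However, when I pass this string to the HTML DOM PARSER method str_get_html($html_string), it does not load it (the call returns false).

I tried saving it to a file and calling file_get_html on the file, but the same thing happened.

What could be the reason? As I said, the HTML looks perfectly fine when I echo it.

Thank you very much.

The code itself: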

$html = file_get_html("http://www.bgu.co.il/tremp.aspx");
$v = $html->find('input[id=__VIEWSTATE]');
$viewState = $v[0]->attr['value'];
$e = $html->find('input[id=__EVENTVALIDATION]');
$event = $e[0]->attr['value'];

$html->clear(); 
unset($html);

$body = " A_STRING_THAT_CONTAINS_SOME_DATA ";

$ch = curl_init("http://www.bgu.co.il/tremp.aspx");
curl_setopt($ch, CURLOPT_POSTFIELDS, $body);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);

$html_string = curl_exec($ch);

$file_handle = fopen("file.txt", "w");
fwrite($file_handle, $html_string);
fclose($file_handle);

curl_close($ch);

$html = str_get_html($html_string);

Answer 1

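The page behind your curl link seems to contain a lot of elements (it is a large file).

I was parsing a string (file) about as large as that page and ran into the same problem.

After reading the source code, I found the cause. The fix below worked for me!

I found that simple_html_dom.php limits the size of the string it will read.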

// get html dom from string
  function str_get_html($str, $lowercase=true, $forceTagsClosed=true, $target_charset = DEFAULT_TARGET_CHARSET, $stripRN=true, $defaultBRText=DEFAULT_BR_TEXT, $defaultSpanText=DEFAULT_SPAN_TEXT)
  {
           $dom = new simple_html_dom(null, $lowercase, $forceTagsClosed, $target_charset, $stripRN, $defaultBRText, $defaultSpanText);
           if (empty($str) || strlen($str) > MAX_FILE_SIZE)
           {
                   $dom->clear();
                   return false;
           }
           $dom->load($str, $lowercase, $stripRN);
           return $dom;
  }

You have to raise the default size shown below (it is defined at the top of simple_html_dom.php).
Maybe to 100 million? It's up to you.

define('MAX_FILE_SIZE', 6000000);
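
Before editing the library, it may help to confirm that the size limit is really what rejects the string. The following is only a minimal sketch: why_rejected() is a hypothetical helper (not part of simple_html_dom) that mirrors the guard in str_get_html() shown above, and the 600000 limit is an assumed value for illustration, so substitute the MAX_FILE_SIZE from your own copy.

```php
<?php
// Hypothetical helper, not part of simple_html_dom: it repeats the same
// two checks as the guard in str_get_html() and reports which one fails.
function why_rejected(string $html, int $maxSize): string
{
    if (empty($html)) {
        return 'empty';
    }
    if (strlen($html) > $maxSize) {
        return 'too large: ' . strlen($html) . ' bytes > ' . $maxSize;
    }
    return 'ok';
}

// Assumed limit for illustration only; read the real value from the
// define('MAX_FILE_SIZE', ...) line in your simple_html_dom.php.
$limit = 600000;

// Stand-in for the curl response; in the question this is $html_string.
$html_string = str_repeat('a', 700000);

echo why_rejected($html_string, $limit), PHP_EOL;
```

If this reports "too large", raising MAX_FILE_SIZE as described above should let str_get_html() load the string.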

Answer 2

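Have you checked whether the HTML is encoded in a way that HTML DOM PARSER does not expect? For example, using HTML entities such as &lt;html&gt; instead of <html>: that would still display as correct HTML in a browser but would not parse.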
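
One rough way to test for entity-encoded markup is sketched below. looks_entity_encoded() is a hypothetical helper with a deliberately crude heuristic (no literal "<" anywhere, but at least one "&lt;"), and $raw is a made-up sample standing in for the curl response; html_entity_decode() is the standard PHP function.

```php
<?php
// Hypothetical helper: guesses whether the markup arrived entity-encoded.
// Crude heuristic: the string has no literal "<" but does contain "&lt;".
function looks_entity_encoded(string $html): bool
{
    return strpos($html, '<') === false && stripos($html, '&lt;') !== false;
}

// Made-up sample standing in for the string returned by curl_exec().
$raw = '&lt;html&gt;&lt;body&gt;hi&lt;/body&gt;&lt;/html&gt;';

if (looks_entity_encoded($raw)) {
    // Turn the entities back into real tags before handing it to the parser.
    $raw = html_entity_decode($raw, ENT_QUOTES | ENT_HTML5, 'UTF-8');
}

echo $raw, PHP_EOL;
```

If the check never fires on your response, encoding of this kind is probably not the cause.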

Answer 3

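I assume you are using curl + str_get_html, rather than simply calling file_get_html with the URL, because you need to send POST parameters.

You can validate the returned HTML with the W3C validator (http://validator.w3.org/#validate_by_input+with_options) and then, once you are sure the result is 100% valid HTML, report a bug here: http://sourceforge.net/p/simplehtmldom/bugs/.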

str_get_html does not load a valid HTML string
