Question

I am trying to use simple_html_dom to retrieve information from a web page, like this:

<?PHP
include_once('dom/simple_html_dom.php');
$urlpart="http://w2.brreg.no/motorvogn/";
$url = "http://w2.brreg.no/motorvogn/heftelser_motorvogn.jsp?regnr=BR15597";
$html = file_get_html($url);

foreach($html->find('a') as $element) 
       if(preg_match('*dagb*',$element)) {
       $result=$urlpart.$element->href;

       $resultcontent=file_get_contents($result);
       echo $resultcontent;

       }

?>

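The $result variable first gives me this URL: http://w2.brreg.no/motorvogn/dagbokutskrift.jsp?dgbnr=2011365320&embnr=0&regnr=BR15597

When I visit that URL in my browser, I get the content I expect.

When I retrieve the content into $resultcontent, I get a different result: a message that says "Invalid input" in Norwegian.

Any ideas why?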

Answer 1

The problem is with your URL query parameters.

http://w2.brreg.no/motorvogn/dagbokutskrift.jsp?dgbnr=2011365320&embnr=0&regnr=BR15597

The string '&reg' in the URL ends up converted to the symbol ® in the file_get_contents call, which prevents you from getting the actual result.

You can use the html_entity_decode function on line 11:

$resultcontent=file_get_contents(html_entity_decode($result));
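To see what the decoding step changes, here is a minimal sketch that runs without network access. It assumes the href attribute arrives with its ampersands still HTML-encoded as `&amp;`; the `$rawHref` value is a made-up illustration built from the URL in the question, not something captured from the actual page.

```php
<?php
$urlpart = 'http://w2.brreg.no/motorvogn/';

// Hypothetical raw attribute value: assumes the page source writes "&" as "&amp;".
$rawHref = 'dagbokutskrift.jsp?dgbnr=2011365320&amp;embnr=0&amp;regnr=BR15597';

// Built as in the question: the query string still carries "amp;embnr=..." and "amp;regnr=...".
$result = $urlpart . $rawHref;

// html_entity_decode() turns each "&amp;" back into a plain "&".
$decoded = html_entity_decode($result);

// Split the decoded query string into its parameters to check what would be sent.
parse_str((string) parse_url($decoded, PHP_URL_QUERY), $params);

echo $result, PHP_EOL;
echo $decoded, PHP_EOL;
print_r(array_keys($params));
```

If the decoded URL matches the one that works in the browser, passing it to file_get_contents should return the same content.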

Answer 2

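foreach ($html->find('a') as $element) {
    // Only follow links whose markup contains "dagb"
    if (preg_match('/dagb/', $element)) {
        $result = $urlpart . $element->href;
        // Decode HTML entities in the URL before requesting it
        $resultcontent = file_get_contents(html_entity_decode($result));
        echo $resultcontent;
    }
}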

This should do the trick.

Simple HTML DOM - different result than expected
