Parsing multiple URLs via DOM in PHP: @$dom->loadHTML($html.$html2);

Date: 2013-12-17 03:57:13

Tags: php html dom

I'm trying to parse two URLs with PHP DOM. Can anyone tell me where I'm going wrong? Is it possible to parse both URLs the way I do below?

<?php
$html = file_get_contents('http://www.reddit.com/r/funny');
$html2 = file_get_contents('http://www.9gag.com/');
$dom = new DOMDocument();
@$dom->loadHTML($html.$html2);

$xpath = new DOMXPath($dom);
$hyperlinks = $xpath->evaluate('//a[@class="thumbnail "]');
$hyperlinks2 = $xpath->evaluate('//a[@class="badge-item-img"]');

foreach($hyperlinks as $hyperlink) {
  if(strpos($hyperlink->getAttribute('href'), 'http://i.imgur.com/') !== FALSE){
    echo "<img style='padding-left:30%' width=\"500\" src=\"" . $hyperlink->getAttribute('href') . "\" alt=\"\" />";
    echo "<br>";
    echo "<br>";
    echo "<br>";
  }
  else{
    echo "";
  }
}
?>

Edit: I added this edit because I'm also trying to match on class="badge-item-img", and it returns nothing. Is it not possible to do it the way I'm doing it?

<?php
// Init the '$url_array' array.
$url_array = array();
$url_array[] = 'http://www.reddit.com/r/funny';
$url_array[] = 'http://www.9gag.com/';

// Init the return '$ret' array.
$ret = array();

// Roll through the '$url_array' array.
foreach ($url_array as $url_value) {
  $html = file_get_contents($url_value);
  $dom = new DOMDocument();
  $dom2 = new DOMDocument();
  @$dom->loadHTML($html);

  $xpath = new DOMXPath($dom);
  $xpath2 = new DOMXPath($dom2);
  $hyperlinks = $xpath->evaluate('//a[@class="thumbnail "]');
  $hyperlinks2 = $xpath2->evaluate('//a[@class="badge-item-img"]');

  foreach($hyperlinks as $hyperlink) {
    if(strpos($hyperlink->getAttribute('href'), 'http://i.imgur.com/') !== FALSE){
      $ret[] = "<img style='padding-left:30%' width=\"500\" src=\"" . $hyperlink->getAttribute('href') . "\" alt=\"\" />"
             . "<br>"
             . "<br>"
             . "<br>"
             ;

    }
    foreach($hyperlinks2 as $hyperlinker) {
            $ret[] = "<img style='padding-left:30%' width=\"500\" src=\"" . $hyperlinker->getAttribute('href') . "\" alt=\"\" />"
             . "<br>"
             . "<br>"
             . "<br>"
             ;
    }
  } 
  }
// Roll through the '$ret' array.
foreach($ret as $ret_value) {
  echo $ret_value;
}

2 Answers:

Answer 0 (score: 1)

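It looks like you are trying to concatenate the contents of one HTML file onto another. You end up with a document that will most likely choke the DOM parser. Instead, you should loop through the URLs and then output the results: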

<?php

// Init the '$url_array' array.
$url_array = array();
$url_array[] = 'http://www.reddit.com/r/funny';
$url_array[] = 'http://www.9gag.com/';

// Init the return '$ret' array.
$ret = array();

// Roll through the '$url_array' array.
foreach ($url_array as $url_value) {
  $html = file_get_contents($url_value);
  $dom = new DOMDocument();
  @$dom->loadHTML($html);

  $xpath = new DOMXPath($dom);
  $hyperlinks = $xpath->evaluate('//a[@class="thumbnail "]');
  $hyperlinks2 = $xpath->evaluate('//a[@class="badge-item-img"]');

  foreach($hyperlinks as $hyperlink) {
    if(strpos($hyperlink->getAttribute('href'), 'http://i.imgur.com/') !== FALSE){
      $ret[] = "<img style='padding-left:30%' width=\"500\" src=\"" . $hyperlink->getAttribute('href') . "\" alt=\"\" />"
             . "<br>"
             . "<br>"
             . "<br>"
             ;
    }
  }
}

// Roll through the '$ret' array.
foreach($ret as $ret_value) {
  echo $ret_value;
}

?>
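
Note that the loop above only iterates over $hyperlinks, so the "badge-item-img" matches are never output. A minimal sketch of one way to handle both selectors, assuming each page is paired with its own XPath query: the helper name extract_hrefs and the inline sample markup are hypothetical stand-ins (real pages would be fetched with file_get_contents as above, and their markup may differ).

```php
<?php
// Hypothetical helper: parse ONE HTML document and return the href of
// every node matched by the given XPath query.
function extract_hrefs(string $html, string $query): array
{
    $dom = new DOMDocument();
    // Collect libxml warnings internally instead of silencing them with @.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    $hrefs = array();
    foreach ($xpath->query($query) as $node) {
        $hrefs[] = $node->getAttribute('href');
    }
    return $hrefs;
}

// Sample markup standing in for the two fetched pages (assumption).
$pages = array(
    array('<html><body><a class="thumbnail " href="http://i.imgur.com/a.jpg">x</a></body></html>',
          '//a[@class="thumbnail "]'),
    array('<html><body><a class="badge-item-img" href="http://example.com/b.jpg">y</a></body></html>',
          '//a[@class="badge-item-img"]'),
);

$ret = array();
foreach ($pages as $page) {
    list($html, $query) = $page;
    // Each document gets its own DOMDocument; nothing is concatenated.
    $ret = array_merge($ret, extract_hrefs($html, $query));
}
```

Keeping one DOMDocument per page means a query written for one site is never run against the other site's markup.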

Answer 1 (score: 0)

I'm not sure I see the problem. I tested this code locally and it works. Are you getting some kind of error?
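
One way to check for errors, offered as a sketch rather than a diagnosis: let libxml collect its parse messages instead of suppressing them with @, then print them. The function name load_html_with_errors and the sample markup are hypothetical; the libxml_* functions are PHP built-ins.

```php
<?php
// Hypothetical helper: load HTML and return the document together with
// any messages libxml recorded while parsing it.
function load_html_with_errors(string $html): array
{
    $previous = libxml_use_internal_errors(true);
    $dom = new DOMDocument();
    $loaded = $dom->loadHTML($html);

    $messages = array();
    foreach (libxml_get_errors() as $error) {
        // Each $error is a LibXMLError with line and message properties.
        $messages[] = sprintf('line %d: %s', $error->line, trim($error->message));
    }
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    return array($loaded ? $dom : null, $messages);
}

// Two complete documents glued together, as in the question (sample markup).
$html  = '<html><body><a class="thumbnail " href="http://i.imgur.com/a.jpg">x</a></body></html>';
$html2 = '<html><body><a class="badge-item-img" href="http://example.com/b.jpg">y</a></body></html>';

list($dom, $messages) = load_html_with_errors($html . $html2);
foreach ($messages as $message) {
    echo $message, "\n";
}
```

Whatever the parser reports for the concatenated input shows up in $messages, which makes it easier to tell a parsing problem from a selector that simply matches nothing.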