How to get part of an HTML page

Time: 2013-09-27 05:18:38

Tags: php html

My full sample code is as follows:

  <?php 
  $html = '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
  <html xmlns="http://www.w3.org/1999/xhtml">
  <head>
  <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
  <title>samplecode</title>
  </head>
  <body>
    <div id="warrper">
      <div class="box-title">This title is sample</div>
      <div class="box-maim">
        <div class="box-element-1">
           <ul>
              <li>sample 1</li>
              <li>sample 2</li>
              <li>sample 3</li>
              <li>sample 4</li>
              <li>sample 5</li>
           </ul>
        </div>
        <div class="box-element-1">
           <ul>
              <li>sample 1</li>
              <li>sample 2</li>
              <li>sample 3</li>
              <li>sample 4</li>
              <li>sample 5</li>
           </ul>   
        </div>
        <div class="box-element-1">
           <ul>
              <li>sample 1</li>
              <li>sample 2</li>
              <li>sample 3</li>
              <li>sample 4</li>
              <li>sample 5</li>
           </ul>
        </div>
      </div>
    </div>
  </body>
  </html> ';

   preg_match( '/<div class="box-maim">(.*?)<\/div>/si' , $html , $match );

   print_r($match);
  ?>

After loading the HTML from a URL, my goal is to get only the markup for the selected class, like the code below:

  <div class="box-element-1">
     <ul>
        <li>sample 1</li>
        <li>sample 2</li>
        <li>sample 3</li>
        <li>sample 4</li>
        <li>sample 5</li>
     </ul>
  </div>
  <div class="box-element-1">
     <ul>
        <li>sample 1</li>
        <li>sample 2</li>
        <li>sample 3</li>
        <li>sample 4</li>
        <li>sample 5</li>
     </ul>   
  </div>
  <div class="box-element-1">
     <ul>
        <li>sample 1</li>
        <li>sample 2</li>
        <li>sample 3</li>
        <li>sample 4</li>
        <li>sample 5</li>
     </ul>
  </div>

But I don't know the right way to do this.

2 answers:

Answer 0 (score: -1)

As everyone suggests, use the DOM. Try the following code:

<?php
$html = '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
  <html xmlns="http://www.w3.org/1999/xhtml">
  <head>
  <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
  <title>samplecode</title>
  </head>
  <body>
    <div id="warrper">
      <div class="box-title">This title is sample</div>
      <div class="box-maim">
        <div class="box-element-1">
           <ul>
              <li>sample 1</li>
              <li>sample 2</li>
              <li>sample 3</li>
              <li>sample 4</li>
              <li>sample 5</li>
           </ul>
        </div>
        <div class="box-element-1">
           <ul>
              <li>sample 1</li>
              <li>sample 2</li>
              <li>sample 3</li>
              <li>sample 4</li>
              <li>sample 5</li>
           </ul>   
        </div>
        <div class="box-element-1">
           <ul>
              <li>sample 1</li>
              <li>sample 2</li>
              <li>sample 3</li>
              <li>sample 4</li>
              <li>sample 5</li>
           </ul>
        </div>
      </div>
    </div>
  </body>
  </html> ';

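libxml_use_internal_errors(true); // collect parser warnings instead of printing them
$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
// Select the container div by its exact class attribute value.
$div = $xpath->query('//div[@class="box-maim"]')->item(0);
if ($div !== null) {
    // The output includes the enclosing box-maim div itself.
    echo $dom->saveXML($div);
}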
?>

Works perfectly :)
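If only the inner `box-element-1` blocks are wanted, without the enclosing `box-maim` div, the XPath query can target them directly. The following is a minimal sketch under a few assumptions: `$html` is a shortened stand-in for the page in the question, `extractByClass` is a hypothetical helper name, and the class value is trusted (it is concatenated into the XPath expression as-is).

```php
<?php
// Shortened stand-in for the page in the question (assumption: same class names).
$html = '<html><body><div class="box-maim">
  <div class="box-element-1"><ul><li>sample 1</li></ul></div>
  <div class="box-element-1"><ul><li>sample 2</li></ul></div>
</div></body></html>';

// Hypothetical helper: returns the outer HTML of every <div> whose
// class attribute is exactly $class.
function extractByClass(string $html, string $class): array
{
    // Collect parser warnings internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    $parts = [];
    foreach ($xpath->query('//div[@class="' . $class . '"]') as $node) {
        // saveHTML($node) serializes just this node and its children.
        $parts[] = $dom->saveHTML($node);
    }
    return $parts;
}

$parts = extractByClass($html, 'box-element-1');
echo implode("\n", $parts), "\n";
```

Because `@class="..."` compares the whole attribute value, an element with several classes (for example `class="box-element-1 active"`) would not match this query.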

Answer 1 (score: -1)

If I understand you correctly, it is simply this:

// The U modifier makes every quantifier lazy, so each match ends at the first </div>.
preg_match_all('/<div\s[^>]*class="box-element-([^"]*)"[^>]*>(.*)<\/div>/siU', $html, $matches, PREG_SET_ORDER);
echo '<pre>';
print_r($matches);
echo '</pre>';
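This pattern depends on lazy matching, so each match stops at the first `</div>` it meets. That is enough here only because the `box-element-1` blocks contain no nested `<div>`. As a rough sketch, assuming a shortened `$html` in place of the full page, the complete blocks can be read from `$matches[0]`:

```php
<?php
// Shortened stand-in for the page in the question (assumption: same class names).
$html = '<div class="box-maim">
  <div class="box-element-1"><ul><li>sample 1</li></ul></div>
  <div class="box-element-1"><ul><li>sample 2</li></ul></div>
</div>';

// s: dot matches newlines, i: case-insensitive, U: quantifiers are lazy by default.
// With the default PREG_PATTERN_ORDER, $matches[0] holds the full matches.
preg_match_all(
    '/<div\s[^>]*class="box-element-[^"]*"[^>]*>.*<\/div>/siU',
    $html,
    $matches
);

$blocks = $matches[0];
echo implode("\n", $blocks), "\n";
```

If a block did contain a nested `<div>`, the match would be cut short at the inner closing tag, which is the usual reason a DOM parser is preferred for this task.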