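Search an HTML string with a regular expression and store the matches in an array in PHP

Time: 2014-05-10 10:30:04

Tags: php html regex pcre

I need to search for strings that may look like any of these: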

<div class="icon_star">&nbsp;</div>

<div class="icon_star"></div>

<div class="icon_star"> </div>

I need to find the strings above inside HTML that may look like this:

<h1 class="redword" tag="h1">
   <span class="BASE">good</span>
</h1>
<span class="headword-definition">&#160;-&#160;definition</span>
</span>
<div class="icon_star"></div>
<!-- End of DIV icon_star-->

<div class="icon_star"></div>
<!-- End of DIV icon_star-->

<div class="icon_star"></div>
<!-- End of DIV icon_star-->

</div><!-- End of DIV -->

<div class="headbar">
   <div id="helplinks-box" class="responsive_hide_on_smartphone">  

The string we are trying to find and store in an array may occur multiple times.

I tried the following regular expression:

preg_match_all ('/<div(\s)+class="icon_star">(.*?)<\/div>/i', $html1, $result_array1);

The regular expression above does not work when the HTML to search is:

<div id="headword">
    <div id="headwordright">
        <div style="display: none;" id="showmore"><a class="button" onmousedown="foldingSet(false)"><span class="label">Show more</span></a>
        </div><!-- End of DIV -->
        <div id="showless"><a class="button" onmousedown="foldingSet(true)"><span class="label">Show less</span></a>
        </div><!-- End of DIV -->
    </div><!-- End of DIV -->
    <span class="BASE-FORM">
        <h1 tag="h1" class="redword"><span class="BASE">scenario</span></h1>
        <span class="headword-definition">&nbsp;-&nbsp;definition</span>
    </span>
    <div class="icon_star">&nbsp;</div><!-- End of DIV icon_star-->
</div>

1 Answer:

Answer 0: (score: 3)

Update

It looks like you are reading the regex result the wrong way. Running

preg_match_all('/<div(\s)+class="icon_star">.*?<\/div>/i', $html, $result_array1);

for($x = 0; $x < count($result_array1); $x++)
    $result_array1[$x] = array_map('htmlentities', $result_array1[$x]);

echo '<pre>' . print_r($result_array1, 1);

prints:

   Array
   (
       [0] => Array
       (
           [0] => <div class="icon_star">&nbsp;</div>
       )

       [1] => Array
       (
           [0] =>  
       )

   )   

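So you should check the count of $result_array1[0] rather than the count of $result_array1. The outer array has one entry per capture group (index 0 is the full match), so its size does not tell you how many matches were found.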
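
As a minimal sketch of the point above, the snippet below counts matches in two equivalent ways. The sample `$html` string is assumed for illustration (it concatenates the three variants from the question), and the pattern is a slightly simplified version of the one above (`\s+` instead of `(\s)+`, with a capture group for the div content and the `s` modifier so that `.` also matches newlines).

```php
<?php
// Sample input assumed for illustration: the three variants from the question.
$html = '<div class="icon_star">&nbsp;</div>'
      . '<div class="icon_star"></div>'
      . '<div class="icon_star"> </div>';

// preg_match_all() returns the number of full-pattern matches (or false on error).
$count = preg_match_all('/<div\s+class="icon_star">(.*?)<\/div>/is', $html, $matches);

// With the default PREG_PATTERN_ORDER flag:
//   $matches[0] holds every full match,
//   $matches[1] holds what capture group 1 matched in each of them.
echo $count, "\n";               // number of matches
echo count($matches[0]), "\n";   // same number, taken from the result array
echo count($matches), "\n";      // capture groups + 1, NOT the number of matches
```

Passing `PREG_SET_ORDER` as the fourth argument would instead give one array entry per match, in which case `count($matches)` does equal the number of matches.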

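Side note

Instead of parsing HTML with regular expressions, consider using PHP's built-in DOMDocument class if you can.
The following code extracts the three divs.

Note that you need valid HTML for this approach.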

  //your HTML with tags added to make it valid
  $html = '<div>
     <h1 class="redword" tag="h1">
        <span class="BASE">good</span>
     </h1>
     <span class="headword-definition"><span>&#160;-&#160;definition</span></span>
     <div class="icon_star"></div>
     <div class="icon_star"></div>
     <div class="icon_star"></div>
  </div>
  <div class="headbar">
     <div id="helplinks-box" class="responsive_hide_on_smartphone">
     </div>
  </div>';

  $dom = new DOMDocument();
  @$dom->loadHTML($html);
  $x = new DOMXPath($dom);

  //this xpath query looks for all nodes whose "class" attribute value contains the substring "icon_star"
  $nodes = $x->query("//*[contains(@class, 'icon_star')]");

  $res = '';
  foreach($nodes as $node) {
     /**
      * @var $node DOMElement
      */
     $res .= $dom->saveHTML($node);
  }

  echo htmlentities($res);
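
Because `contains(@class, 'icon_star')` is a substring test, it would also match a class such as `icon_star_large`. The sketch below compares it with a common XPath 1.0 idiom that matches `icon_star` only as a whole class token. The sample markup and the class names `big` and `icon_star_large` are hypothetical, chosen only to show the difference.

```php
<?php
// Hypothetical sample markup: an exact class, a multi-class element, and a look-alike.
$html = '<div>
  <div class="icon_star"></div>
  <div class="big icon_star"></div>
  <div class="icon_star_large"></div>
</div>';

$dom = new DOMDocument();
libxml_use_internal_errors(true);  // collect parser warnings instead of silencing with @
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);

// Substring test: also matches the "icon_star_large" element.
$loose = $xpath->query("//*[contains(@class, 'icon_star')]");

// Token test: pad the normalized class list with spaces, then look for " icon_star ".
$strict = $xpath->query(
    "//*[contains(concat(' ', normalize-space(@class), ' '), ' icon_star ')]"
);

echo $loose->length, "\n";   // elements matched by the substring test
echo $strict->length, "\n";  // elements matched by the whole-token test
```

With this sample, the substring query selects all three inner divs, while the token query selects only the two whose class list actually includes `icon_star`.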

You may find the following Stack Overflow questions useful:
How do you parse and process HTML/XML in PHP?
Getting DOM elements by classname