How can I extract the H2 and H3 tags from an HTML page in PHP?

Asked: 2015-12-17 08:40:22

Tags: php

I need to get the contents of the h2 and h3 tags in the HTML below into variables:

<div class="main-info">
   <img class="iphone-img" alt="" src="https://www.myweb.com/securedImage.jsp">
        <div class="sub-info">
                <h2 class="model">iPhone 4S</h2>
                <h3 class="capacity color">16GB Black</h3>
          </div>
</div>

I want this result:

echo $model; // Should echo:  'iPhone 4S'
echo $capacitycolour; // Should echo: '16GB Black'

I have tried preg_match, preg_match_all and getElementsByTagName, but have had no luck so far.

This is the code I have tried:

$pattern = '/[^\n]h2*[^\n]*/';
preg_match_all($pattern,$data, $matches, PREG_OFFSET_CAPTURE);
var_dump($matches);

$doc = new DOMDocument();
$doc->loadHTML($data);
$tags = $doc->getElementsByTagName('sub-info');

$root = $doc->documentElement;
foreach($root->childNodes as $node){
    $attributes[$node->nodeName] = $node->nodeValue;
}

var_dump($attributes);

4 answers:

Answer 0: (score: 5)

sub-info is a class, not a tag name, so your call to getElementsByTagName matches nothing. You would be better off using an XPath query.

$strhtml='<div class="main-info">
            <img class="iphone-img" alt="" src="https://www.myweb.com/securedImage.jsp?configcode=DTF9&size=120x120">
            <div class="sub-info">
                <h2 class="model">
                        iPhone 4S
                </h2>
                <h3 class="capacity color">
                    16GB Black 
                </h3>
            </div>
        </div>';


$doc = new DOMDocument();
$doc->loadHTML( $strhtml );
$xpath=new DOMXPath( $doc );
$col=$xpath->query('//div[@class="sub-info"]/h2|//div[@class="sub-info"]/h3');
if( $col ){
    /* You could store results from query in an array */
    $tags=array();
    foreach( $col as $node ) {

        /* Simplest form to display results on separate lines, use br tag */
        echo trim( $node->nodeValue ) . '<br />';

        /* Add tags to array, trimmed of surrounding whitespace - a rethink would be required if there are multiple h2 and h3 tags! */
        $tags[ $node->tagName ]=trim( $node->nodeValue );

    }
    /* echo back results from array */
    echo $tags['h2'];
    echo '<br />';
    echo $tags['h3'];
}
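
To end up with the two variables the question asks for, the same query can be wrapped in a small helper. This is only a sketch under a few assumptions: the function name `firstText` is made up for illustration, the `?string` return type needs PHP 7.1 or later, and the class test uses `contains(concat(...))` so that a div carrying several class names would still match. libxml warnings are collected internally because real-world HTML is often not well formed.

```php
<?php
// Hypothetical helper (not part of the original answer): returns the trimmed
// text of the first node matched by an XPath expression, or null if none match.
function firstText(DOMXPath $xpath, string $expression): ?string
{
    $nodes = $xpath->query($expression);
    if ($nodes === false || $nodes->length === 0) {
        return null;
    }
    return trim($nodes->item(0)->textContent);
}

$strhtml = '<div class="main-info">
    <img class="iphone-img" alt="" src="https://www.myweb.com/securedImage.jsp">
    <div class="sub-info">
        <h2 class="model">
            iPhone 4S
        </h2>
        <h3 class="capacity color">
            16GB Black
        </h3>
    </div>
</div>';

// Collect parse warnings internally instead of printing them.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($strhtml);
libxml_clear_errors();

$xpath = new DOMXPath($doc);

// Match on a single class token rather than on the whole class attribute.
$inSubInfo = '//div[contains(concat(" ", normalize-space(@class), " "), " sub-info ")]';

$model          = firstText($xpath, $inSubInfo . '/h2');
$capacitycolour = firstText($xpath, $inSubInfo . '/h3');

echo $model, "\n";
echo $capacitycolour, "\n";
```

Returning null when nothing matches lets the caller tell "tag missing" apart from "tag present but empty".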

Answer 1: (score: 1)

In future, try an online regex tester to check your expressions.

For the H2 tags, the following would work: .*<h2.*>[\n\s]*(.*) (although it is not the best option)
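
As a rough illustration of the regex route, here is a sketch using preg_match. The pattern is a variant chosen for illustration, not the expression above: it skips the attributes with `[^>]*`, stops at the closing tag, and leaves surrounding whitespace outside the capture group. It assumes simple, predictable markup like the snippet in the question and will not cope with nested or malformed tags.

```php
<?php
$data = '<div class="sub-info">
    <h2 class="model">iPhone 4S</h2>
    <h3 class="capacity color">16GB Black</h3>
</div>';

// [^>]* skips any attributes; (.*?) lazily captures the text up to the closing tag.
// The s modifier lets . match newlines; i makes the tag name case-insensitive.
$model = preg_match('/<h2[^>]*>\s*(.*?)\s*<\/h2>/is', $data, $m) ? $m[1] : null;
$capacitycolour = preg_match('/<h3[^>]*>\s*(.*?)\s*<\/h3>/is', $data, $m) ? $m[1] : null;

echo $model, "\n";
echo $capacitycolour, "\n";
```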

Answer 2: (score: 0)

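I have used simple_html_dom.php in many cases before and it works very well. Once the document is loaded, it lets you use CSS-like selectors. You can also parse from a string, a local file, or a URL! The following gives you an array of elements: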

$div = $html->find('div.sub-info');
$ret = $div[0]->find('h2, h3');

API reference: here

Warning: do not use regex to parse HTML; see here for what will happen :)

Answer 3: (score: 0)

Are you Cyberboki?

Check this out.

$strhtml='<div class="main-info">
        <img class="iphone-img" alt="" src="https://www.myweb.com/securedImage.jsp?configcode=DTF9&size=120x120">
        <div class="sub-info">
            <h2 class="model">
                    iPhone 4S
            </h2>
            <h3 class="capacity color">
                16GB Black 
            </h3>
        </div>
    </div>';
$new = preg_replace("/\s+/",' ',$strhtml);  
preg_match('/<h2 class="model">(.*?)<\/h2>/i', $new , $h2); 
preg_match('/<h3 class="capacity color">(.*?)<\/h3>/i', $new , $h3); 

echo "option 1";
echo "<br/>";
echo $h2[1];
echo "<br/>";
echo $h3[1];
echo "<br/>";
echo "<br/>";

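// Option 2: strip the tags, then keep every non-empty line of text
$rr = array();
$ex = explode("\n",strip_tags($strhtml));
foreach($ex as $line){
    // collapse runs of whitespace and trim the line
    $line_out = preg_replace('/\s+/', ' ', trim($line));
    if(strlen($line_out) > 0){
        $rr[] = $line_out;
    }
}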
echo "option 2";
echo "<br/>";       
echo $rr[0];
echo "<br/>";
echo $rr[1];        

result:
option 1
iPhone 4S
16GB Black

option 2
iPhone 4S
16GB Black 
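
Option 2 can also be written without the explicit loop. The sketch below is an assumption about how one might condense it, not part of the original answer, and it relies on the h2 and h3 being the only text in the fragment, in that order.

```php
<?php
$strhtml = '<div class="sub-info">
    <h2 class="model">
        iPhone 4S
    </h2>
    <h3 class="capacity color">
        16GB Black
    </h3>
</div>';

// Strip the tags, split into lines, trim each one, and drop the empty lines.
$lines = array_values(array_filter(
    array_map('trim', explode("\n", strip_tags($strhtml))),
    'strlen'
));

// Assumes exactly two non-empty lines remain: the h2 text, then the h3 text.
list($model, $capacitycolour) = $lines;

echo $model, "\n";
echo $capacitycolour, "\n";
```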

Regards, iPhoneYeta