DOMXPath query for nested elements with multiple classes

Date: 2016-12-06 13:00:42

Tags: php xpath domxpath

I am parsing an HTML file that has the following structure:

<div class="lstImv blackBd12">
    <div class="stCl3 stLeft imvImg">
        <div class="imgBox">            
            <a class="emp-imgs-link">
                <span class="imgFrm frmBig frmLeft">
                    <img class="emp-img-principal">
                </span>
                <span class="imgFrm frmMd frmTop">
                    <img class="emp-img-logo">
                </span>
                <span class="imgFrm frmMd frmBot">
                    <img class="emp-img-foto">
                </span>             
            </a>
        </div>
        <strong class="imvFse emp-fase">Get_text 1</strong>
    </div>
    <div class="imvInf stCl3 stRight">
        <div class="infHd">
            <div class="hdLeft stCl2">
                <strong class="emp-nome infNme colorTxt"></strong>
                <span class="emp-loc-part1 infLoc">Get_text 2</span>
                <span class="emp-loc-part2 infLoc">Get_text 3</span>
            </div>
            <div class="hdRight stCl1">
                <em class="emp-valor-apartir" >Get_text 4</em>
                <strong class="emp-valor infVlr colorTxt">Get_text 5</strong>
            </div>
        </div>
        <div class="infTxt">
            <p class="blackTxt60 emp-descritivo"></p>
            <ul>                
                <li class="txtBed emp-un-dorms">Get_text 6</li>                                 
                <li class="txtArea emp-un-area">Get_text 7</li>
                <li class="txtToilet emp-un-bath">Get_text 8</li>
                <li class="txtCar emp-un-park">Get_text 9</li>
            </ul>
        </div>
        <div class="infBt">
            <a href="/parceiro_cadastro" title="" class="btCadastrese stBt stBtLt colorBg whiteTxt rc9 sh15 emp-btn-cadastre">Get_text 10</a>
        </div>
    </div>
</div>
<div class="lstImv blackBd12">
    <div class="stCl3 stLeft imvImg">
        <div class="imgBox">            
            <a class="emp-imgs-link">
                <span class="imgFrm frmBig frmLeft">
                    <img class="emp-img-principal">
                </span>
                <span class="imgFrm frmMd frmTop">
                    <img class="emp-img-logo">
                </span>
                <span class="imgFrm frmMd frmBot">
                    <img class="emp-img-foto">
                </span>             
            </a>
        </div>
        <strong class="imvFse emp-fase">Other Get_text 1</strong>
    </div>
    <div class="imvInf stCl3 stRight">
        <div class="infHd">
            <div class="hdLeft stCl2">
                <strong class="emp-nome infNme colorTxt"></strong>
                <span class="emp-loc-part1 infLoc">Other Get_text 2</span>
                <span class="emp-loc-part2 infLoc">Other Get_text 3</span>
            </div>
            <div class="hdRight stCl1">
                <em class="emp-valor-apartir" >Other Get_text 4</em>
                <strong class="emp-valor infVlr colorTxt">Other Get_text 5</strong>
            </div>
        </div>
        <div class="infTxt">
            <p class="blackTxt60 emp-descritivo"></p>
            <ul>                
                <li class="txtBed emp-un-dorms">Other Get_text 6</li>                                   
                <li class="txtArea emp-un-area">Other Get_text 7</li>
                <li class="txtToilet emp-un-bath">Other Get_text 8</li>
                <li class="txtCar emp-un-park">Other Get_text 9</li>
            </ul>
        </div>
        <div class="infBt">
            <a href="/parceiro_cadastro" title="" class="btCadastrese stBt stBtLt colorBg whiteTxt rc9 sh15 emp-btn-cadastre">Other Get_text 10</a>
        </div>
    </div>
</div>

The following block:

<div class="lstImv blackBd12"></div>

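wraps the other tags that hold the text content I am after, and it repeats several times (in this example, after editing, I kept only 2 copies).

Then, with this code: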

<?php
$html = "exemplo_parse.html";
libxml_use_internal_errors(true);
$dom = new domDocument('1.0', 'utf-8'); 
$dom->loadHTMLFile($html); 
$dom->preserveWhiteSpace = false; 
$xpath = new DOMXPath($dom);
$content = $xpath->query('//div[@class="lstImv blackBd12"]');
foreach($content as $span)
{
    echo "<pre>";
        print_r($span);
    echo "</pre>";
}
?>

I get two objects with these values:

DOMElement Object
(
    [tagName] => div
    [schemaTypeInfo] => 
    [nodeName] => div
    [nodeValue] => 











        Get_text 1





                Get_text 2
                Get_text 3


                Get_text 4
                Get_text 5




            Get_text 6                                  
                Get_text 7
                Get_text 8
                Get_text 9


            Get_text 10



    [nodeType] => 1
    [parentNode] => (object value omitted)
    [childNodes] => (object value omitted)
    [firstChild] => (object value omitted)
    [lastChild] => (object value omitted)
    [previousSibling] => 
    [nextSibling] => (object value omitted)
    [attributes] => (object value omitted)
    [ownerDocument] => (object value omitted)
    [namespaceURI] => 
    [prefix] => 
    [localName] => div
    [baseURI] => 
    [textContent] => 











        Get_text 1





                Get_text 2
                Get_text 3


                Get_text 4
                Get_text 5




            Get_text 6                                  
                Get_text 7
                Get_text 8
                Get_text 9


            Get_text 10



)
DOMElement Object
(
    [tagName] => div
    [schemaTypeInfo] => 
    [nodeName] => div
    [nodeValue] => 











        Other Get_text 1





                Other Get_text 2
                Other Get_text 3


                Other Get_text 4
                Other Get_text 5




            Other Get_text 6                                
                Other Get_text 7
                Other Get_text 8
                Other Get_text 9


            Other Get_text 10



    [nodeType] => 1
    [parentNode] => (object value omitted)
    [childNodes] => (object value omitted)
    [firstChild] => (object value omitted)
    [lastChild] => (object value omitted)
    [previousSibling] => (object value omitted)
    [attributes] => (object value omitted)
    [ownerDocument] => (object value omitted)
    [namespaceURI] => 
    [prefix] => 
    [localName] => div
    [baseURI] => 
    [textContent] => 











        Other Get_text 1





                Other Get_text 2
                Other Get_text 3


                Other Get_text 4
                Other Get_text 5




            Other Get_text 6                                
                Other Get_text 7
                Other Get_text 8
                Other Get_text 9


            Other Get_text 10



)

So this is what I am doing now:

<?php
$html = "exemplo_parse.html";
libxml_use_internal_errors(true);
$dom = new domDocument('1.0', 'utf-8'); 
$dom->loadHTMLFile($html); 
$dom->preserveWhiteSpace = false; 
$xpath = new DOMXPath($dom);
$content = $xpath->query('//strong[@class="imvFse emp-fase"]');
foreach($content as $span)
{
    echo "Key 1 : ".$span->textContent."<br/>";
}
$content = $xpath->query('//strong[@class="emp-nome infNme colorTxt"]');
foreach($content as $span)
{
    echo "Key 2 : ".$span->textContent."<br/>";
}
$content = $xpath->query('//span[@class="emp-loc-part1 infLoc"]');
foreach($content as $span)
{
    echo "Key 3 : ".$span->textContent."<br/>";
}
$content = $xpath->query('//span[@class="emp-loc-part2 infLoc"]');
foreach($content as $span)
{
    echo "Key 4 : ".$span->textContent."<br/>";
}
$content = $xpath->query('//li[@class="txtBed emp-un-dorms"]');
foreach($content as $span)
{
    echo "Key 5 : ".$span->textContent."<br/>";
}
$content = $xpath->query('//li[@class="txtArea emp-un-area"]');
foreach($content as $span)
{
    echo "Key 6 : ".$span->textContent."<br/>";
}
$content = $xpath->query('//li[@class="txtCar emp-un-park"]');
foreach($content as $span)
{
    echo "Key 7 : ".$span->textContent."<br/>";
}
?>

I get the data in this form:

Key 1 : Get_text 1
Key 1 : Other Get_text 1
Key 2 : 
Key 2 : 
Key 3 : Get_text 2
Key 3 : Other Get_text 2
Key 4 : Get_text 3
Key 4 : Other Get_text 3
Key 5 : Get_text 6
Key 5 : Other Get_text 6
Key 6 : Get_text 7
Key 6 : Other Get_text 7
Key 7 : Get_text 9
Key 7 : Other Get_text 9

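In other words, the output is grouped by key, but I want the keys to come out in order block by block (k1, k2, ..., k7, k1, k2, ..., k7), not in the form (k1, k1, k2, k2, ..., k7, k7).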

Sorry for my poor English; I am still working on it...

1 answer:

Answer 0 (score: 0)

This is the solution I came up with:

<?php
$html = <<<HTML
<div class="lstImv blackBd12">
    <div class="stCl3 stLeft imvImg">
        <div class="imgBox">            
            <a class="emp-imgs-link">
                <span class="imgFrm frmBig frmLeft">
                    <img class="emp-img-principal">
                </span>
                <span class="imgFrm frmMd frmTop">
                    <img class="emp-img-logo">
                </span>
                <span class="imgFrm frmMd frmBot">
                    <img class="emp-img-foto">
                </span>             
            </a>
        </div>
        <strong class="imvFse emp-fase">Get_text 1</strong>
    </div>
    <div class="imvInf stCl3 stRight">
        <div class="infHd">
            <div class="hdLeft stCl2">
                <strong class="emp-nome infNme colorTxt"></strong>
                <span class="emp-loc-part1 infLoc">Get_text 2</span>
                <span class="emp-loc-part2 infLoc">Get_text 3</span>
            </div>
            <div class="hdRight stCl1">
                <em class="emp-valor-apartir" >Get_text 4</em>
                <strong class="emp-valor infVlr colorTxt">Get_text 5</strong>
            </div>
        </div>
        <div class="infTxt">
            <p class="blackTxt60 emp-descritivo"></p>
            <ul>                
                <li class="txtBed emp-un-dorms">Get_text 6</li>                                 
                <li class="txtArea emp-un-area">Get_text 7</li>
                <li class="txtToilet emp-un-bath">Get_text 8</li>
                <li class="txtCar emp-un-park">Get_text 9</li>
            </ul>
        </div>
        <div class="infBt">
            <a href="/parceiro_cadastro" title="" class="btCadastrese stBt stBtLt colorBg whiteTxt rc9 sh15 emp-btn-cadastre">Get_text 10</a>
        </div>
    </div>
</div>
<div class="lstImv blackBd12">
    <div class="stCl3 stLeft imvImg">
        <div class="imgBox">            
            <a class="emp-imgs-link">
                <span class="imgFrm frmBig frmLeft">
                    <img class="emp-img-principal">
                </span>
                <span class="imgFrm frmMd frmTop">
                    <img class="emp-img-logo">
                </span>
                <span class="imgFrm frmMd frmBot">
                    <img class="emp-img-foto">
                </span>             
            </a>
        </div>
        <strong class="imvFse emp-fase">Other Get_text 1</strong>
    </div>
    <div class="imvInf stCl3 stRight">
        <div class="infHd">
            <div class="hdLeft stCl2">
                <strong class="emp-nome infNme colorTxt"></strong>
                <span class="emp-loc-part1 infLoc">Other Get_text 2</span>
                <span class="emp-loc-part2 infLoc">Other Get_text 3</span>
            </div>
            <div class="hdRight stCl1">
                <em class="emp-valor-apartir" >Other Get_text 4</em>
                <strong class="emp-valor infVlr colorTxt">Other Get_text 5</strong>
            </div>
        </div>
        <div class="infTxt">
            <p class="blackTxt60 emp-descritivo"></p>
            <ul>                
                <li class="txtBed emp-un-dorms">Other Get_text 6</li>                                   
                <li class="txtArea emp-un-area">Other Get_text 7</li>
                <li class="txtToilet emp-un-bath">Other Get_text 8</li>
                <li class="txtCar emp-un-park">Other Get_text 9</li>
            </ul>
        </div>
        <div class="infBt">
            <a href="/parceiro_cadastro" title="" class="btCadastrese stBt stBtLt colorBg whiteTxt rc9 sh15 emp-btn-cadastre">Other Get_text 10</a>
        </div>
    </div>
</div>
HTML;

$dom = new domDocument('1.0', 'utf-8'); 
$dom->loadHTML($html); 
$dom->preserveWhiteSpace = false; 
$xpath = new DOMXPath($dom);


$items = $xpath->query('//div[@class="lstImv blackBd12"]');
for($i = 0; $i < $items->length; $i++)
{
    $status = $xpath->query('//strong[@class="imvFse emp-fase"]');
    echo "Value     :".$status->item($i)->nodeValue."<br/>";    

    $titulo = $xpath->query('//span[@class="emp-loc-part1 infLoc"]');
    echo "Value     :".$titulo->item($i)->nodeValue."<br/>";

    $titulo2 = $xpath->query('//span[@class="emp-loc-part2 infLoc"]');
    echo "Value     :".$titulo2->item($i)->nodeValue."<br/>";   

    $valor = $xpath->query('//em[@class="emp-valor-apartir"]');
    echo "Value     :".$valor->item($i)->nodeValue."<br/>"; 

    $valor2 = $xpath->query('//strong[@class="emp-valor infVlr colorTxt"]');
    echo "Value     :".$valor2->item($i)->nodeValue."<br/>";

    $dorm = $xpath->query('//li[@class="txtBed emp-un-dorms"]');
    echo "Value     :".$dorm->item($i)->nodeValue."<br/>";

    $tam = $xpath->query('//li[@class="txtArea emp-un-area"]');
    echo "Value     :".$tam->item($i)->nodeValue."<br/>";   

}
?>
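
A note on the loop above: each query inside it starts with `//`, so it searches the whole document, and the values line up by index only because every block contains every element exactly once. A minimal sketch of an alternative, assuming the same markup, passes each block as the context node to `DOMXPath::query()` and uses relative expressions (`.//`). The function name `extractListings` and the field names are hypothetical, chosen only for illustration.

```php
<?php
// Hypothetical helper (not from the original post): returns one array per
// "lstImv blackBd12" block, with fields in a fixed order.
function extractListings(string $html): array
{
    libxml_use_internal_errors(true);   // keep parser warnings out of the output
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();
    $xpath = new DOMXPath($dom);

    // Field name => XPath relative to one block (note the leading dot).
    $fields = [
        'fase'  => './/strong[@class="imvFse emp-fase"]',
        'loc1'  => './/span[@class="emp-loc-part1 infLoc"]',
        'loc2'  => './/span[@class="emp-loc-part2 infLoc"]',
        'dorms' => './/li[@class="txtBed emp-un-dorms"]',
        'area'  => './/li[@class="txtArea emp-un-area"]',
        'park'  => './/li[@class="txtCar emp-un-park"]',
    ];

    $rows = [];
    foreach ($xpath->query('//div[@class="lstImv blackBd12"]') as $block) {
        $row = [];
        foreach ($fields as $name => $expr) {
            // The second argument limits the search to the current block.
            $node = $xpath->query($expr, $block)->item(0);
            $row[$name] = $node ? trim($node->textContent) : null;
        }
        $rows[] = $row;
    }
    return $rows;
}

// Shortened sample markup; the second block deliberately lacks one element.
$sample = <<<HTML
<div class="lstImv blackBd12">
    <strong class="imvFse emp-fase">Get_text 1</strong>
    <span class="emp-loc-part1 infLoc">Get_text 2</span>
</div>
<div class="lstImv blackBd12">
    <strong class="imvFse emp-fase">Other Get_text 1</strong>
</div>
HTML;

foreach (extractListings($sample) as $i => $row) {
    foreach ($row as $name => $value) {
        echo "Block $i, $name: ", $value ?? '(missing)', "\n";
    }
}
```

With this shape, a block that lacks an element yields `null` for that field instead of shifting the values of later blocks.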

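
One more remark on the class predicates used in this post: `@class="txtBed emp-un-dorms"` compares the whole attribute string, so it stops matching if the classes appear in a different order or another class is added. A common XPath 1.0 idiom matches a single class token instead. The sketch below assumes class names contain no quotes or spaces; the helper name `hasClass` is hypothetical.

```php
<?php
// Hypothetical helper: builds an XPath 1.0 predicate that is true when the
// class attribute contains $class as a whole, space-separated token.
function hasClass(string $class): string
{
    return sprintf(
        'contains(concat(" ", normalize-space(@class), " "), " %s ")',
        $class
    );
}

libxml_use_internal_errors(true);
$dom = new DOMDocument();
// The class order is reversed here compared with the markup in the question.
$dom->loadHTML('<ul><li class="emp-un-dorms txtBed">2</li><li class="txtArea emp-un-area">70</li></ul>');
$xpath = new DOMXPath($dom);

$node = $xpath->query('//li[' . hasClass('emp-un-dorms') . ']')->item(0);
echo $node->textContent, "\n";
```

Padding the attribute value with spaces on both sides keeps a token such as `emp-un` from matching inside `emp-un-dorms`.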