I am parsing an HTML file with the following structure:
<div class="lstImv blackBd12">
<div class="stCl3 stLeft imvImg">
<div class="imgBox">
<a class="emp-imgs-link">
<span class="imgFrm frmBig frmLeft">
<img class="emp-img-principal">
</span>
<span class="imgFrm frmMd frmTop">
<img class="emp-img-logo">
</span>
<span class="imgFrm frmMd frmBot">
<img class="emp-img-foto">
</span>
</a>
</div>
<strong class="imvFse emp-fase">Get_text 1</strong>
</div>
<div class="imvInf stCl3 stRight">
<div class="infHd">
<div class="hdLeft stCl2">
<strong class="emp-nome infNme colorTxt"></strong>
<span class="emp-loc-part1 infLoc">Get_text 2</span>
<span class="emp-loc-part2 infLoc">Get_text 3</span>
</div>
<div class="hdRight stCl1">
<em class="emp-valor-apartir" >Get_text 4</em>
<strong class="emp-valor infVlr colorTxt">Get_text 5</strong>
</div>
</div>
<div class="infTxt">
<p class="blackTxt60 emp-descritivo"></p>
<ul>
<li class="txtBed emp-un-dorms">Get_text 6</li>
<li class="txtArea emp-un-area">Get_text 7</li>
<li class="txtToilet emp-un-bath">Get_text 8</li>
<li class="txtCar emp-un-park">Get_text 9</li>
</ul>
</div>
<div class="infBt">
<a href="/parceiro_cadastro" title="" class="btCadastrese stBt stBtLt colorBg whiteTxt rc9 sh15 emp-btn-cadastre">Get_text 10</a>
</div>
</div>
</div>
<div class="lstImv blackBd12">
<div class="stCl3 stLeft imvImg">
<div class="imgBox">
<a class="emp-imgs-link">
<span class="imgFrm frmBig frmLeft">
<img class="emp-img-principal">
</span>
<span class="imgFrm frmMd frmTop">
<img class="emp-img-logo">
</span>
<span class="imgFrm frmMd frmBot">
<img class="emp-img-foto">
</span>
</a>
</div>
<strong class="imvFse emp-fase">Other Get_text 1</strong>
</div>
<div class="imvInf stCl3 stRight">
<div class="infHd">
<div class="hdLeft stCl2">
<strong class="emp-nome infNme colorTxt"></strong>
<span class="emp-loc-part1 infLoc">Other Get_text 2</span>
<span class="emp-loc-part2 infLoc">Other Get_text 3</span>
</div>
<div class="hdRight stCl1">
<em class="emp-valor-apartir" >Other Get_text 4</em>
<strong class="emp-valor infVlr colorTxt">Other Get_text 5</strong>
</div>
</div>
<div class="infTxt">
<p class="blackTxt60 emp-descritivo"></p>
<ul>
<li class="txtBed emp-un-dorms">Other Get_text 6</li>
<li class="txtArea emp-un-area">Other Get_text 7</li>
<li class="txtToilet emp-un-bath">Other Get_text 8</li>
<li class="txtCar emp-un-park">Other Get_text 9</li>
</ul>
</div>
<div class="infBt">
<a href="/parceiro_cadastro" title="" class="btCadastrese stBt stBtLt colorBg whiteTxt rc9 sh15 emp-btn-cadastre">Other Get_text 10</a>
</div>
</div>
</div>
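The following block:
<div class="lstImv blackBd12"></div>
wraps all the other tags that hold the text content I am after, and it repeats several times (in the example, after trimming, I kept only 2).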
Then, with this code:
<?php
$html = "exemplo_parse.html";
libxml_use_internal_errors(true);
$dom = new domDocument('1.0', 'utf-8');
$dom->loadHTMLFile($html);
$dom->preserveWhiteSpace = false;
$xpath = new DOMXPath($dom);
$content = $xpath->query('//div[@class="lstImv blackBd12"]');
foreach ($content as $span)
{
    echo "<pre>";
    print_r($span);
    echo "</pre>";
}
?>
I get two objects with these values:
DOMElement Object
(
[tagName] => div
[schemaTypeInfo] =>
[nodeName] => div
[nodeValue] =>
Get_text 1
Get_text 2
Get_text 3
Get_text 4
Get_text 5
Get_text 6
Get_text 7
Get_text 8
Get_text 9
Get_text 10
[nodeType] => 1
[parentNode] => (object value omitted)
[childNodes] => (object value omitted)
[firstChild] => (object value omitted)
[lastChild] => (object value omitted)
[previousSibling] =>
[nextSibling] => (object value omitted)
[attributes] => (object value omitted)
[ownerDocument] => (object value omitted)
[namespaceURI] =>
[prefix] =>
[localName] => div
[baseURI] =>
[textContent] =>
Get_text 1
Get_text 2
Get_text 3
Get_text 4
Get_text 5
Get_text 6
Get_text 7
Get_text 8
Get_text 9
Get_text 10
)
DOMElement Object
(
[tagName] => div
[schemaTypeInfo] =>
[nodeName] => div
[nodeValue] =>
Other Get_text 1
Other Get_text 2
Other Get_text 3
Other Get_text 4
Other Get_text 5
Other Get_text 6
Other Get_text 7
Other Get_text 8
Other Get_text 9
Other Get_text 10
[nodeType] => 1
[parentNode] => (object value omitted)
[childNodes] => (object value omitted)
[firstChild] => (object value omitted)
[lastChild] => (object value omitted)
[previousSibling] => (object value omitted)
[attributes] => (object value omitted)
[ownerDocument] => (object value omitted)
[namespaceURI] =>
[prefix] =>
[localName] => div
[baseURI] =>
[textContent] =>
Other Get_text 1
Other Get_text 2
Other Get_text 3
Other Get_text 4
Other Get_text 5
Other Get_text 6
Other Get_text 7
Other Get_text 8
Other Get_text 9
Other Get_text 10
)
So this is what I am doing:
<?php
$html = "exemplo_parse.html";
libxml_use_internal_errors(true);
$dom = new domDocument('1.0', 'utf-8');
$dom->loadHTMLFile($html);
$dom->preserveWhiteSpace = false;
$xpath = new DOMXPath($dom);
$content = $xpath->query('//strong[@class="imvFse emp-fase"]');
foreach ($content as $span)
{
    echo "Key 1 : ".$span->textContent."<br/>";
}
$content = $xpath->query('//strong[@class="emp-nome infNme colorTxt"]');
foreach ($content as $span)
{
    echo "Key 2 : ".$span->textContent."<br/>";
}
$content = $xpath->query('//span[@class="emp-loc-part1 infLoc"]');
foreach ($content as $span)
{
    echo "Key 3 : ".$span->textContent."<br/>";
}
$content = $xpath->query('//span[@class="emp-loc-part2 infLoc"]');
foreach ($content as $span)
{
    echo "Key 4 : ".$span->textContent."<br/>";
}
$content = $xpath->query('//li[@class="txtBed emp-un-dorms"]');
foreach ($content as $span)
{
    echo "Key 5 : ".$span->textContent."<br/>";
}
$content = $xpath->query('//li[@class="txtArea emp-un-area"]');
foreach ($content as $span)
{
    echo "Key 6 : ".$span->textContent."<br/>";
}
$content = $xpath->query('//li[@class="txtCar emp-un-park"]');
foreach ($content as $span)
{
    echo "Key 7 : ".$span->textContent."<br/>";
}
?>
I get the data in this form:
Key 1 : Get_text 1
Key 1 : Other Get_text 1
Key 2 :
Key 2 :
Key 3 : Get_text 2
Key 3 : Other Get_text 2
Key 4 : Get_text 3
Key 4 : Other Get_text 3
Key 5 : Get_text 6
Key 5 : Other Get_text 6
Key 6 : Get_text 7
Key 6 : Other Get_text 7
Key 7 : Get_text 9
Key 7 : Other Get_text 9
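In other words, the output is grouped by key, but I want the keys to come out in order for each block (k1, k2, ..., k7, k1, k2, ..., k7) rather than as (k1, k1, k2, k2, ..., k7, k7).
Sorry for my poor English; I am still working on it...

Answer 0 (score: 0)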
This is the solution I arrived at:
<?php
$html = <<<HTML
<div class="lstImv blackBd12">
<div class="stCl3 stLeft imvImg">
<div class="imgBox">
<a class="emp-imgs-link">
<span class="imgFrm frmBig frmLeft">
<img class="emp-img-principal">
</span>
<span class="imgFrm frmMd frmTop">
<img class="emp-img-logo">
</span>
<span class="imgFrm frmMd frmBot">
<img class="emp-img-foto">
</span>
</a>
</div>
<strong class="imvFse emp-fase">Get_text 1</strong>
</div>
<div class="imvInf stCl3 stRight">
<div class="infHd">
<div class="hdLeft stCl2">
<strong class="emp-nome infNme colorTxt"></strong>
<span class="emp-loc-part1 infLoc">Get_text 2</span>
<span class="emp-loc-part2 infLoc">Get_text 3</span>
</div>
<div class="hdRight stCl1">
<em class="emp-valor-apartir" >Get_text 4</em>
<strong class="emp-valor infVlr colorTxt">Get_text 5</strong>
</div>
</div>
<div class="infTxt">
<p class="blackTxt60 emp-descritivo"></p>
<ul>
<li class="txtBed emp-un-dorms">Get_text 6</li>
<li class="txtArea emp-un-area">Get_text 7</li>
<li class="txtToilet emp-un-bath">Get_text 8</li>
<li class="txtCar emp-un-park">Get_text 9</li>
</ul>
</div>
<div class="infBt">
<a href="/parceiro_cadastro" title="" class="btCadastrese stBt stBtLt colorBg whiteTxt rc9 sh15 emp-btn-cadastre">Get_text 10</a>
</div>
</div>
</div>
<div class="lstImv blackBd12">
<div class="stCl3 stLeft imvImg">
<div class="imgBox">
<a class="emp-imgs-link">
<span class="imgFrm frmBig frmLeft">
<img class="emp-img-principal">
</span>
<span class="imgFrm frmMd frmTop">
<img class="emp-img-logo">
</span>
<span class="imgFrm frmMd frmBot">
<img class="emp-img-foto">
</span>
</a>
</div>
<strong class="imvFse emp-fase">Other Get_text 1</strong>
</div>
<div class="imvInf stCl3 stRight">
<div class="infHd">
<div class="hdLeft stCl2">
<strong class="emp-nome infNme colorTxt"></strong>
<span class="emp-loc-part1 infLoc">Other Get_text 2</span>
<span class="emp-loc-part2 infLoc">Other Get_text 3</span>
</div>
<div class="hdRight stCl1">
<em class="emp-valor-apartir" >Other Get_text 4</em>
<strong class="emp-valor infVlr colorTxt">Other Get_text 5</strong>
</div>
</div>
<div class="infTxt">
<p class="blackTxt60 emp-descritivo"></p>
<ul>
<li class="txtBed emp-un-dorms">Other Get_text 6</li>
<li class="txtArea emp-un-area">Other Get_text 7</li>
<li class="txtToilet emp-un-bath">Other Get_text 8</li>
<li class="txtCar emp-un-park">Other Get_text 9</li>
</ul>
</div>
<div class="infBt">
<a href="/parceiro_cadastro" title="" class="btCadastrese stBt stBtLt colorBg whiteTxt rc9 sh15 emp-btn-cadastre">Other Get_text 10</a>
</div>
</div>
</div>
HTML;
// Suppress libxml warnings about imperfect markup while parsing.
libxml_use_internal_errors(true);
$dom = new DOMDocument('1.0', 'utf-8');
$dom->loadHTML($html);
$dom->preserveWhiteSpace = false;
$xpath = new DOMXPath($dom);
// One node per listing block; used here only to count the blocks.
$items = $xpath->query('//div[@class="lstImv blackBd12"]');
// Each query is document-wide, so run it once instead of on every iteration.
$status  = $xpath->query('//strong[@class="imvFse emp-fase"]');
$titulo  = $xpath->query('//span[@class="emp-loc-part1 infLoc"]');
$titulo2 = $xpath->query('//span[@class="emp-loc-part2 infLoc"]');
$valor   = $xpath->query('//em[@class="emp-valor-apartir"]');
$valor2  = $xpath->query('//strong[@class="emp-valor infVlr colorTxt"]');
$dorm    = $xpath->query('//li[@class="txtBed emp-un-dorms"]');
$tam     = $xpath->query('//li[@class="txtArea emp-un-area"]');
// Print the i-th match of every query together, so the output is grouped per block.
// Values are paired by position, which assumes every block contains every element.
for ($i = 0; $i < $items->length; $i++)
{
    echo "Value :".$status->item($i)->nodeValue."<br/>";
    echo "Value :".$titulo->item($i)->nodeValue."<br/>";
    echo "Value :".$titulo2->item($i)->nodeValue."<br/>";
    echo "Value :".$valor->item($i)->nodeValue."<br/>";
    echo "Value :".$valor2->item($i)->nodeValue."<br/>";
    echo "Value :".$dorm->item($i)->nodeValue."<br/>";
    echo "Value :".$tam->item($i)->nodeValue."<br/>";
}
?>
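A possible variation, offered as a sketch and not as the definitive fix: each field can be queried relative to its own `lstImv` block by passing the block as the context node to `DOMXPath::query()` and starting the expression with `.//`. That way a block that lacks one of the elements should not shift the values of the blocks after it. The trimmed two-block sample, the `$fields` map and its labels (`fase`, `loc1`, `dorms`), and the `$rows` array are all made up for this illustration; the second block deliberately omits the `li` to show the missing-element case.

```php
<?php
// Hypothetical sample, trimmed from the markup in the question.
$html = <<<HTML
<div class="lstImv blackBd12">
<strong class="imvFse emp-fase">Get_text 1</strong>
<span class="emp-loc-part1 infLoc">Get_text 2</span>
<ul><li class="txtBed emp-un-dorms">Get_text 6</li></ul>
</div>
<div class="lstImv blackBd12">
<strong class="imvFse emp-fase">Other Get_text 1</strong>
<span class="emp-loc-part1 infLoc">Other Get_text 2</span>
</div>
HTML;

libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();
$xpath = new DOMXPath($dom);

// Illustrative label => XPath pairs. The leading dot makes each
// expression relative to the context node passed to query().
$fields = [
    'fase'  => './/strong[@class="imvFse emp-fase"]',
    'loc1'  => './/span[@class="emp-loc-part1 infLoc"]',
    'dorms' => './/li[@class="txtBed emp-un-dorms"]',
];

$rows = [];
foreach ($xpath->query('//div[@class="lstImv blackBd12"]') as $block) {
    $row = [];
    foreach ($fields as $name => $expr) {
        // Search only inside the current block; item(0) is null when absent.
        $node = $xpath->query($expr, $block)->item(0);
        $row[$name] = $node ? trim($node->textContent) : '';
    }
    $rows[] = $row;
}

// Output is grouped per block: all fields of block 0, then block 1, and so on.
foreach ($rows as $i => $row) {
    foreach ($row as $name => $value) {
        echo "Block {$i} {$name}: {$value}\n";
    }
}
```

Collecting the values into `$rows` first keeps the extraction separate from the printing, so the same data could be written out as `<br/>`-separated lines, as in the code above, or stored elsewhere.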