I have to parse an HTML structure like this:
<div class='container'>
<div class='inner-div'>
<span class='text'>...</span>
<div class='author'>
<span data-author='Alpha'>...</span>
</div>
<div class='summary'>
<span data-summary='Exclusive'>Text 1</span>
</div>
</div>
<div class='inner-div'>
<span class='text'>...</span>
<div class='author'>
<span data-author='Beta'>...</span>
</div>
<div class='summary'>
<span data-summary='Non-Exclusive'>Text 2</span>
</div>
</div>
<div class='inner-div'>
<span class='text'>...</span>
<div class='author'>
<span data-author='Gamma'>...</span>
</div>
<div class='summary'>
<span data-summary='Exclusive'>Text 3</span>
</div>
</div>
<div class='inner-div'>
<span class='text'>...</span>
<div class='author'>
<span data-author='Delta'>...</span>
</div>
<div class='summary'>
<span data-summary='Non-Exclusive'>Text 4</span>
</div>
</div>
...
<div class='inner-div'>
<span class='text'>...</span>
<div class='author'>
<span data-author='Zeta'>...</span>
</div>
<div class='summary'>
<span data-summary='Exclusive'>Text 5</span>
</div>
</div>
</div>
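I want to get the first 'Exclusive' summary whose author is not 'Alpha'. In the example above, that would be 'Text 3'. How can I parse this with Simple HTML DOM, or even with XML DOM?
ADDENDUM: I am looking to parse the HTML with the PHP Simple HTML DOM library. I know how to do this in jQuery, but Simple HTML DOM does not seem to support any equivalent of :has.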
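For the "XML DOM" route the question mentions, here is a minimal sketch using PHP's built-in DOM extension (DOMDocument and DOMXPath, not Simple HTML DOM). It expresses the :has-style condition as an XPath predicate on each inner-div: keep the blocks with no 'Alpha' author span, then step down to their 'Exclusive' summary. The function name firstExclusiveSummary is hypothetical, and the sketch assumes the class attributes match exactly (class='inner-div', not a list of classes) and that the excluded author name contains no single quote.

```php
<?php
// Sketch: first 'Exclusive' summary whose author is not $excludedAuthor,
// using PHP's built-in DOMDocument + DOMXPath (not Simple HTML DOM).
// Assumes exact class matches and an author name without single quotes.

function firstExclusiveSummary(string $html, string $excludedAuthor): ?string
{
    $dom = new DOMDocument();
    libxml_use_internal_errors(true); // silence warnings about imperfect HTML
    $dom->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($dom);
    // Keep each inner-div that has NO author span with the excluded name,
    // then step down to its 'Exclusive' summary span.
    $query = sprintf(
        "//div[@class='inner-div'][not(div[@class='author']/span[@data-author='%s'])]"
        . "/div[@class='summary']/span[@data-summary='Exclusive']",
        $excludedAuthor
    );
    $nodes = $xpath->query($query);

    // Matches come back in document order, so item(0) is the first one.
    return $nodes->length > 0 ? $nodes->item(0)->textContent : null;
}

$sample = <<<HTML
<div class='container'>
  <div class='inner-div'>
    <span class='text'>...</span>
    <div class='author'><span data-author='Alpha'>...</span></div>
    <div class='summary'><span data-summary='Exclusive'>Text 1</span></div>
  </div>
  <div class='inner-div'>
    <span class='text'>...</span>
    <div class='author'><span data-author='Beta'>...</span></div>
    <div class='summary'><span data-summary='Non-Exclusive'>Text 2</span></div>
  </div>
  <div class='inner-div'>
    <span class='text'>...</span>
    <div class='author'><span data-author='Gamma'>...</span></div>
    <div class='summary'><span data-summary='Exclusive'>Text 3</span></div>
  </div>
</div>
HTML;

echo firstExclusiveSummary($sample, 'Alpha'), "\n"; // prints "Text 3"
```

If the real page puts several classes on these divs, @class='inner-div' would need to become contains(concat(' ', normalize-space(@class), ' '), ' inner-div ') (and likewise for the other class tests).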
Answer 0 (score: 0)
In the end I solved it myself. For anyone looking for a solution, this is what I did:
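// First span whose data-summary is 'Exclusive'
$node = $html->find("span[data-summary='Exclusive']", 0);
// span -> div.summary -> div.inner-div, then that block's author span
$author = $node->parent()->parent()->find('div.author span', 0);
if ($author->getAttribute('data-author') == 'Alpha') {
    // Alpha wrote it, so take the next 'Exclusive' summary instead
    $node = $html->find("span[data-summary='Exclusive']", 1);
}
return $node->innertext;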
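The approach above only looks at the first two 'Exclusive' summaries, so it would return Alpha's text if Alpha also wrote the second one. Below is a sketch that keeps the same "go up two parents, then read the author" idea but loops over every match. It is written against PHP's built-in DOMDocument/DOMXPath rather than Simple HTML DOM; the function name firstExclusiveNotBy and the sample markup (in which Alpha has two 'Exclusive' entries) are hypothetical.

```php
<?php
// Sketch: loop over every 'Exclusive' summary and return the first one whose
// block was not written by $excludedAuthor. Uses PHP's built-in DOM extension.

function firstExclusiveNotBy(string $html, string $excludedAuthor): ?string
{
    $dom = new DOMDocument();
    libxml_use_internal_errors(true); // tolerate imperfect HTML
    $dom->loadHTML($html);
    libxml_clear_errors();
    $xpath = new DOMXPath($dom);

    // Every 'Exclusive' summary span, in document order.
    foreach ($xpath->query("//span[@data-summary='Exclusive']") as $summary) {
        // span -> div.summary -> div.inner-div (mirrors parent()->parent())
        $block = $summary->parentNode->parentNode;
        // Author attribute of the same block ("" if the block has none).
        $author = $xpath->evaluate("string(div[@class='author']/span/@data-author)", $block);
        if ($author !== $excludedAuthor) {
            return $summary->textContent;
        }
    }
    return null; // no qualifying summary
}

// Hypothetical sample: Alpha wrote the first two 'Exclusive' summaries.
$sample = <<<HTML
<div class='container'>
  <div class='inner-div'>
    <div class='author'><span data-author='Alpha'>...</span></div>
    <div class='summary'><span data-summary='Exclusive'>Text 1</span></div>
  </div>
  <div class='inner-div'>
    <div class='author'><span data-author='Alpha'>...</span></div>
    <div class='summary'><span data-summary='Exclusive'>Text 2</span></div>
  </div>
  <div class='inner-div'>
    <div class='author'><span data-author='Gamma'>...</span></div>
    <div class='summary'><span data-summary='Exclusive'>Text 3</span></div>
  </div>
</div>
HTML;

echo firstExclusiveNotBy($sample, 'Alpha'), "\n"; // prints "Text 3"
```

On this sample, the two-step check would stop at index 1 and return "Text 2"; the loop keeps going until it finds a block by someone else.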
Answer 1 (score: 0)
No, but here is a Simple HTML DOM replacement that does (by the way, what you want here is :not, not :has):
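include_once('advanced_html_dom.php'); // Simple HTML DOM replacement
$html = str_get_html($str); // $str holds the HTML from the question
// .author with no direct child data-author=Alpha, then a following sibling .summary, then its Exclusive child
echo $html->find('.author:not(> [data-author=Alpha]) ~ .summary > [data-summary=Exclusive]', 0);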
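To see what that selector does, here is a translation of it into XPath 1.0 for PHP's built-in DOMXPath: the :not(> ...) part becomes a not(...) predicate on the author div's children, and the ~ sibling combinator becomes the following-sibling:: axis. This mapping is an interpretation of the selector, not advanced_html_dom's internal translation; the function name siblingAxisSummary is hypothetical and class names are assumed to match exactly.

```php
<?php
// Sketch: XPath 1.0 version of the CSS selector above, using PHP's built-in DOM.
// Assumes class attributes match exactly (e.g. class='author').

function siblingAxisSummary(string $html): ?string
{
    $dom = new DOMDocument();
    libxml_use_internal_errors(true); // tolerate imperfect HTML
    $dom->loadHTML($html);
    libxml_clear_errors();

    $query =
        // .author:not(> [data-author=Alpha])
        "//div[@class='author'][not(*[@data-author='Alpha'])]"
        // ~ .summary
        . "/following-sibling::div[@class='summary']"
        // > [data-summary=Exclusive]
        . "/*[@data-summary='Exclusive']";

    $nodes = (new DOMXPath($dom))->query($query);
    // Results are in document order; item(0) picks the first, like index 0 in find().
    return $nodes->length > 0 ? $nodes->item(0)->textContent : null;
}

$sample = <<<HTML
<div class='container'>
  <div class='inner-div'>
    <div class='author'><span data-author='Alpha'>...</span></div>
    <div class='summary'><span data-summary='Exclusive'>Text 1</span></div>
  </div>
  <div class='inner-div'>
    <div class='author'><span data-author='Beta'>...</span></div>
    <div class='summary'><span data-summary='Non-Exclusive'>Text 2</span></div>
  </div>
  <div class='inner-div'>
    <div class='author'><span data-author='Gamma'>...</span></div>
    <div class='summary'><span data-summary='Exclusive'>Text 3</span></div>
  </div>
</div>
HTML;

echo siblingAxisSummary($sample), "\n"; // prints "Text 3"
```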