Does Simple HTML DOM support :has parsing?

Date: 2017-09-09 03:36:02

Tags: php html dom xml-parsing simple-html-dom

I have to parse an HTML structure like this:

<div class='container'>
    <div class='inner-div'>
        <span class='text'>...</span>
        <div class='author'>
            <span data-author='Alpha'>...</span>
        </div>
        <div class='summary'>
            <span data-summary='Exclusive'>Text 1</span>
        </div>
    </div>
    <div class='inner-div'>
        <span class='text'>...</span>
        <div class='author'>
            <span data-author='Beta'>...</span>
        </div>
        <div class='summary'>
            <span data-summary='Non-Exclusive'>Text 2</span>
        </div>
    </div>
    <div class='inner-div'>
        <span class='text'>...</span>
        <div class='author'>
            <span data-author='Gamma'>...</span>
        </div>
        <div class='summary'>
            <span data-summary='Exclusive'>Text 3</span>
        </div>
    </div>
    <div class='inner-div'>
        <span class='text'>...</span>
        <div class='author'>
            <span data-author='Delta'>...</span>
        </div>
        <div class='summary'>
            <span data-summary='Non-Exclusive'>Text 4</span>
        </div>
    </div>
    ...
    <div class='inner-div'>
        <span class='text'>...</span>
        <div class='author'>
            <span data-author='Zeta'>...</span>
        </div>
        <div class='summary'>
            <span data-summary='Exclusive'>Text 5</span>
        </div>
    </div>
</div>

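I want to get the first 'Exclusive' summary whose author is not 'Alpha'. In the example above, that would be 'Text 3'. How can I parse this with Simple HTML DOM, or even with XML DOM?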

ADDENDUM: I am looking to parse the HTML with the PHP Simple HTML DOM library. I know how to do this in jQuery, but the Simple HTML DOM library does not seem to support any equivalent of :has.
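The question also mentions XML DOM as an alternative. A minimal sketch of that route, assuming PHP's built-in DOM extension (DOMDocument and DOMXPath) rather than Simple HTML DOM: an XPath predicate can test what an element contains, which is the job jQuery's :has() does, so the condition "author is not Alpha" can sit directly on div.inner-div. The helper name firstExclusiveNotBy is hypothetical, and the exact @class comparisons assume every element carries a single class, as in the sample.

```php
<?php
// Shortened copy of the question's markup: the span.text lines and the "..."
// blocks are omitted, and the missing quote in class='container' is restored.
$html = <<<'HTML'
<div class='container'>
  <div class='inner-div'>
    <div class='author'><span data-author='Alpha'>...</span></div>
    <div class='summary'><span data-summary='Exclusive'>Text 1</span></div>
  </div>
  <div class='inner-div'>
    <div class='author'><span data-author='Beta'>...</span></div>
    <div class='summary'><span data-summary='Non-Exclusive'>Text 2</span></div>
  </div>
  <div class='inner-div'>
    <div class='author'><span data-author='Gamma'>...</span></div>
    <div class='summary'><span data-summary='Exclusive'>Text 3</span></div>
  </div>
</div>
HTML;

// Hypothetical helper: text of the first 'Exclusive' summary whose block is not
// by $excludedAuthor, or null if there is none.
function firstExclusiveNotBy(string $html, string $excludedAuthor): ?string
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true); // keep libxml from printing parser warnings
    $doc->loadHTML($html);
    libxml_clear_errors();

    // DOMXPath has no variable binding, so the name is embedded in a
    // double-quoted XPath literal (assumes it contains no double quote).
    $query = sprintf(
        "//div[@class='inner-div']"
        . "[not(div[@class='author']/span[@data-author=\"%s\"])]" // the :has()-style filter
        . "/div[@class='summary']/span[@data-summary='Exclusive']",
        $excludedAuthor
    );

    $nodes = (new DOMXPath($doc))->query($query); // matches come back in document order
    return $nodes->length > 0 ? $nodes->item(0)->textContent : null;
}

echo firstExclusiveNotBy($html, 'Alpha'), PHP_EOL; // prints "Text 3"
```

Putting the filter on the ancestor (inner-div) rather than on the summary span keeps the whole condition in one expression, so no manual walking up and down the tree is needed.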

2 answers:

Answer 0 (score: 0)

In the end, I solved it myself. For anyone looking for a solution, this is what I did:

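// First 'Exclusive' summary in the document
$node = $html->find("span[data-summary='Exclusive']", 0);
// span -> div.summary -> div.inner-div, then the author span inside that block
$author = $node->parent()->parent()->find('div.author span', 0)->getAttribute('data-author');
if ($author == 'Alpha') {
    // The first exclusive summary is Alpha's, so take the second one instead
    $node = $html->find("span[data-summary='Exclusive']", 1);
}
return $node->innertext;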
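The approach above only ever looks at the first two 'Exclusive' summaries: if Alpha also owns the second one it returns Alpha's summary, and if there is no second one the last line dereferences a missing node. A sketch of a loop that checks every exclusive summary instead, assuming PHP's built-in DOMDocument/DOMXPath so it runs without the Simple HTML DOM library; parentNode->parentNode stands in for parent()->parent(). The helper name firstExclusiveSummary is hypothetical, and the sample is a variant of the question's markup in which Alpha also writes the second exclusive summary.

```php
<?php
// Variant of the question's markup: Alpha owns the first TWO exclusive
// summaries, which "check the first, else take the second" would get wrong.
$html = <<<'HTML'
<div class='container'>
  <div class='inner-div'>
    <div class='author'><span data-author='Alpha'>...</span></div>
    <div class='summary'><span data-summary='Exclusive'>Text 1</span></div>
  </div>
  <div class='inner-div'>
    <div class='author'><span data-author='Alpha'>...</span></div>
    <div class='summary'><span data-summary='Exclusive'>Text 2</span></div>
  </div>
  <div class='inner-div'>
    <div class='author'><span data-author='Gamma'>...</span></div>
    <div class='summary'><span data-summary='Exclusive'>Text 3</span></div>
  </div>
</div>
HTML;

// Hypothetical helper: walks every 'Exclusive' summary in document order and
// returns the first one whose block is not by $excludedAuthor.
function firstExclusiveSummary(string $html, string $excludedAuthor): ?string
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true); // keep libxml from printing parser warnings
    $doc->loadHTML($html);
    libxml_clear_errors();
    $xpath = new DOMXPath($doc);

    foreach ($xpath->query("//span[@data-summary='Exclusive']") as $span) {
        // span -> div.summary -> div.inner-div, like parent()->parent() in the answer
        $block = $span->parentNode->parentNode;
        // Relative query: only the author span inside this block
        $authorSpan = $xpath->query("div[@class='author']/span", $block)->item(0);
        $author = $authorSpan instanceof DOMElement
            ? $authorSpan->getAttribute('data-author')
            : null;
        if ($author !== $excludedAuthor) {
            return $span->textContent;
        }
    }
    return null; // no exclusive summary by any other author
}

echo firstExclusiveSummary($html, 'Alpha'), PHP_EOL; // prints "Text 3"
```

Passing the author as a PHP value and comparing in PHP also sidesteps the quoting problem of embedding it in an XPath string.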

Answer 1 (score: 0)

No, but here is a simple html dom replacement that does (by the way, you want :not rather than :has):

include_once('advanced_html_dom.php');

// $str holds the HTML from the question
$html = str_get_html($str);

echo $html->find('.author:not(> [data-author=Alpha]) ~ .summary > [data-summary=Exclusive]', 0);
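My reading of that selector: an .author block with no child carrying data-author=Alpha, then any later sibling .summary (the ~ combinator), then its child with data-summary=Exclusive. For readers without advanced_html_dom, a sketch of the same path in XPath, assuming PHP's built-in DOMDocument/DOMXPath: following-sibling:: plays the role of ~ and not(...) the role of :not(). The exact @class comparisons are a simplification that assumes single-class elements, and the sample is a cut-down version of the question's markup.

```php
<?php
$html = <<<'HTML'
<div class='container'>
  <div class='inner-div'>
    <div class='author'><span data-author='Alpha'>...</span></div>
    <div class='summary'><span data-summary='Exclusive'>Text 1</span></div>
  </div>
  <div class='inner-div'>
    <div class='author'><span data-author='Gamma'>...</span></div>
    <div class='summary'><span data-summary='Exclusive'>Text 3</span></div>
  </div>
</div>
HTML;

$doc = new DOMDocument();
libxml_use_internal_errors(true); // suppress libxml parser warnings
$doc->loadHTML($html);
libxml_clear_errors();

// .author:not(> [data-author=Alpha])  ->  div[@class='author'][not(*[@data-author='Alpha'])]
// ~ .summary                          ->  /following-sibling::div[@class='summary']
// > [data-summary=Exclusive]          ->  /*[@data-summary='Exclusive']
$query = "//div[@class='author'][not(*[@data-author='Alpha'])]"
       . "/following-sibling::div[@class='summary']"
       . "/*[@data-summary='Exclusive']";

$first = (new DOMXPath($doc))->query($query)->item(0); // first match in document order
echo $first === null ? 'no match' : $first->textContent, PHP_EOL; // prints "Text 3"
```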