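Using PHP and XPath, how do you get the contents of the closest `h3`?

Time: 2014-10-02 11:25:18

Tags: php html xpath domdocument

  • Using PHP and XPath, how do you correctly get the text of the nearest h3 tag, which holds the date of the Stoke City vs Crystal Palace match (e.g. Saturday 4 October)?
  • Essentially I am looking for the match date; my inputs are the home team and the away team.
  • The HTML snippet lists 4 (of the remaining 321) 2014/15 English Premier League football fixtures: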

    <div class="fixtures">
        <h3>Monday 29 September</h3>
        <dl class="matches">
            <dt class="match">
                <span class="match-time">20:00</span>
                <span class="home-side">
                    <span>
                        <img src="http://omo.akamai.opta.net/image.php?&amp;sport=football&amp;entity=team&amp;description=badges&amp;dimensions=20&amp;id=110" alt="Stoke City">
                    </span>
                    <a href="http://www.dailymail.co.uk/sport/teampages/stoke-city.html">Stoke City</a>
                </span>
                <span>
                    <span>&nbsp;</span>
                    <span>vs</span>
                    <span>&nbsp;</span>
                </span>
                <span class="away-side">
                    <span>
                        <img src="http://omo.akamai.opta.net/image.php?&amp;sport=football&amp;entity=team&amp;description=badges&amp;dimensions=20&amp;id=4" alt="Newcastle United">
                    </span>
                    <a href="http://www.dailymail.co.uk/sport/teampages/newcastle-united.html">Newcastle United</a>
                </span>
            </dt>
        </dl>
        <dl class="matches">
            <dt class="match">
                <span class="match-time">15:00</span>
                <span class="home-side">
                    <span>
                        <img src="http://omo.akamai.opta.net/image.php?&amp;sport=football&amp;entity=team&amp;description=badges&amp;dimensions=20&amp;id=13" alt="Leicester City">
                    </span>
                    <a href="http://www.dailymail.co.uk/sport/teampages/leicester.html">Leicester City</a>
                </span>
                <span>
                    <span>&nbsp;</span>
                    <span>vs</span>
                    <span>&nbsp;</span>
                </span>
                <span class="away-side">
                    <span>
                        <img src="http://omo.akamai.opta.net/image.php?&amp;sport=football&amp;entity=team&amp;description=badges&amp;dimensions=20&amp;id=90" alt="Burnley">
                    </span>
                    <a href="http://www.dailymail.co.uk/sport/teampages/burnley.html">Burnley</a>
                </span>
            </dt>
        </dl>              
        <h3>Saturday 4 October</h3>
        <dl class="matches">
            <dt class="match">
                <span class="match-time">15:00</span>
                <span class="home-side">
                    <span>
                        <img src="http://omo.akamai.opta.net/image.php?&amp;sport=football&amp;entity=team&amp;description=badges&amp;dimensions=20&amp;id=14" alt="Liverpool">
                    </span>
                    <a href="http://www.dailymail.co.uk/sport/teampages/liverpool.html">Liverpool</a>
                </span>
                <span>
                    <span>&nbsp;</span>
                    <span>vs</span>
                    <span>&nbsp;</span>
                </span>
                <span class="away-side">
                    <span>
                        <img src="http://omo.akamai.opta.net/image.php?&amp;sport=football&amp;entity=team&amp;description=badges&amp;dimensions=20&amp;id=35" alt="West Bromwich Albion">
                    </span>
                    <a href="http://www.dailymail.co.uk/sport/teampages/west-bromwich-albion.html">West Bromwich Albion</a>
                </span>
            </dt>
        </dl>            
        <dl class="matches">
            <dt class="match">
                <span class="match-time">15:00</span>
                <span class="home-side">
                    <span>
                        <img src="http://omo.akamai.opta.net/image.php?&amp;sport=football&amp;entity=team&amp;description=badges&amp;dimensions=20&amp;id=110" alt="Stoke City">
                    </span>
                    <a href="http://www.dailymail.co.uk/sport/teampages/stoke-city.html">Stoke City</a>
                </span>
                <span>
                    <span>&nbsp;</span>
                    <span>vs</span>
                    <span>&nbsp;</span>
                </span>
                <span class="away-side">
                    <span>
                        <img src="http://omo.akamai.opta.net/image.php?&amp;sport=football&amp;entity=team&amp;description=badges&amp;dimensions=20&amp;id=31" alt="Crystal Palace">
                    </span>
                    <a href="http://www.dailymail.co.uk/sport/teampages/crystal-palace.html">Crystal Palace</a>
                </span>
            </dt>
        </dl>          
    </div>
    

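2 answers:

Answer 0: (score: 1)

If the structure is always the same, you can first target the img tag by its alt value and then traverse backwards from it to the heading.

Example: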

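    $dom = new DOMDocument();
    $dom->loadHTML($markup); // $markup holds the HTML from the question
    $xpath = new DOMXPath($dom);

    // Team name as it appears in the <img alt="..."> attribute
    $needle = 'Crystal Palace';
    $element = $xpath->query("//span/img[contains(@alt, '$needle')]");
    if ($element->length > 0) {
        $img = $element->item(0);
        // Only the enclosing <dl> has <h3> siblings, so this yields the nearest preceding heading
        $header = $xpath->query('ancestor::node()/preceding-sibling::h3[1]', $img);
        if ($header->length > 0) {
            echo $header->item(0)->nodeValue; // Saturday 4 October
        }
    }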

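Because a team appears once per fixture, a variation on the lookup above is to collect the heading for every match that team plays rather than only the first hit. The following is a minimal sketch, assuming trimmed-down markup with the same h3/dl/img structure as the question's snippet; the variable names and the use of `ancestor::dl[1]` in place of `ancestor::node()` are illustrative choices, not part of the original answer.

```php
<?php
// Trimmed-down stand-in for the question's markup (same h3/dl/img structure).
$markup = <<<HTML
<div class="fixtures">
    <h3>Monday 29 September</h3>
    <dl class="matches"><dt class="match">
        <span class="home-side"><span><img alt="Stoke City"></span></span>
        <span class="away-side"><span><img alt="Newcastle United"></span></span>
    </dt></dl>
    <h3>Saturday 4 October</h3>
    <dl class="matches"><dt class="match">
        <span class="home-side"><span><img alt="Liverpool"></span></span>
        <span class="away-side"><span><img alt="West Bromwich Albion"></span></span>
    </dt></dl>
    <dl class="matches"><dt class="match">
        <span class="home-side"><span><img alt="Stoke City"></span></span>
        <span class="away-side"><span><img alt="Crystal Palace"></span></span>
    </dt></dl>
</div>
HTML;

libxml_use_internal_errors(true); // keep parser warnings out of the output
$dom = new DOMDocument();
$dom->loadHTML($markup);
$xpath = new DOMXPath($dom);

$team = 'Stoke City'; // assumed to contain no single quotes
$dates = [];
foreach ($xpath->query("//dl//img[@alt='$team']") as $img) {
    // Nearest enclosing <dl>, then the closest <h3> that precedes it.
    $header = $xpath->query('ancestor::dl[1]/preceding-sibling::h3[1]', $img);
    if ($header->length > 0) {
        $dates[] = trim($header->item(0)->nodeValue);
    }
}

print_r($dates); // Monday 29 September, then Saturday 4 October
```

Naming the `<dl>` ancestor explicitly makes the intent clearer than `ancestor::node()`, which visits every ancestor and relies on only one of them having `<h3>` siblings.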

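Answer 1: (score: 1)

You can try this XPath:

    //h3[following-sibling::dl[1][.//span[contains(concat(' ', normalize-space(@class), ' '), ' home-side ') and span/img[@alt='Hull City']]]]

Basically, the XPath above selects the <h3> element whose first following-sibling <dl> element contains a <span class="home-side"> that in turn has a child <span> containing <img alt="Hull City"> (formatted version):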

    //h3[
        following-sibling::dl[1][
            .//span[
                contains(concat(' ', normalize-space(@class), ' '), ' home-side ')
                and
                span/img[@alt='Hull City']
            ]
        ]
    ]

Update:

Here is an example XPath that checks both the home team and the away team:

    //h3[
        following-sibling::dl[1][
            .//span[
                contains(concat(' ', normalize-space(@class), ' '), ' home-side ')
                and
                span/img[@alt='Hull City']
            ]
            and
            .//span[
                contains(concat(' ', normalize-space(@class), ' '), ' away-side ')
                and
                span/img[@alt='Crystal Palace']
            ]
        ]
    ]
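One caveat with this expression is that `following-sibling::dl[1]` inspects only the first `<dl>` after each `<h3>`, so a fixture listed second or later under a date is not found. The sketch below illustrates this on trimmed-down markup that mirrors the question's structure; `firstDlQuery` is a hypothetical helper name, and team names are assumed to contain no single quotes.

```php
<?php
// Trimmed-down stand-in: one date heading followed by two fixtures.
$markup = <<<HTML
<div class="fixtures">
    <h3>Saturday 4 October</h3>
    <dl class="matches"><dt class="match">
        <span class="home-side"><span><img alt="Liverpool"></span></span>
        <span class="away-side"><span><img alt="West Bromwich Albion"></span></span>
    </dt></dl>
    <dl class="matches"><dt class="match">
        <span class="home-side"><span><img alt="Stoke City"></span></span>
        <span class="away-side"><span><img alt="Crystal Palace"></span></span>
    </dt></dl>
</div>
HTML;

// Hypothetical helper: builds the "h3 whose first following <dl> is $home vs $away" query.
function firstDlQuery($home, $away) {
    return "//h3[following-sibling::dl[1]["
         . ".//span[contains(concat(' ', normalize-space(@class), ' '), ' home-side ') and span/img[@alt='$home']]"
         . " and "
         . ".//span[contains(concat(' ', normalize-space(@class), ' '), ' away-side ') and span/img[@alt='$away']]"
         . "]]";
}

libxml_use_internal_errors(true); // keep parser warnings out of the output
$dom = new DOMDocument();
$dom->loadHTML($markup);
$xpath = new DOMXPath($dom);

$first  = $xpath->query(firstDlQuery('Liverpool', 'West Bromwich Albion'));
$second = $xpath->query(firstDlQuery('Stoke City', 'Crystal Palace'));

echo $first->length, "\n";  // 1: the fixture sits in the first <dl> after the <h3>
echo $second->length, "\n"; // 0: the fixture sits in the second <dl>, so it is missed
```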

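Update 2:

To account for multiple <dl> elements under one heading, I think it is easier to first find the <dl> that satisfies the home and away criteria, and then move backwards from that <dl> to the closest preceding <h3> element: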

    //dl[
        .//span[
            contains(concat(' ', normalize-space(@class), ' '), ' home-side ')
            and
            span/img[@alt='Stoke City']
        ]
        and
        .//span[
            contains(concat(' ', normalize-space(@class), ' '), ' away-side ')
            and
            span/img[@alt='Crystal Palace']
        ]
    ]/preceding-sibling::h3[1]
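Wrapped in PHP, this last expression might be used roughly as follows. This is a sketch rather than a definitive implementation: `findMatchDate` is a hypothetical helper name, the markup is a trimmed-down stand-in for the question's snippet, and the team names are assumed to contain no single quotes because they are interpolated directly into the XPath string.

```php
<?php
// Hypothetical helper: returns the date heading for $home vs $away, or null if not found.
function findMatchDate($html, $home, $away) {
    libxml_use_internal_errors(true); // keep parser warnings out of the output
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    $xpath = new DOMXPath($dom);

    // Find the <dl> holding both teams, then step back to the closest preceding <h3>.
    $query = "//dl["
           . ".//span[contains(concat(' ', normalize-space(@class), ' '), ' home-side ') and span/img[@alt='$home']]"
           . " and "
           . ".//span[contains(concat(' ', normalize-space(@class), ' '), ' away-side ') and span/img[@alt='$away']]"
           . "]/preceding-sibling::h3[1]";

    $headers = $xpath->query($query);
    return $headers->length > 0 ? trim($headers->item(0)->nodeValue) : null;
}

// Trimmed-down stand-in for the question's markup (same h3/dl/img structure).
$markup = <<<HTML
<div class="fixtures">
    <h3>Monday 29 September</h3>
    <dl class="matches"><dt class="match">
        <span class="home-side"><span><img alt="Stoke City"></span></span>
        <span class="away-side"><span><img alt="Newcastle United"></span></span>
    </dt></dl>
    <h3>Saturday 4 October</h3>
    <dl class="matches"><dt class="match">
        <span class="home-side"><span><img alt="Liverpool"></span></span>
        <span class="away-side"><span><img alt="West Bromwich Albion"></span></span>
    </dt></dl>
    <dl class="matches"><dt class="match">
        <span class="home-side"><span><img alt="Stoke City"></span></span>
        <span class="away-side"><span><img alt="Crystal Palace"></span></span>
    </dt></dl>
</div>
HTML;

echo findMatchDate($markup, 'Stoke City', 'Crystal Palace'), "\n";   // Saturday 4 October
echo findMatchDate($markup, 'Stoke City', 'Newcastle United'), "\n"; // Monday 29 September
```

Returning null when nothing matches lets the caller distinguish a missing fixture from an empty heading.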