Filtering with XPath

Time: 2014-02-06 10:23:36

Tags: php parsing xpath filter

I want to parse a web page, but I've run into trouble.

I end up with this error:

Parse error: syntax error, unexpected 'Steve' (T_STRING)

<?php    
// Here I would like to parse a wikipedia page    
        $url = "http://fr.wikipedia.org/wiki/Jobs_(film)"; 
        $html = file_get_contents($url);  
        $doc = new DOMDocument();
        $doc->loadHTML($html); 
// I use XPath to parse the page
        $xpath = new DOMXpath($doc);
    // Here I save all links of the wikipedia page
        $nodes = $xpath->query('//a');
        ?>

    <?php
    // Showing elements :
    if($nodes)
    {

    echo '<h1>les <span class="red">'.$nodes->length. '</span> liens de la page : '.$url.'</h1>';
    // Table to show some elements
        echo '<table>
                <thead><tr><th>ancre</th><th>title</th><th>url</th><th>rel</th></tr></thead><tbody>';
    // Here I search all the elements with title, links...
        foreach($nodes as $node) {
            if($node->getAttribute('rel')){$rel = $node->getAttribute('rel');}else{$rel= "-";}
            if($node->getAttribute('title')){$title = $node->getAttribute('title');}else{$title= "-";}
            if($node->nodeValue){$ancre = $node->nodeValue;}else{$rel= "-";}
            if($node->getAttribute('href[contains(text(),'Steve')]')){$href = $node->getAttribute('href[contains(text(),'Steve')]');}else{$rel= "-";}
            // The table contains all element but i would like a filter...
            echo '<tr><td>'. $ancre .'</td><td>'. $title .'</td><td>'. $href .'</td><td>'.$rel.'</td></tr>';    
        }

        echo '</tbody></table>';
    }

3 answers:

Answer 0: (score: 1)

It's a quoting problem:

if($node->getAttribute("href[contains(text(),'Steve')]")){$href = $node->getAttribute("href[contains(text(),'Steve')]");}else{$rel= "-";}

Answer 1: (score: 1)

Your quotes are wrong:

if($node->getAttribute("href[contains(text(),'Steve')]")){$href = $node->getAttribute("href[contains(text(),'Steve')]");}else{$rel= "-";}
                       ^                              ^                               ^                              ^
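
Note that fixing the quotes only removes the parse error. `getAttribute()` expects a plain attribute name, so an XPath predicate passed to it is not evaluated. A minimal sketch, assuming the goal is to keep only the links whose text contains 'Steve', moves the predicate into the `query()` call instead. The `$html` string below is hypothetical sample markup standing in for the fetched Wikipedia page:

```php
<?php
// Hypothetical sample markup standing in for the fetched Wikipedia page.
$html = '<html><body>'
      . '<a href="/wiki/Steve_Jobs" title="Steve Jobs">Steve Jobs</a>'
      . '<a href="/wiki/Apple" title="Apple">Apple</a>'
      . '<a href="/wiki/Steve_Wozniak">Steve Wozniak</a>'
      . '</body></html>';

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

// The filter lives in the XPath expression itself:
// double quotes for the PHP string, single quotes for the XPath literal.
$nodes = $xpath->query("//a[contains(text(), 'Steve')]");

$hrefs = [];
foreach ($nodes as $node) {
    // getAttribute() takes only the attribute name.
    $hrefs[] = $node->getAttribute('href');
}

print_r($hrefs);
```

With this sample markup, only the two links whose text contains 'Steve' are collected, so the loop never sees the 'Apple' link.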

Answer 2: (score: 1)

This line has single quotes nested inside single quotes:

if($node->getAttribute('href[contains(text(),'Steve')]'))

Either escape the inner quotes (\') or replace the outer quotes with double quotes (").
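
As a quick illustration of those two options, the sketch below builds the same string both ways. The variable names `$escaped` and `$double` are made up for the example:

```php
<?php
// Option 1: keep the outer single quotes and escape the inner ones.
$escaped = 'href[contains(text(),\'Steve\')]';

// Option 2: switch the outer quotes to double quotes.
$double = "href[contains(text(),'Steve')]";

// Both forms produce the identical string.
var_dump($escaped === $double); // bool(true)
```

Either form parses. The double-quoted version tends to be easier to read, as long as the string contains no `$` that PHP would try to interpolate.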