Getting an href using XPath

Date: 2016-02-20 13:31:45

Tags: php xpath web-scraping

I am trying to use XPath to extract two pieces of data:

  1. the text node value, and
  2. the hyperlink.

Here is my code:

    <?php
    $curl = curl_init('http://www.livescore.com/soccer/england/league-2/');
    curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($curl, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($curl, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US) AppleWebKit/534.10 (KHTML, like Gecko) Chrome/8.0.552.224 Safari/534.10');
    $html = curl_exec($curl);
    curl_close($curl);
    if (!$html) 
        {
        die("something's wrong!");
        }
    
    $dom = new DOMDocument();
    @$dom->loadHTML($html);
    $xpath = new DOMXPath($dom);
    
    $result = $xpath->query("/html/body/div[2]/div[5]/div[contains(@class, 'row')]");
    
    var_dump ($result);
    foreach($result as $row)
        {   
    
        $text = $row->nodeValue;
        $href = $row->getAttribute("href");
    
        //getAttribute("href")
    
        $array[] = array
            (
            'text' => trim($text),
            'href' => $href
            );
    
        }
        print "<pre>";
        var_dump ($array);
    ?>
    

I just can't extract the href link! Any help would be very welcome. Thanks a lot.

1 answer:

Answer 0 (score: 2)

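First, the data rows on that page can be found through the more specific class name row-gray. The href attribute belongs to an a element nested inside each row, not to the row's div itself, so calling getAttribute("href") on the div returns an empty string. To get the link inside the current div, you can use the relative XPath expression .//a[@class='scorelink'] with the row as the context node: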

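    $array = array();
    $result = $xpath->query("//div[contains(@class, 'row-gray')]");

    foreach ($result as $row)
    {
        $text = $row->nodeValue;
        // Relative query: search for the link only inside the current row.
        $link = $xpath->query(".//a[@class='scorelink']", $row)->item(0);
        // item(0) is null when a row contains no such link.
        $href = $link ? $link->getAttribute("href") : null;

        $array[] = array
        (
            'text' => trim($text),
            'href' => $href
        );
    }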
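
To try the relative query without hitting the live site, here is a minimal, self-contained sketch. The HTML is a made-up stand-in for the real page: only the class names row-gray and scorelink come from the answer above, while the team names, the inner ply and sco classes, and the href value are assumptions for illustration. It also uses libxml_use_internal_errors() in place of the @ operator, which is a stylistic choice and not something the original code requires.

```php
<?php
// Hypothetical sample markup standing in for the live page. Only the class
// names row-gray and scorelink are taken from the answer; the rest is invented.
$html = <<<HTML
<html><body>
  <div class="row row-gray">
    <div class="ply">Team A</div>
    <div class="sco"><a class="scorelink" href="/soccer/match-1/">2 - 1</a></div>
    <div class="ply">Team B</div>
  </div>
  <div class="row row-gray">
    <div class="ply">Team C</div>
    <div class="sco">? - ?</div>
    <div class="ply">Team D</div>
  </div>
</body></html>
HTML;

$dom = new DOMDocument();
libxml_use_internal_errors(true);   // collect parse warnings instead of using @
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);
$rows = array();

foreach ($xpath->query("//div[contains(@class, 'row-gray')]") as $row) {
    // The div itself has no href attribute, so this is an empty string.
    $onDiv = $row->getAttribute('href');

    // The leading dot limits the search to descendants of $row.
    $link = $xpath->query(".//a[@class='scorelink']", $row)->item(0);

    $rows[] = array(
        // Collapse the whitespace between nested elements into single spaces.
        'text'        => trim(preg_replace('/\s+/', ' ', $row->nodeValue)),
        'href_on_div' => $onDiv,
        // Rows without a link (the second sample row) get null.
        'href'        => $link ? $link->getAttribute('href') : null,
    );
}

var_dump($rows);
```

The leading dot is what matters here. Without it, //a[@class='scorelink'] is evaluated from the document root even when a context node is passed, so every row would receive the first link on the page.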