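Scraping with PHP and XPath

Time: 2014-02-13 10:37:13

Tags: php xpath

I am scraping a page because I want to display it on our website. I am having trouble getting the link for each team: I get the team name, but I cannot get the href attribute.

My code looks like this: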

$elements = $xpath->query("//table/tr[contains(@class,'sr')]/td[contains(@class,'c')]");

$count = 0;
foreach ($elements as $elt) {
  if($count == 0)
  {
    $stringInsert = utf8_decode($elt->textContent);
  }
  else if($count == 1)
  {
     // tries to echo the href here, but nothing is printed
     echo $elt->getAttribute('href');

     $stringInsert .= ", '".trim(utf8_decode($elt->textContent))."'";
  }
  else if($count == 3)
  {
     $stringInsert .= ", ".utf8_decode($elt->textContent);
  }
  else if($count == 4)
  {
     $stringInsert .= ", ".utf8_decode($elt->textContent);
  }
  else if($count == 5)
  {
     $stringInsert .= ", ".utf8_decode($elt->textContent);
  }
  else if($count == 6)
  {
     $stringInsert .= ", ".utf8_decode($elt->textContent);
  }
  else if($count == 7)
  {
     $stringInsert .= ", ".utf8_decode($elt->textContent);
  }
  else if($count == 9)
  {
     $stringInsert .= ", ".utf8_decode($elt->textContent);
  }
  else if($count == 10)
  {
     $stringInsert .= ", ".utf8_decode($elt->textContent);
  }

       $count++;

   if($count == 12)
   {
       echo $stringInsert;
       $count = 0;
   }

  }

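As you can see in the code, I try to echo $elt->getAttribute('href') when count == 1, but it prints nothing.

I tried appending /a to the XPath expression, but then it only returns the team names and none of the other fields, such as scores, points, and so on.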

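1 Answer:

Answer 0: (score: 0)

You appear to be querying td elements, which have no href attribute. The href belongs to the a element nested inside the td.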
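To illustrate the point, here is a minimal sketch on made-up markup (the table contents and the URL are assumptions, not the real page). It runs the query from the question, then compares reading href on the td itself with reading it from the a found by a query relative to that cell.

```php
<?php
// Hypothetical markup standing in for the real page.
$html = '<table><tr class="sr"><td class="c">1</td>'
      . '<td class="c"><a href="/team/42">Team A</a></td></tr></table>';

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

// Same query as in the question: it selects <td> elements.
$cells = $xpath->query("//table/tr[contains(@class,'sr')]/td[contains(@class,'c')]");
$cell  = $cells->item(1); // the cell that holds the team name

// The <td> itself carries no href, so getAttribute() returns an empty string.
$onCell = $cell->getAttribute('href');

// Query relative to the cell to reach the <a> nested inside it.
$anchor   = $xpath->query('.//a', $cell)->item(0);
$onAnchor = $anchor !== null ? $anchor->getAttribute('href') : '';

var_dump($onCell);
var_dump($onAnchor);
```

Passing the cell as the second argument to query() keeps the other td values available in the same loop, which is what gets lost when /a is appended to the main expression.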

This example may help:

//array to store the results
$res = array();

//loop over all <tr> elements of the table.srPoolPosition
//($xpath is the DOMXPath object from the question)
foreach ($xpath->query("//table[contains(@class,'srPoolPosition')]/tr") as $row) {

    //new array to store results in each row
    $rowRes = array();

    //get the <td> elements in current <tr>
    $fields = $xpath->query('td', $row);
    //skip rows with fewer than 12 fields
    if ($fields->length < 12) {
        continue;
    }
    //loop over those
    foreach ($fields as $field) {
        //store the textcontent in the current rows array
        $rowRes[] = utf8_decode($field->textContent);
    }

    //query for the first link anywhere inside the current row
    //(a bare "a" would only match direct children of the <tr>)
    $anchor = $xpath->query('.//a', $row)->item(0);
    //add the link to the results array, or '' if the row has no link
    $rowRes[] = $anchor !== null ? $anchor->getAttribute('href') : '';

    //then add it to the results
    $res[] = $rowRes;
}

//example loop over the results
foreach ($res as $tableRow) {
    echo sprintf(
         '<a href="%s">%s</a>: %s - %s<br>', 
          $tableRow[12],  //link href (appended after the 12 cell values)
          $tableRow[1],   //name
          $tableRow[7],   //score 1
          $tableRow[9]    //score 2
    );
}
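
As a variation on the row loop above, DOMXPath::evaluate() with the XPath string() function returns the first matching href as a plain string, or an empty string when the row contains no link, so no null check on item(0) is needed. The following is only a sketch on hypothetical markup: the three-column rows, the URL, and the 'href' array key are assumptions chosen for illustration.

```php
<?php
// Hypothetical two-row table; the second row has no link.
$html = '<table class="srPoolPosition">'
      . '<tr><td>1</td><td><a href="/team/42">Team A</a></td><td>10</td></tr>'
      . '<tr><td>2</td><td>Team B</td><td>7</td></tr>'
      . '</table>';

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

$res = array();
foreach ($xpath->query("//table[contains(@class,'srPoolPosition')]/tr") as $row) {
    $rowRes = array();

    // Collect the text of every cell in this row.
    foreach ($xpath->query('td', $row) as $field) {
        $rowRes[] = trim($field->textContent);
    }

    // string() converts the first matching attribute to a string,
    // and yields '' when nothing matches.
    $rowRes['href'] = $xpath->evaluate('string(.//a/@href)', $row);

    $res[] = $rowRes;
}

foreach ($res as $tableRow) {
    echo $tableRow[1], ' => ', $tableRow['href'], "\n";
}
```

Storing the link under a named key rather than a numeric index means the output loop does not depend on how many cells each row has.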