Get/filter table row content with PHP

Time: 2012-02-07 03:35:49

Tags: php javascript domdocument

How can I get every TITLE and LINK from a file?

Sample content of the file:

<tr class="odd">
 <td align="left" valign="top" class="text_cont_normal"> TITLE </td>
 <td align="left" valign="top" class="normal_text_link">
  <img border="0" onclick="javascript:window.location.href='LINK'" style="cursor: pointer;" alt="Download"  src="btn.jpg"/></td>
</tr>
<tr class="even">
 <td align="left" valign="top" class="text_cont_normal"> TITLE2 </td>
 <td align="left" valign="top" class="normal_text_link">
  <img border="0" onclick="javascript:window.location.href='LINK2'" style="cursor: pointer;" alt="Download"  src="btn.jpg"/></td>
</tr>

I tried:

$tags = $doc->getElementsByTagName('img');
foreach ($tags as $tag) {
 if ($tag->hasAttribute('onclick'))
    echo $tag->getAttribute('onclick').'<br>';
}

But this does not give me the data I actually want!

2 answers:

Answer 0: (score: 1)

Something like this, for example:

$doc = new DOMDocument();
$doc->loadHTMLFile($filename);
$xpath = new DOMXPath($doc);

// Titles: the text of every cell with class "text_cont_normal"
$nodes = $xpath->query('//td[@class="text_cont_normal"]');
foreach ($nodes as $node) {
    echo $node->nodeValue . '<br>';   // title
}

// Download buttons: the onclick attribute contains the link
$nodes = $xpath->query('//td[@class="normal_text_link"]/img[@alt="Download"]');
foreach ($nodes as $node) {
    if ($node->hasAttribute('onclick')) {
        echo $node->getAttribute('onclick') . '<br>';  // onclick
    }
}

If you need only the LINK itself, rewrite the body of the second loop as:

  if ($node->hasAttribute('onclick'))
  {
      echo $node->getAttribute('onclick').'<br>';  //click
      preg_match('/location\.href=(\'|")(.*?)\\1/i', 
                 $node->getAttribute('onclick'), $matches);
      if (isset($matches[2])) echo $matches[2].'<br>'; // the value
  }

Or do you need the titles and links grouped together?
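If grouping is what you are after, one way to sketch it is to walk the rows and run relative XPath queries inside each one, so every title stays paired with its own link. The function name `extractRows` and the `title`/`link` array keys are made up for this illustration, and the sample markup is wrapped in a `<table>` element here, which the original snippet does not show.

```php
<?php
// Sample rows from the question, wrapped in <table> (an assumption of this sketch).
$html = <<<'HTML'
<table>
<tr class="odd">
 <td align="left" valign="top" class="text_cont_normal"> TITLE </td>
 <td align="left" valign="top" class="normal_text_link">
  <img border="0" onclick="javascript:window.location.href='LINK'" style="cursor: pointer;" alt="Download" src="btn.jpg"/></td>
</tr>
<tr class="even">
 <td align="left" valign="top" class="text_cont_normal"> TITLE2 </td>
 <td align="left" valign="top" class="normal_text_link">
  <img border="0" onclick="javascript:window.location.href='LINK2'" style="cursor: pointer;" alt="Download" src="btn.jpg"/></td>
</tr>
</table>
HTML;

// Hypothetical helper: returns one ['title' => ..., 'link' => ...] entry per row.
function extractRows(string $html): array
{
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // keep parser warnings quiet
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $rows = [];
    foreach ($xpath->query('//tr') as $tr) {
        // Both queries are relative to the current row (second argument).
        $titleCell = $xpath->query('./td[@class="text_cont_normal"]', $tr)->item(0);
        $img = $xpath->query('./td[@class="normal_text_link"]/img[@onclick]', $tr)->item(0);
        if ($titleCell === null || $img === null) {
            continue; // skip rows that do not have both parts
        }
        // Same idea as the regex above: capture whatever sits between the quotes.
        if (preg_match('/location\.href=([\'"])(.*?)\1/i', $img->getAttribute('onclick'), $m)) {
            $rows[] = ['title' => trim($titleCell->nodeValue), 'link' => $m[2]];
        }
    }
    return $rows;
}

print_r(extractRows($html));
```

Rows that lack either the title cell or an `img` with an `onclick` attribute are skipped, so a header row or a spacer row should not break the pairing.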

Answer 1: (score: 1)

One possible way:

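// Assumes $doc is a DOMDocument that has already loaded the file.
$nodes = $doc->getElementsByTagName('tr');
$max = $nodes->length;
for ($i = 0; $i < $max; $i++)
{
    $row = $nodes->item($i);
    echo $row->firstChild->nodeValue . '<br>';  // TITLE
    // The child indexes below depend on how the parser treats the whitespace
    // between tags, so they only hold for markup laid out like the sample.
    $onclick = $row->childNodes->item(2)->childNodes->item(1)->getAttribute('onclick');
    $parts = explode("'", $onclick);
    echo $parts[1] . '<br>';  // LINK
}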
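Because fixed child positions shift as soon as the whitespace or the markup changes, a variant that looks elements up by tag name inside each row may be a little more forgiving. This is only a sketch: the function name `linksByTitle` is invented for the example, the input is wrapped in a `<table>` that the original sample does not show, and it assumes the link is always enclosed in single quotes, as in the sample.

```php
<?php
// Shortened sample rows, wrapped in <table> (an assumption of this sketch).
$html = <<<'HTML'
<table>
<tr class="odd">
 <td class="text_cont_normal"> TITLE </td>
 <td class="normal_text_link">
  <img onclick="javascript:window.location.href='LINK'" alt="Download" src="btn.jpg"/></td>
</tr>
<tr class="even">
 <td class="text_cont_normal"> TITLE2 </td>
 <td class="normal_text_link">
  <img onclick="javascript:window.location.href='LINK2'" alt="Download" src="btn.jpg"/></td>
</tr>
</table>
HTML;

// Hypothetical helper: maps each row's title to its link.
function linksByTitle(string $html): array
{
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // keep parser warnings quiet
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $result = [];
    foreach ($doc->getElementsByTagName('tr') as $tr) {
        // Look up descendants by tag name instead of by child position.
        $cells = $tr->getElementsByTagName('td');
        $imgs = $tr->getElementsByTagName('img');
        if ($cells->length === 0 || $imgs->length === 0) {
            continue;
        }
        // Split on single quotes; the link is the piece between the first pair.
        $parts = explode("'", $imgs->item(0)->getAttribute('onclick'));
        if (!isset($parts[1])) {
            continue;
        }
        $result[trim($cells->item(0)->nodeValue)] = $parts[1];
    }
    return $result;
}

foreach (linksByTitle($html) as $title => $link) {
    echo $title . ' => ' . $link . "\n";
}
```

Keying the result by title means that two rows with the same title would overwrite each other; if that can happen in your file, collect the pairs in a plain list instead.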