How do I get all TITLEs and LINKs from a file?
Sample content of the file:
<tr class="odd">
<td align="left" valign="top" class="text_cont_normal"> TITLE </td>
<td align="left" valign="top" class="normal_text_link">
<img border="0" onclick="javascript:window.location.href='LINK'" style="cursor: pointer;" alt="Download" src="btn.jpg"/></td>
</tr>
<tr class="even">
<td align="left" valign="top" class="text_cont_normal"> TITLE2 </td>
<td align="left" valign="top" class="normal_text_link">
<img border="0" onclick="javascript:window.location.href='LINK2'" style="cursor: pointer;" alt="Download" src="btn.jpg"/></td>
</tr>
I tried:
$tags = $doc->getElementsByTagName('img');
foreach ($tags as $tag) {
if ($tag->hasAttribute('onclick'))
echo $tag->getAttribute('onclick').'<br>';
}
But this prints the whole onclick attribute, not the data I actually want (the title and the bare link).
Answer 0: (score: 1)
Like this, for example:
$doc = new DOMDocument();
$doc->loadHTMLFile($filename);
$xpath = new DOMXPath($doc);
$nodes = $xpath->query('//td[@class="text_cont_normal"]');
foreach($nodes as $node)
{
echo $node->nodeValue.'<br>'; // title
}
$nodes = $xpath->query('//td[@class="normal_text_link"]/img[@alt="Download"]');
foreach($nodes as $node)
{
if ($node->hasAttribute('onclick'))
echo $node->getAttribute('onclick').'<br>'; //click
}
If you need only the LINK, rewrite the body of the second loop as:
if ($node->hasAttribute('onclick'))
{
echo $node->getAttribute('onclick').'<br>'; //click
preg_match('/location\.href=(\'|")(.*?)\\1/i',
$node->getAttribute('onclick'), $matches);
if (isset($matches[2])) echo $matches[2].'<br>'; // the value
}
Or do you need each title grouped with its link?
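If grouping is what you want, here is a minimal sketch that walks the rows and pairs each title with its link. It reuses the XPath expressions and the regex from above. The function name `extractRows`, the inline `$html` sample (copied from the question and wrapped in a `<table>`), and the `title`/`link` array keys are assumptions made for illustration; for a real file, `loadHTMLFile($filename)` would replace `loadHTML($html)`.

```php
<?php
// Sample markup from the question, wrapped in <table> so it parses as a table.
$html = <<<'HTML'
<table>
<tr class="odd">
<td align="left" valign="top" class="text_cont_normal"> TITLE </td>
<td align="left" valign="top" class="normal_text_link">
<img border="0" onclick="javascript:window.location.href='LINK'" style="cursor: pointer;" alt="Download" src="btn.jpg"/></td>
</tr>
<tr class="even">
<td align="left" valign="top" class="text_cont_normal"> TITLE2 </td>
<td align="left" valign="top" class="normal_text_link">
<img border="0" onclick="javascript:window.location.href='LINK2'" style="cursor: pointer;" alt="Download" src="btn.jpg"/></td>
</tr>
</table>
HTML;

// Hypothetical helper: returns one ['title' => ..., 'link' => ...] entry per row.
function extractRows(string $html): array
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true);   // keep parser warnings out of the output
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    $rows = [];

    // Only rows that actually contain a title cell.
    foreach ($xpath->query('//tr[td[@class="text_cont_normal"]]') as $tr) {
        // Relative queries, evaluated against the current <tr>.
        $titleNode = $xpath->query('td[@class="text_cont_normal"]', $tr)->item(0);
        $imgNode   = $xpath->query('td[@class="normal_text_link"]/img[@onclick]', $tr)->item(0);

        $link = null;
        if ($imgNode !== null
            && preg_match('/location\.href=(\'|")(.*?)\1/i', $imgNode->getAttribute('onclick'), $m)) {
            $link = $m[2];              // text between the matching quotes
        }

        $rows[] = ['title' => trim($titleNode->nodeValue), 'link' => $link];
    }

    return $rows;
}

foreach (extractRows($html) as $row) {
    echo $row['title'], ' => ', $row['link'], PHP_EOL;
}
```

Querying relative to each `<tr>` keeps a title and its link together even if some row has no download image; in that case `link` stays `null` instead of shifting the pairs out of step.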
Answer 1: (score: 1)
One possible way:
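// Note: this relies on the exact child-node positions in the sample markup
// (including whitespace text nodes), so it breaks if the layout changes.
$nodes = $doc->getElementsByTagName('tr');
$max = $nodes->length;
for ($i = 0; $i < $max; $i++)
{
echo $nodes->item($i)->firstChild->nodeValue . '<br>'; // TITLE
// second <td>, then the <img> inside it
$onclick = $nodes->item($i)->childNodes->item(2)->childNodes->item(1)->getAttribute('onclick');
// the link is the text between the single quotes
$parts = explode("'", $onclick);
echo $parts[1] . '<br>'; // LINK
}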