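PHP DomDocument - how to do getElementById with a partial match?

Time: 2013-04-27 03:12:11

Tags: php domdocument getelementbyid

Is there a way to get all elements whose id partially matches a string? For example, I want to grab every HTML element on a web page whose id attribute starts with msg_, whatever comes after that.

Here is what I have done so far: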

$doc = new DomDocument;

// We need to validate our document before referring to the id
$doc->validateOnParse = true;
$doc->loadHtml(file_get_contents('{URL IS HERE}'));
foreach($doc->getElementById('msg_') as $element) { 
   foreach($element->getElementsByTagName('a') as $link)
   {
      echo $link->nodeValue . "\n";
   }
}

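But I need to figure out how to do a partial id match for this bit: $doc->getElementById('msg_'), or whether there is some other way to accomplish this.

Basically, I need to grab all 'a' tags that are children of an element whose id starts with msg_. Technically there will always be only one a tag, but I don't know how to grab just the first child, which is why I am using foreach for that part as well.

Is this possible with the PHP DomDocument class?

Here is the code I am using now, which doesn't work either: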

$str = '';
$filename = 'http://dream-portal.net/index.php/board,65.0.html';
@set_time_limit(0);

$fp = fopen($filename, 'rb');
while (!feof($fp))
{
    $str .= fgets($fp, 16384);
}
fclose($fp);

$doc = new DOMDocument();
$doc->loadXML($str);

$selector = new DOMXPath($doc);

$elements = $selector->query('//row[starts-with(@id, "msg_")]');

foreach ($elements as $node) {
    var_dump($node->nodeValue) . PHP_EOL;
}

The HTML looks like this (the id is on a span tag):

<td class="subject windowbg2">
<div>
  <span id="msg_6555">
    <a href="http://dream-portal.net/index.php?topic=834.0">Poll 1.0</a>
  </span>
  <p>
    Started by 
    <a href="http://dream-portal.net/index.php?action=profile;u=1" title="View the profile of SoLoGHoST">SoLoGHoST</a>
    <small id="pages6555">
      « 
      <a class="navPages" href="http://dream-portal.net/index.php?topic=834.0">1</a>
      <a class="navPages" href="http://dream-portal.net/index.php?topic=834.15">2</a>
        »
    </small>

                        with 963 Views

  </p>
</div>
</td>

It is the <span id="msg_ part that I am after, and there are many of these (at least 15 on the HTML page).

1 Answer:

Answer 0: (score: 4)

Use this:

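// Fetch the page as a string.
$str = file_get_contents('http://dream-portal.net/index.php/board,65.0.html');

$doc = new DOMDocument();
// Parse as HTML, not XML; @ silences warnings about malformed markup.
@$doc->loadHTML($str);

$selector = new DOMXPath($doc);

// Match any element, whatever its tag name, whose id starts with "msg_".
foreach ($selector->query('//*[starts-with(@id, "msg_")]') as $node) {
    var_dump($node->nodeValue);
}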

This gives you:

string(8) "Poll 1.0"
string(12) "Shoutbox 2.2"
string(24) "Polaroid Attachments 1.6"
string(24) "Featured News Slider 1.3"
string(17) "Image Resizer 1.0"
string(8) "Blog 2.2"
string(13) "RSS Feeds 1.0"
string(19) "Adspace Manager 1.2"
string(21) "Facebook Like Box 1.0"
string(15) "Price Table 1.0"
string(13) "SMF Links 1.0"
string(19) "Download System 1.2"
string(16) "[*]Site News 1.0"
string(12) "Calendar 1.3"
string(16) "Page Peel Ad 1.1"
string(20) "Sexy Bookmarks 1.0.1"
string(15) "Forum Staff 1.2"
string(21) "Facebook Comments 1.0"
string(15) "Attachments 1.4"
string(25) "YouTube Channels 0.9 Beta"
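
The attempt in the question most likely fails for two reasons: loadXML() expects well-formed XML, which real-world HTML pages rarely are, and //row only matches elements literally named row, whereas the ids sit on span elements. If the goal is the links themselves rather than the text of the span, the XPath can go one step further and select the first a child directly, which also removes the need for the inner foreach. The following is a minimal sketch, not a tested scraper: it runs against an inline sample modeled on the HTML in the question, and the second row (id msg_6556, topic=835.0) is made up for illustration.

```php
<?php
// Sample markup modeled on the snippet in the question. The second cell
// (msg_6556 / topic=835.0) is hypothetical and only there to show several matches.
$html = <<<'HTML'
<table><tr>
<td class="subject windowbg2"><div>
  <span id="msg_6555"><a href="http://dream-portal.net/index.php?topic=834.0">Poll 1.0</a></span>
  <p>Started by <a href="http://dream-portal.net/index.php?action=profile;u=1">SoLoGHoST</a></p>
</div></td>
<td class="subject windowbg2"><div>
  <span id="msg_6556"><a href="http://dream-portal.net/index.php?topic=835.0">Shoutbox 2.2</a></span>
</div></td>
</tr></table>
HTML;

// Collect libxml parser warnings internally instead of silencing them with @.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);

// For every element whose id starts with "msg_", select its first <a> child.
$links = $xpath->query('//*[starts-with(@id, "msg_")]/a[1]');

$titles = [];
foreach ($links as $link) {
    $titles[] = trim($link->nodeValue);
    echo trim($link->nodeValue), ' -> ', $link->getAttribute('href'), "\n";
}
```

The profile link inside the p element is not selected, because its parent has no id starting with msg_. For the live page, replacing $html with the result of file_get_contents() on the board URL should work the same way, assuming the markup still matches the snippet in the question.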