Get div data from the DOM where the id starts with a specific name

Date: 2015-05-09 08:59:53

Tags: php html dom curl

I am trying to get the data of HTML divs whose id starts with a specific name or string.

For example, suppose I have this HTML:

<html>
  <div id="post_message_1">
      somecontent1
  </div>
  <div id="post_message_2">
      somecontent2
  </div>
  <div id="post_message_3">
      somecontent3
  </div>
</html>

For this, I tried cURL.

        <?php
        // Fetch a URL with cURL and return the response body as a string.
        function file_get_contents_curl($url)
        {
            $ch = curl_init();
            curl_setopt($ch, CURLOPT_HEADER, 0);
            curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
            curl_setopt($ch, CURLOPT_URL, $url);
            curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
            $data = curl_exec($ch);
            curl_close($ch);
            return $data;
        }

        $html = file_get_contents_curl("myUrl");

        // Open test.html for writing; mode 'w' creates the file or truncates an existing one.
        $fh = fopen("test.html", 'w');
        fwrite($fh, $html); // write the response to the file
        fclose($fh);
        ?>

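If I use

  $select = $doc->getElementById("post_message_");

then it returns no data, because it searches the DOM for exactly this id, whereas the div ids in the HTML only start with this string. The id could be post_message_1 or post_message_2.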

3 answers:

Answer 0: (score: 0)

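I would probably iterate over all the divs and use a regular expression on their ids to pick out the ones I need.

Unless you can edit the HTML page's code and add a class to the divs that contain the messages, I don't think there is a cleaner way.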
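The iteration described above could look roughly like the following. This is only a sketch: the inline `$html` string stands in for the cURL response, the `other` div and the `$matches` array are made up for illustration, and the pattern assumes the ids end in digits.

```php
<?php
// Sample markup modeled on the question; in practice this would be the cURL response.
$html = '<html><body>'
      . '<div id="post_message_1">somecontent1</div>'
      . '<div id="post_message_2">somecontent2</div>'
      . '<div id="other">ignored</div>'
      . '</body></html>';

$doc = new DOMDocument();
$doc->loadHTML($html);

$matches = [];
foreach ($doc->getElementsByTagName('div') as $div) {
    $id = $div->getAttribute('id');
    // Keep only divs whose id is the prefix followed by one or more digits (assumed format).
    if (preg_match('/^post_message_\d+$/', $id)) {
        $matches[$id] = trim($div->textContent);
    }
}

print_r($matches);
```

Loosening the pattern to `/^post_message_/` would accept any suffix, not just digits.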

Answer 1: (score: 0)

I converted the output of file_get_contents_curl into a SimpleXmlElement object and used one of its features, xpath.

You can do it like this:

$html = <<<HTML
<html>
  <div id="post_message_1">
      somecontent1
  </div>
 <div id="post_message_2">
      somecontent2
  </div>
    <div id="post_message_3">
      somecontent3
  </div>
 </html>
HTML;

$dom = new SimpleXMLElement($html);

var_dump($dom->xpath('//div[starts-with(@id, "post_message_")]'));

Update

In your case, you should do something like this:

$doc = new DOMDocument();
$doc->loadHTML(file_get_contents_curl($url));

$sxml = simplexml_import_dom($doc);

var_dump($sxml->xpath('//div[starts-with(@id, "post_message_")]'));
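Pages fetched from the web are often not perfectly well-formed, and DOMDocument::loadHTML may then emit PHP warnings. A hedged variant of the snippet above collects those warnings internally with libxml_use_internal_errors and reads the id and text of each match. The inline `$html` string, the `sidebar` div and the `$messages` array are illustrative assumptions, not part of the original code.

```php
<?php
// Hypothetical stand-in for the fetched page, with an unclosed <p> as imperfect markup.
$html = '<html><body><p>intro'
      . '<div id="post_message_1">somecontent1</div>'
      . '<div id="post_message_2">somecontent2</div>'
      . '<div id="sidebar">x</div>'
      . '</body></html>';

// Collect any libxml parse warnings internally instead of emitting PHP warnings.
$previous = libxml_use_internal_errors(true);

$doc = new DOMDocument();
$doc->loadHTML($html);

// Discard the collected warnings and restore the previous error-handling setting.
libxml_clear_errors();
libxml_use_internal_errors($previous);

$sxml = simplexml_import_dom($doc);

$messages = [];
foreach ($sxml->xpath('//div[starts-with(@id, "post_message_")]') as $div) {
    // Casting a SimpleXMLElement to string yields its text content.
    $messages[(string) $div['id']] = trim((string) $div);
}

print_r($messages);
```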

Answer 2: (score: 0)

I found a solution, and it works fine. Maybe this code will help someone else. Thanks to @smarber, whose pattern helped me solve this.

    <?php
    // Fetch a URL with cURL and return the response body as a string.
    function file_get_contents_curl($url)
    {
        $ch = curl_init();
        curl_setopt($ch, CURLOPT_HEADER, 0);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
        curl_setopt($ch, CURLOPT_URL, $url);
        curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
        $data = curl_exec($ch);
        curl_close($ch);
        return $data;
    }

    $html = file_get_contents_curl("myUrl");

    $dom = new DOMDocument();
    $dom->loadHTML($html);
    $finder = new DOMXPath($dom);
    // Select every div whose id starts with "post_message_".
    $nodes = $finder->query('//div[starts-with(@id, "post_message_")]');

    // Copy the matching nodes into a temporary document so that only they are serialized.
    $tmp_dom = new DOMDocument();
    foreach ($nodes as $node) {
        $tmp_dom->appendChild($tmp_dom->importNode($node, true));
    }
    $innerHTML = trim($tmp_dom->saveHTML());

    // Open test.html for writing; mode 'w' creates the file or truncates an existing one.
    $fh = fopen("test.html", 'w');
    fwrite($fh, $innerHTML); // write the extracted divs to the file
    fclose($fh);
    ?>
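As an alternative to the temporary document used above, DOMDocument::saveHTML accepts a node argument and serializes just that subtree. The sketch below is a variation, not the code from this answer; the inline `$html` string and the `nav` div are made up for illustration.

```php
<?php
// Hypothetical stand-in for the fetched page.
$html = '<html><body>'
      . '<div id="post_message_1">somecontent1</div>'
      . '<div id="nav">menu</div>'
      . '<div id="post_message_2">somecontent2</div>'
      . '</body></html>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$finder = new DOMXPath($dom);

$innerHTML = '';
foreach ($finder->query('//div[starts-with(@id, "post_message_")]') as $node) {
    // saveHTML() with a node argument serializes only that node and its children.
    $innerHTML .= $dom->saveHTML($node) . "\n";
}

echo $innerHTML;
```

This avoids importing nodes into a second document, at the cost of concatenating the fragments by hand.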