PHP DOM scraping returns an empty element, "<>"

Time: 2014-09-18 14:28:33

Tags: php html dom

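I have an index.php in a folder alongside my articles (stored as PHP files), so that when a user visits my /finance URL it displays all of my finance articles. It works, but it returns an empty element <> for every article (.php file) in the folder. I'm pulling my hair out. Below are index.php (the problem code) and what a sample article file looks like. The live URL is https://www.stndip.com/finance/. Thanks in advance for your help.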

*index.php:*

<!DOCTYPE html>
<html>
<?php include('../includes/head.php'); ?>
    <body>
        <?php include('../includes/header.php'); ?>
        <div id="push"></div>
        <div id="main">
            <?php
            function innerHTML($node) {
                    $doc  = $node->ownerDocument;
                    $frag = $doc->createDocumentFragment();
                    foreach ($node->childNodes as $child){
                        if ($child->nodeValue !== ""){
                            $frag->appendChild($child->cloneNode(TRUE));
                        }
                    }
                    return $doc->saveHTML($frag);
                    }
            $filename = glob("*.php");
            $filename = array_diff($filename, array('index.php',));

            foreach ($filename as &$value) {
                $doc = new DOMDocument();
                $doc->loadHTMLFile($value);
                $article = $doc->getElementById('article');

                echo innerHTML($article);
                }
            ?>
        </div>
        <?php include('../includes/footer.php'); ?>
    <div id="alertWindow"></div>
    </body>
</html>

*Sample article:*

<!DOCTYPE html>
<html>
<?php include('../includes/head.php'); ?>
    <body>
        <?php include('../includes/header.php'); ?>
        <div id="push"></div>
        <div id="main">
            <div id="article">
                <div class="content">
                    <div class="heading">
                        <strong>Powers that Be</strong>
                        <span class="author">By Ron Royston</span>
                        <span class="date">July 16, 2014</span>
                    </div>
                <iframe width="425" height="349" src="//www.youtube.com/embed/1MA06RHA-zI" frameborder="0" allowfullscreen></iframe>           
                <p>Maverick market operator Hugh Hendry is delightful.</p>
                </div>
            </div>
        </div>
        <?php include('../includes/footer.php'); ?> 
        <div id="alertWindow"></div>
    </body>
</html>

1 answer:

Answer 0: (score: 0)

Below is an innerHTML function that gets rid of the "<>".

        function innerHTML(DOMNode $node) {
            $innerHTML = '';
            $children  = $node->childNodes;
            // Serialize each child node directly instead of going through a DOMDocumentFragment.
            foreach ($children as $child) {
                $innerHTML .= $node->ownerDocument->saveHTML($child);
            }
            return $innerHTML;
        }
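
For anyone who wants to try this outside the site, here is a minimal sketch of the function in use. It assumes PHP 7 or later (for the return type) and uses a hypothetical inline `$markup` string in place of one of the article files; the null check and the libxml error suppression are additions that are not in the original code. The stray "<>" in the question most likely comes from passing the DOMDocumentFragment itself to saveHTML(), which this version never does.

```php
<?php
// Same idea as the function above: serialize each child node on its own.
function innerHTML(DOMNode $node): string
{
    $html = '';
    foreach ($node->childNodes as $child) {
        $html .= $node->ownerDocument->saveHTML($child);
    }
    return $html;
}

// Hypothetical markup standing in for one article file (assumption, not from the question).
$markup = '<!DOCTYPE html><html><body><div id="article">'
        . '<p>Maverick market operator Hugh Hendry is delightful.</p>'
        . '</div></body></html>';

$doc = new DOMDocument();
libxml_use_internal_errors(true);   // keep parser warnings out of the page output
$doc->loadHTML($markup);
libxml_clear_errors();

// getElementById() returns null when no element has that id, so guard before serializing.
$article = $doc->getElementById('article');
$output  = $article !== null ? innerHTML($article) : '';

echo $output;
```

In index.php the same guard would sit inside the foreach loop, so a file without an `id="article"` element is skipped instead of causing an error.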