Preserving line breaks when using PHP's DOMDocument appendChild

Time: 2011-10-20 16:11:12

Tags: php domdocument

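I'm trying to use DOMDocument in PHP to add to and parse content in an HTML document. From what I've read, setting formatOutput to true and preserveWhiteSpace to false should keep the tabs and line breaks in order, but it doesn't seem to work for newly created or appended nodes.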

Here is the code:

$dom = new \DOMDocument;
$dom->formatOutput = true;
$dom->preserveWhiteSpace = false;
$dom->loadHTMLFile($htmlsource);
$tables = $dom->getElementsByTagName('table');
foreach($tables as $table)
{
    $table->setAttribute('class', 'tborder');
    $div = $dom->createElement('div');
    $div->setAttribute('class', 'm2x');
    $table->parentNode->insertBefore($div, $table);
    $div->appendChild($table);
}
$dom->saveHTMLFile($html);

This is what the HTML looks like:

<table>
    <tr>
        <td></td>
    </tr>
</table>

This is what I want:

<div class="m2x">
    <table class="tborder">
        <tr>
            <td></td>
        </tr>
    </table>
</div>

This is what I get:

<div class="m2x"><table class="tborder"><tr>
<td></td>
        </tr></table></div>

Am I doing something wrong? I've tried Googling this in as many different ways as I could think of, with no luck.
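For anyone who wants to reproduce this, the snippet above can be condensed into a self-contained sketch. It assumes the markup is loaded from a string (the hypothetical `$source`) rather than from the file named by `$htmlsource`, and it prints the result instead of saving it. It requires PHP's DOM extension.

```php
<?php
// Hypothetical input; the question loads a file named by $htmlsource instead.
$source = "<table>\n    <tr>\n        <td></td>\n    </tr>\n</table>";

$dom = new DOMDocument;
$dom->formatOutput = true;        // same settings as in the question
$dom->preserveWhiteSpace = false;
$dom->loadHTML($source);

// Snapshot the live node list so that moving nodes cannot disturb the iteration.
$tables = iterator_to_array($dom->getElementsByTagName('table'));
foreach ($tables as $table) {
    $table->setAttribute('class', 'tborder');
    $div = $dom->createElement('div');
    $div->setAttribute('class', 'm2x');
    $table->parentNode->insertBefore($div, $table); // put the wrapper where the table is
    $div->appendChild($table);                      // then move the table inside it
}

echo $dom->saveHTML();
```

The DOM structure ends up as intended (the table sits inside the new div); only the whitespace in the serialized output differs from what the question asks for.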

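3 Answers:

Answer 0: (score: 2)

Unfortunately, you may need to write a function that indents the output the way you want. I wrote a small function that you might find useful.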

function indentContent($content, $tab="\t")
{               

        // add marker linefeeds to aid the pretty-tokeniser (adds a linefeed between all tag-end boundaries)
        $content = preg_replace('/(>)(<)(\/*)/', "$1\n$2$3", $content);

        // now indent the tags
        $token = strtok($content, "\n");
        $result = ''; // holds formatted version as it is built
        $pad = 0; // initial indent
        $matches = array(); // returns from preg_matches()

        // scan each line and adjust indent based on opening/closing tags
        while ($token !== false) 
        {
                $token = trim($token);
                // test for the various tag states

                // 1. open and closing tags on same line - no change
                if (preg_match('/.+<\/\w[^>]*>$/', $token, $matches)) $indent=0;
                // 2. closing tag - outdent now
                elseif (preg_match('/^<\/\w/', $token, $matches))
                {
                        $pad--;
                        if($indent>0) $indent=0;
                }
                // 3. opening tag - don't pad this one, only subsequent tags
                elseif (preg_match('/^<\w[^>]*[^\/]>.*$/', $token, $matches)) $indent=1;
                // 4. no indentation needed
                else $indent = 0;

                // prefix the line with $pad copies of $tab (str_pad would miscount a multi-character $tab)
                $line = str_repeat($tab, max(0, $pad)).$token;
                $result .= $line."\n"; // add to the cumulative result, with linefeed
                $token = strtok("\n"); // get the next token
                $pad += $indent; // update the pad size for subsequent lines    
        }       

        return $result;
}

indentContent($dom->saveHTML()) will return:

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html>
    <body>
        <div class="m2x">
            <table class="tborder">
                <tr>
                    <td>
                    </td>
                </tr>
            </table>
        </div>
    </body>
</html>

I based this function on this one.
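The first step of the function explains why `<td>` and `</td>` end up on separate lines in the output above: the pattern inserts a line feed wherever a `>` is immediately followed by a `<`. A small sketch of that step on its own, using the question's markup with the whitespace removed (an assumption made for brevity):

```php
<?php
// Sample markup: the question's result with all whitespace between tags removed.
$html = '<div class="m2x"><table class="tborder"><tr><td></td></tr></table></div>';

// Same pattern as in indentContent(): break the string wherever ">" meets "<".
$split = preg_replace('/(>)(<)(\/*)/', "$1\n$2$3", $html);
$lines = explode("\n", $split);

foreach ($lines as $i => $line) {
    echo $i, ': ', $line, "\n";
}
```

Each tag becomes its own token, which is what lets the loop that follows decide the indentation one line at a time.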

Answer 1: (score: 1)

I modified the great function written by ghbarratt so that it doesn't indent void elements.

function indentContent($content, $tab="\t")
{
    // add marker linefeeds to aid the pretty-tokeniser (adds a linefeed between all tag-end boundaries)
    $content = preg_replace('/(>)(<)(\/*)/', "$1\n$2$3", $content);

    // now indent the tags
    $token = strtok($content, "\n");
    $result = ''; // holds formatted version as it is built
    $pad = 0; // initial indent
    $matches = array(); // returns from preg_matches()

    // scan each line and adjust indent based on opening/closing tags
    while ($token !== false) 
    {
        $token = trim($token);
        // test for the various tag states

        // 1. open and closing tags on same line - no change
        if (preg_match('/.+<\/\w[^>]*>$/', $token, $matches)) $indent=0;
        // 2. closing tag - outdent now
        elseif (preg_match('/^<\/\w/', $token, $matches))
        {
            $pad--;
            if($indent>0) $indent=0;
        }
        // 3. opening tag - don't pad this one, only subsequent tags (only if it isn't a void tag)
        elseif (preg_match('/^<\w[^>]*[^\/]>.*$/', $token, $matches))
        {
            $voidTag = false;
            foreach ($matches as $m)
            {
                // Void elements according to http://www.htmlandcsswebdesign.com/articles/voidel.php
                if (preg_match('/^<(area|base|br|col|command|embed|hr|img|input|keygen|link|meta|param|source|track|wbr)/im', $m))
                {
                    $voidTag = true;
                    break;
                }
            }

            if (!$voidTag) $indent=1;
        }
        // 4. no indentation needed
        else $indent = 0;

        // prefix the line with $pad copies of $tab (str_pad would miscount a multi-character $tab)
        $line = str_repeat($tab, max(0, $pad)).$token;
        $result .= $line."\n"; // add to the cumulative result, with linefeed
        $token = strtok("\n"); // get the next token
        $pad += $indent; // update the pad size for subsequent lines    
    }    

    return $result;
}

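All credit goes to ghbarratt.

Answer 2: (score: 0)

Neither @Stan's nor @ghbarratt's version works with the HTML5 <!DOCTYPE html> declaration: the indentation is carried over to the <head> element.

Expected: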

<!DOCTYPE html>
<html>
  <head>
    <meta charset="UTF-8">
  </head>
  <body>
    <!-- all good -->
  </body>
</html>

Result:

<!DOCTYPE html>
<html>
  <head>
    <meta charset="UTF-8">
    </head>
    <body>
      <!-- all good -->
    </body>
  </html>

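After a bit of testing, I found that adding the <html> element to the void elements list gives a partial fix, but it doesn't solve the problem with <head>, and it also flattens the children (i.e. <head> and <body>).

Edit #1: It seems that <meta charset="UTF-8"> may be what causes the wrong indentation.

Edit #2 - Solution

After a little troubleshooting, I found that <meta>, being a self-closing tag, affects the next closing tag, which can be fixed by adding a flag. The flag records whether a self-closing tag has been found; the next closing tag then gets an extra negative indent.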

function indentContent($content, $tab="\t"){
    // add marker linefeeds to aid the pretty-tokeniser (adds a linefeed between all tag-end boundaries)
    $content = preg_replace('/(>)(<)(\/*)/', "$1\n$2$3", $content);

    // now indent the tags
    $token = strtok($content, "\n");
    $result = ''; // holds formatted version as it is built
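    $pad = 0; // initial indent
    $indent = 0; // change in indent caused by the current line
    $nextTagNegative = false; // true once a void tag has been seen, until a closing tag consumes it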
    $matches = array(); // returns from preg_matches()

    // scan each line and adjust indent based on opening/closing tags
    while ($token !== false && strlen($token)>0)
    {
        $token = trim($token);
        // test for the various tag states

        // 1. open and closing tags on same line - no change
        if (preg_match('/.+<\/\w[^>]*>$/', $token, $matches)) $indent=0;
        // 2. closing tag - outdent now
        elseif (preg_match('/^<\/\w/', $token, $matches))
        {
            $pad--;
            if($indent>0) $indent=0;
            if ($nextTagNegative) {
                $pad--;
                $nextTagNegative = false;
            }
        }
        // 3. opening tag - don't pad this one, only subsequent tags (only if it isn't a void tag)
        elseif (preg_match('/^<\w[^>]*[^\/]>.*$/', $token, $matches))
        {
            $voidTag = false;
            foreach ($matches as $m)
            {
                // Void elements according to http://www.htmlandcsswebdesign.com/articles/voidel.php
                if (preg_match('/^<(area|base|br|col|command|embed|hr|img|input|keygen|link|meta|param|source|track|wbr)/im', $m))
                {
                    $voidTag = true;
                    break;
                }
            }

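            if (!$voidTag) $indent=1;
            // only a void tag that inherits a positive $indent arms the flag; it stays set until a closing tag consumes it
            $nextTagNegative = !empty($nextTagNegative) || ($voidTag && !empty($indent));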
        }
        // 4. no indentation needed
        else $indent = 0;

        // prefix the line with $pad copies of $tab (str_pad would miscount a multi-character $tab)
        $line = str_repeat($tab, max(0, $pad)).$token;
        $result .= $line."\n"; // add to the cumulative result, with linefeed
        $token = strtok("\n"); // get the next token
        $pad += $indent; // update the pad size for subsequent lines
    }

    return $result;
}
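The effect described in Edit #2 can be traced to the void-tag branch: `$indent` is only assigned when the tag is not void, so a void tag such as `<meta>` reuses the value left over from the previous line (1 after `<head>`) and `$pad` grows once more. As an alternative to the flag, here is a minimal sketch that recomputes `$indent` on every line, so a void tag contributes 0; this also covers several void tags in a row. The function name `indentContentVoidAware` and the sample input are hypothetical, and the opening-tag pattern uses a lookbehind instead of the original character class.

```php
<?php
// Hypothetical variant of indentContent(); the function name is an assumption.
function indentContentVoidAware(string $content, string $tab = "\t"): string
{
    // same void element list as in the answers above
    $void = 'area|base|br|col|command|embed|hr|img|input|keygen|link|meta|param|source|track|wbr';

    // put each tag on its own line
    $content = preg_replace('/(>)(<)(\/*)/', "$1\n$2$3", $content);

    $result = '';
    $pad = 0;
    $token = strtok($content, "\n");

    while ($token !== false) {
        $token = trim($token);
        if ($token === '') { // skip whitespace-only lines
            $token = strtok("\n");
            continue;
        }
        $indent = 0; // recomputed for every line, so no stale value can leak through

        if (preg_match('/.+<\/\w[^>]*>$/', $token)) {
            // 1. opening and closing tag on the same line: no change
        } elseif (preg_match('/^<\/\w/', $token)) {
            // 2. closing tag: outdent this line
            $pad--;
        } elseif (preg_match('/^<\w[^>]*(?<!\/)>/', $token)
            && !preg_match('/^<(' . $void . ')\b/i', $token)) {
            // 3. non-void opening tag (not XML-style self-closed): indent the lines that follow
            $indent = 1;
        }

        $result .= str_repeat($tab, max(0, $pad)) . $token . "\n";
        $pad += $indent;
        $token = strtok("\n");
    }

    return $result;
}

// Hypothetical sample: an HTML5 document with two void tags in a row.
$html = "<!DOCTYPE html>\n<html><head><meta charset=\"UTF-8\"><link rel=\"stylesheet\" href=\"a.css\"></head><body><p>hi</p></body></html>";
echo indentContentVoidAware($html, '  ');
```

Because `$indent` starts at 0 for each token, no flag is needed: closing tags only ever undo the indentation added by their own opening tag.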