我正在尝试在PHP中使用DOMDocument来添加/解析HTML文档中的内容。从我可以阅读的内容来看,将formOutput设置为true并将preserveWhiteSpace设置为false应该保持选项卡和换行符的顺序,但它似乎不适用于新创建或附加的节点。
以下是代码:
$dom = new \DOMDocument;
$dom->formatOutput = true;
$dom->preserveWhiteSpace = false;
$dom->loadHTMLFile($htmlsource);
$tables = $dom->getElementsByTagName('table');
foreach($tables as $table)
{
$table->setAttribute('class', 'tborder');
$div = $dom->createElement('div');
$div->setAttribute('class', 'm2x');
$table->parentNode->insertBefore($div, $table);
$div->appendChild($table);
}
$dom->saveHTMLFile($html)
这是HTML的样子:
<table>
<tr>
<td></td>
</tr>
</table>
这就是我想要的:
<div class="m2x">
<table class="tborder">
<tr>
<td></td>
</tr>
</table>
</div>
这是我得到的:
<div class="m2x"><table class="tborder"><tr>
<td></td>
</tr></table></div>
我有什么问题吗?我已经尝试使用谷歌搜索这么多不同的方式,因为我没有运气。
答案 0 :(得分:2)
不幸的是,您可能需要编写一个函数来缩小输出的方式。我做了一些你可能会觉得有用的功能。
function indentContent($content, $tab="\t")
{
// add marker linefeeds to aid the pretty-tokeniser (adds a linefeed between all tag-end boundaries)
$content = preg_replace('/(>)(<)(\/*)/', "$1\n$2$3", $content);
// now indent the tags
$token = strtok($content, "\n");
$result = ''; // holds formatted version as it is built
$pad = 0; // initial indent
$matches = array(); // returns from preg_matches()
// scan each line and adjust indent based on opening/closing tags
while ($token !== false)
{
$token = trim($token);
// test for the various tag states
// 1. open and closing tags on same line - no change
if (preg_match('/.+<\/\w[^>]*>$/', $token, $matches)) $indent=0;
// 2. closing tag - outdent now
elseif (preg_match('/^<\/\w/', $token, $matches))
{
$pad--;
if($indent>0) $indent=0;
}
// 3. opening tag - don't pad this one, only subsequent tags
elseif (preg_match('/^<\w[^>]*[^\/]>.*$/', $token, $matches)) $indent=1;
// 4. no indentation needed
else $indent = 0;
// pad the line with the required number of leading spaces
$line = str_pad($token, strlen($token)+$pad, $tab, STR_PAD_LEFT);
$result .= $line."\n"; // add to the cumulative result, with linefeed
$token = strtok("\n"); // get the next token
$pad += $indent; // update the pad size for subsequent lines
}
return $result;
}
indentContent($dom->saveHTML())
将返回:
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html>
<body>
<div class="m2x">
<table class="tborder">
<tr>
<td>
</td>
</tr>
</table>
</div>
</body>
</html>
我从this one开始创建此功能。
答案 1 :(得分:1)
我修改了ghbarratt所写的伟大函数,因此它不会缩进void elements。
function indentContent($content, $tab="\t")
{
// add marker linefeeds to aid the pretty-tokeniser (adds a linefeed between all tag-end boundaries)
$content = preg_replace('/(>)(<)(\/*)/', "$1\n$2$3", $content);
// now indent the tags
$token = strtok($content, "\n");
$result = ''; // holds formatted version as it is built
$pad = 0; // initial indent
$matches = array(); // returns from preg_matches()
// scan each line and adjust indent based on opening/closing tags
while ($token !== false)
{
$token = trim($token);
// test for the various tag states
// 1. open and closing tags on same line - no change
if (preg_match('/.+<\/\w[^>]*>$/', $token, $matches)) $indent=0;
// 2. closing tag - outdent now
elseif (preg_match('/^<\/\w/', $token, $matches))
{
$pad--;
if($indent>0) $indent=0;
}
// 3. opening tag - don't pad this one, only subsequent tags (only if it isn't a void tag)
elseif (preg_match('/^<\w[^>]*[^\/]>.*$/', $token, $matches))
{
$voidTag = false;
foreach ($matches as $m)
{
// Void elements according to http://www.htmlandcsswebdesign.com/articles/voidel.php
if (preg_match('/^<(area|base|br|col|command|embed|hr|img|input|keygen|link|meta|param|source|track|wbr)/im', $m))
{
$voidTag = true;
break;
}
}
if (!$voidTag) $indent=1;
}
// 4. no indentation needed
else $indent = 0;
// pad the line with the required number of leading spaces
$line = str_pad($token, strlen($token)+$pad, $tab, STR_PAD_LEFT);
$result .= $line."\n"; // add to the cumulative result, with linefeed
$token = strtok("\n"); // get the next token
$pad += $indent; // update the pad size for subsequent lines
}
return $result;
}
所有学分都转到ghbarratt。
答案 2 :(得分:0)
@Stan和@ghbarrat都无法使用<!DOCTYPE html>
html5声明。这种缩进传递给<head>
元素。
预期:
<!DOCTYPE html>
<html>
<head>
<meta charset="UTF-8">
</head>
<body>
<!-- all good -->
</body>
</html>
结果:
<!DOCTYPE html>
<html>
<head>
<meta charset="UTF-8">
</head>
<body>
<!-- all good -->
</body>
</html>
进行一点测试,发现我在将<html>
元素添加到Void元素列表中时得到了部分修复,但是并不能解决head的问题,还可以使子级(即head和body)变平。
编辑#1
看来<meta charset="UTF-8">
可能是造成缩进错误的原因。
编辑#2 -解决方案
经过很少的故障排除后,我发现<meta>
作为自动关闭标签会影响下一个关闭标签,这可以通过添加标记来解决。该标记定义了是否找到了自动关闭标签,然后关闭标签的下一个实例将具有额外的负缩进。
function indentContent($content, $tab="\t"){
// add marker linefeeds to aid the pretty-tokeniser (adds a linefeed between all tag-end boundaries)
$content = preg_replace('/(>)(<)(\/*)/', "$1\n$2$3", $content);
// now indent the tags
$token = strtok($content, "\n");
$result = ''; // holds formatted version as it is built
$pad = 0; // initial indent
$matches = array(); // returns from preg_matches()
// scan each line and adjust indent based on opening/closing tags
while ($token !== false && strlen($token)>0)
{
$token = trim($token);
// test for the various tag states
// 1. open and closing tags on same line - no change
if (preg_match('/.+<\/\w[^>]*>$/', $token, $matches)) $indent=0;
// 2. closing tag - outdent now
elseif (preg_match('/^<\/\w/', $token, $matches))
{
$pad--;
if($indent>0) $indent=0;
if($nextTagNegative){
$pad--;$nextTagNegative=false;
}
}
// 3. opening tag - don't pad this one, only subsequent tags (only if it isn't a void tag)
elseif (preg_match('/^<\w[^>]*[^\/]>.*$/', $token, $matches))
{
$voidTag = false;
foreach ($matches as $m)
{
// Void elements according to http://www.htmlandcsswebdesign.com/articles/voidel.php
if (preg_match('/^<(area|base|br|col|command|embed|hr|img|input|keygen|link|meta|param|source|track|wbr)/im', $m))
{
$voidTag = true;
break;
}
}
if (!$voidTag) $indent=1;$nextTagNegative=true;
}
// 4. no indentation needed
else $indent = 0;
// pad the line with the required number of leading spaces
$line = str_pad($token, strlen($token)+$pad, $tab, STR_PAD_LEFT);
$result .= $line."\n"; // add to the cumulative result, with linefeed
$token = strtok("\n"); // get the next token
$pad += $indent; // update the pad size for subsequent lines
}
return $result;
}