Displaying the whole string in the output with PHP DOM

Date: 2016-08-06 05:29:14

Tags: php regex dom preg-replace preg-match

  

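Here is my HTML string, stored in the PHP variable $data. The string contains text such as <140/90 mmHg OR <130/80 mmHg, which is not displayed when I run this code with PHP DOMDocument, because the parser has a problem with less-than and greater-than signs.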

<?php
$data = 'THE CORRECT ANSWER IS C.
<p>Choice A Lorem Ipsum is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been the industrys standard dummy text ever since the 1500s</p>
<p></p>
<p>Choice B Lorem Ipsum is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been the industrys standard dummy text ever since the 1500s</p>
<p>Choice D Lorem Ipsum is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been the industrys standard dummy text ever since the 1500s</p>
<p></p>
<p>Choice E simply dummy text of the printing and typesetting industry.</p>
<p></p>
<p><br>THIS IS MY MAIN TITLE IN CAPS<br>This my sub title.</p>
<p><br>TEST ABC: Lorem Ipsum is simply dummy text of the printing and typesetting industry.</p>
<p>1) It is a long established fact <140/90 mmHg OR <130/80 mmHg making it look like readable English will uncover many web sites still in their infancy. 
<br><br>2) There are many variations of passages of Lorem Ipsum available. </p>
<p><br>TEST XYZ: Lorem Ipsum has been the industrys standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book.</p>
<p><br>TES T TEST: It has survived not only five centuries, but also the leap into electronic typesetting, remaining essentially unchanged.</p>
<p><br>TESTXXX: It was popularised in the 1960s with the release of Letraset sheets containing Lorem Ipsum passages, and more recently with desktop publishing software like Aldus PageMaker including versions of Lorem Ipsum.</p>';
echo boldFormatExplanation($data);
?>
  

I also wrote the following PHP function, which uses PHP DOMDocument to put the title and certain words in bold.

     
      
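  1. Title in bold: "THIS IS MY MAIN TITLE IN CAPS" (the title is not always the same)
  2. Words in bold: TEST ABC:, TEST XYZ:, TES T TEST:, TESTXXX: (these words are always the same)

The two points above work fine; the only problem is that the line mentioned in the first block is missing from the output.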

<?php
function boldFormatExplanation($data){
    $dom = new DOMDocument('1.0', 'UTF-8');
    $dom->encoding = 'utf-8';
    $dom->substituteEntities = false;
    $dom->preserveWhiteSpace = true;
    $internalErrors = libxml_use_internal_errors(true);// Set error level
    @$dom->loadHTML($data, LIBXML_HTML_NODEFDTD);// Load html
    libxml_use_internal_errors($internalErrors);// Restore error level
    $xpath = new DOMXPath($dom);// Dom xpath
    $title_flag = true;
    foreach($xpath->query('//text()') as $node) {
        $txt = trim($node->nodeValue);
        $p = $node->parentNode;
        if (preg_match("/^\s*(TEST ABC:|TEST XYZ:|TES T TEST:|TESTXXX)(.*)$/s", $node->nodeValue, $matches)) {
            // Put Choice in bold:
            $p->insertBefore($dom->createElement('b', $matches[1]), $node);
            $node->nodeValue = " " . trim($matches[2]);
        } else 
        if (strtoupper($txt) === $txt && $txt !== '') {
            // Put header in bold
            if($title_flag == true){
                $p->insertBefore($dom->createElement('b', $txt), $node);
                $node->nodeValue = "";
                $title_flag = false;
            }
        }
    }
    $domData = $dom->saveHTML();
    $data = htmlspecialchars_decode($domData);
    return $data; 
} ?>

When you run this code, the output skips this text: <140/90 mmHg OR <130/80 mmHg

1 answer:

Answer 0 (score: 1)

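You have no choice here: you need to process the string before loading it with DOMDocument::loadHTML. But you can't do it like a barbarian with a blind replacement (in that case, the content between script or style tags would be replaced as well). You need to use the libxml errors to find only the problematic opening angle brackets. You can do it like this (it isn't very fast, since the DOM tree has to be rebuilt until the errors disappear, but it is correct):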

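// libxml error code 68 (XML_ERR_NAME_REQUIRED): a "<" that is not followed by a valid name
define('LIBXML_ERR_NAME_REQUIRED', 68);

// Wrap the fragment in a skeleton so that the charset is declared inside the document
$skeleton = '<html><head><meta charset="UTF-8"/></head><body id="root">%s</body></html>';
$htmlDoc = sprintf($skeleton, $data);

$dom = new DOMDocument;

// Reparse until libxml no longer reports a stray "<"
do {
    libxml_use_internal_errors(true);
    $hasError = false;
    $dom->loadHTML($htmlDoc);
    $errors = libxml_get_errors();

    foreach ($errors as $error) {
        if ($error->code == LIBXML_ERR_NAME_REQUIRED) {
            $hasError = true;
            // Escape the "<" found at the reported line and column
            $htmlDoc = preg_replace('~\A(?:.*\R){' . ($error->line - 1) . '}.{' . ($error->column - 2) . '}\K<~u', '&lt;', $htmlDoc);
            // The remaining positions are stale once the string has changed, so reparse first
            break;
        }
    }
    libxml_clear_errors();
} while ($hasError);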

boldFormatExplanation($dom);

foreach($dom->getElementById('root')->childNodes as $childNode) {
    echo $dom->saveHTML($childNode);
}
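
To see what the loop above relies on, the following minimal sketch prints the code, line and column that libxml reports for a small fragment. The sample string in $html is made up for illustration, and the exact errors (or whether any are reported at all) depend on the libxml2 version PHP was built against, so no particular output is claimed here.

```php
<?php
// Hypothetical sample input: a bare "<" followed by a digit instead of a tag name.
$html = '<p>Target is <140/90 mmHg</p>';

// Collect libxml errors instead of emitting PHP warnings.
$previous = libxml_use_internal_errors(true);

$dom = new DOMDocument;
$dom->loadHTML($html);

// Each entry is a LibXMLError object with public code, line, column and message properties.
$errors = libxml_get_errors();
libxml_clear_errors();
libxml_use_internal_errors($previous);

foreach ($errors as $error) {
    printf(
        "code=%d line=%d column=%d message=%s\n",
        $error->code,
        $error->line,
        $error->column,
        trim($error->message)
    );
}
```

If an entry with code 68 shows up, its line and column are the values the regular expression in the loop uses to locate the angle bracket to escape.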

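By the way, setting the DOMDocument encoding property is useless when you call DOMDocument::loadHTML afterwards, since the encoding is then taken from the document content (this is the main reason I put an HTML skeleton containing <meta charset="UTF-8"/> around $data).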
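
As a rough illustration of the encoding point, the sketch below loads the same UTF-8 fragment once on its own and once inside the skeleton, then compares the text that comes back. The fragment and the variable names are invented for this example; how the bare fragment is decoded depends on the libxml2 version, so only the wrapped case is expected to round-trip reliably.

```php
<?php
// Hypothetical UTF-8 fragment containing a non-ASCII character.
$fragment = '<p>café</p>';
$skeleton = '<html><head><meta charset="UTF-8"/></head><body id="root">%s</body></html>';

$previous = libxml_use_internal_errors(true);

// Loaded as-is: the parser has no charset declaration to go by.
$bare = new DOMDocument;
$bare->loadHTML($fragment);

// Loaded inside the skeleton: the meta tag declares UTF-8.
$wrapped = new DOMDocument;
$wrapped->loadHTML(sprintf($skeleton, $fragment));

libxml_clear_errors();
libxml_use_internal_errors($previous);

$bareText = $bare->getElementsByTagName('p')->item(0)->textContent;
$wrappedText = $wrapped->getElementsByTagName('p')->item(0)->textContent;

// Compare each result with the original text.
var_dump($bareText === 'café');
var_dump($wrappedText === 'café');
```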

As for your bold function, you can write it like this:

// Objects are passed by handle, so no reference (&) is needed to modify the document
function boldFormatExplanation($dom) {
    $xpath = new DOMXPath($dom);
    $title_flag = true; // only the first all-caps text node is treated as the title

    foreach($xpath->query('//text()') as $node) {
        $txt = trim($node->nodeValue);
        // Skip whitespace-only nodes (empty() would also skip the string "0")
        if ($txt === '') continue;

        $p = $node->parentNode;
        if (preg_match("/^(TEST ABC:|TEST XYZ:|TES T TEST:|TESTXXX)\s*(.*)/s", $txt, $matches)) {
            // Put Choice in bold:
            $p->insertBefore($dom->createElement('b', $matches[1]), $node);
            $node->nodeValue = " " . $matches[2];
        } elseif ($title_flag && strtoupper($txt) === $txt) {
            // Put header in bold
            $p->replaceChild($dom->createElement('b', $txt), $node);
            $title_flag = false;
        }
    }
}
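
The two tests inside the function can be tried without any DOM at all. The sketch below is only an illustration with sample strings borrowed from the question; the variable names $label, $rest, $isHeader and $isSubtitle are made up here.

```php
<?php
$pattern = "/^(TEST ABC:|TEST XYZ:|TES T TEST:|TESTXXX)\s*(.*)/s";

// A text node as it might look before trimming.
$txt = trim("  TEST ABC: Lorem Ipsum is simply dummy text. ");

$label = null;
$rest = null;
if (preg_match($pattern, $txt, $matches)) {
    $label = $matches[1]; // the fixed keyword that goes into the <b> element
    $rest = $matches[2];  // the remainder that stays in the text node
}

// Header test: a string counts as a header when upper-casing does not change it.
$isHeader = strtoupper('THIS IS MY MAIN TITLE IN CAPS') === 'THIS IS MY MAIN TITLE IN CAPS';
$isSubtitle = strtoupper('This my sub title.') === 'This my sub title.';

var_dump($label, $rest, $isHeader, $isSubtitle);
```

Note that the upper-case comparison is also true for strings without any letters, such as "1)", so the order of the text nodes in the document matters for which one gets picked as the title.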