Highlighting keywords with a DOM parser is not working

Asked: 2012-02-17 21:30:36

Tags: php html dom highlighting

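This question is related to one I asked before, but that thread is now closed and I need to ask a follow-up, so I am starting a new question; I hope that is OK.

In my previous question I simplified the problem, which led to a solution that was simple but did not fully work. I realized this a few days ago when I put it into my code.

The problem with the solution from the previous post is that the replace function breaks the HTML tags. I have read many posts on this site saying that I need to use a DOM parser. I am very unfamiliar with that, so I tried the code suggested by user "ircmaxell" in this post, but it does not work for me.

Here is an example of what I did: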
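To illustrate the kind of breakage described above, here is a minimal sketch. It assumes a plain string replacement such as `str_ireplace` (the exact code from the earlier post is not shown here), and the sample markup is made up for the example.

```php
<?php
// Hypothetical input: the keyword "class" also appears inside a tag attribute.
$html = '<p class="intro">A class of its own.</p>';

// A plain string replacement cannot tell markup from text, so it also
// rewrites the attribute name inside the opening <p> tag.
$naive = str_ireplace('class', '<span class="ht">class</span>', $html);

echo $naive, PHP_EOL;
```

The opening tag is no longer valid HTML after the replacement, which is why the keyword search has to be limited to text nodes.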

echo '<style type="text/css">
       .ht{
         background-color: yellow;
       }
     </style>'; 


/* taken from user ircmaxell at https://stackoverflow.com/questions/4081372/highlight-keywords-in-a-paragraph

I just modified line $highlight->setAttribute('class', 'highlight') to $highlight->setAttribute('class', 'ht') and commented the first 2 lines   */

function highlight_paragraph($string, $keyword) {
  //$string = '<p>foo<b>bar</b></p>';
  //$keyword = 'foo';
  $dom = new DomDocument();
  $dom->loadHtml($string);
  $xpath = new DomXpath($dom);
  $elements = $xpath->query('//*[contains(.,"'.$keyword.'")]');
  foreach ($elements as $element) {
   foreach ($element->childNodes as $child) {
     if (!$child instanceof DomText) continue;
     $fragment = $dom->createDocumentFragment();
     $text = $child->textContent;
     $stubs = array();
     while (($pos = stripos($text, $keyword)) !== false) {
       $fragment->appendChild(new DomText(substr($text, 0, $pos)));
       $word = substr($text, $pos, strlen($keyword));
       $highlight = $dom->createElement('span');
       $highlight->appendChild(new DomText($word));
       $highlight->setAttribute('class', 'ht');
       $fragment->appendChild($highlight);
       $text = substr($text, $pos + strlen($keyword));
     }
     if (!empty($text)) $fragment->appendChild(new DomText($text));
     $element->replaceChild($fragment, $child);
   }
 }
 $string = $dom->saveXml($dom->getElementsByTagName('body')->item(0)->firstChild);
 return $string;
}


$string = '<p>This book has been written against a background of both reckless optimism and reckless despair.</p>
<p>It holds that Progress and Doom are two sides of the same medal; that both are articles of superstition, not of faith. It was written out of the conviction that it should be possible to discover the hidden mechanics by which all traditional elements of our political and spiritual world were dissolved into a conglomeration where everything seems to have lost specific value, and has become unrecognizable for human comprehension, unusable for human purpose.</p>
<p> Hannah Arendt, The Origins of Totalitarianism (New York: Harcourt Brace Jovanovich, Inc., 1973 ed.), p.vii, Preface to the First Edition.</p>';

$keywords = array('This', 'book', 'has', 'been', 'written', 'background', 'reckless', 'optimism', 'despair.', 'holds', 'Progress', 'Doom ', 'two', 'sides', 'medal;', 'articles', 'superstition,', 'faith.', 'lost', 'Arendt,', 'Totalitarianism');

foreach ($keywords as $kw) {
  $string = highlight_paragraph($string, $kw);
}

echo $string;

echo $string only returns:

This book has been written against a background of both reckless optimism and reckless despair.

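Only the first two words, "This" and "book", are highlighted.

It should output the whole original string with all the keywords highlighted.

I have searched Stack Overflow and Google extensively and have not found easy-to-use code for this purpose, even though many people have asked the same thing before.

I really need some help. Thanks in advance!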

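2 Answers:

Answer 0: (score: 7)

You're lucky I was very bored when I saw this question. ;)

The code you received as an answer does not seem to have been tested; I don't see how it could ever have worked. Anyway, I fixed all the problems and here is a working version for you, tested with PHP 5.3 on my locally installed Apache server: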

function highlight_paragraph($string, $keyword) {
  $dom = new DOMDocument();
  $dom->loadHtml($string);

  // Search for all text blocks containing the keyword
  $xpath = new DOMXpath($dom);
  $textNodes = $xpath->query('//*[contains(.,"'.$keyword.'")]/text()');

  foreach ($textNodes as $textNode) {
    $fragment = $dom->createDocumentFragment();
    $text = $textNode->nodeValue;
    $stubs = array();

    while (($pos = stripos($text, $keyword)) !== false) {
      $fragment->appendChild(new DOMText(substr($text, 0, $pos)));
      $word = substr($text, $pos, strlen($keyword));

      $highlight = $dom->createElement('span');
      $highlight->appendChild(new DOMText($word));
      $highlight->setAttribute('class', 'ht');
      $fragment->appendChild($highlight);

      $text = substr($text, $pos + strlen($keyword));
    }

    // empty('0') is true in PHP, so compare against '' to keep a trailing "0"
    if ($text !== '')
      $fragment->appendChild(new DOMText($text));

    $textNode->parentNode->replaceChild($fragment, $textNode);
 }

 return $dom->saveHTML();
}
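When the function above is called once per keyword, as in the question's loop, the document is reparsed on every call, and `saveHTML()` with no argument returns the whole document including the DOCTYPE, `<html>` and `<body>` wrappers. A possible single-pass variant is sketched below. It is an illustration, not the answerer's code: `highlight_keywords()` and its parameters are hypothetical names, it assumes PHP 7 or later, and it trims the keywords (so `'Doom '` is treated as `'Doom'`), which is a design choice of this sketch.

```php
<?php
// Sketch only: highlight_keywords() is a hypothetical name, not part of the thread.
function highlight_keywords(string $html, array $keywords, string $class = 'ht'): string
{
    // Drop empty keywords; trimming is an assumption made for this sketch.
    $keywords = array_filter(array_map('trim', $keywords), 'strlen');
    if (!$keywords || trim($html) === '') {
        return $html;
    }
    // Longest keywords first, so a longer keyword wins over a shorter one it contains.
    usort($keywords, function ($a, $b) { return strlen($b) - strlen($a); });
    $quoted = array_map(function ($k) { return preg_quote($k, '/'); }, $keywords);
    $pattern = '/(' . implode('|', $quoted) . ')/i';

    $dom = new DOMDocument();
    $dom->loadHTML($html);
    $xpath = new DOMXPath($dom);

    // Select text nodes only, so tag names and attributes are never touched.
    // The list is copied to an array before the tree is modified.
    $nodes = iterator_to_array(
        $xpath->query('//body//text()[not(ancestor::script or ancestor::style)]')
    );

    foreach ($nodes as $node) {
        // With one capturing group, odd indexes hold the matched keywords.
        $parts = preg_split($pattern, $node->nodeValue, -1, PREG_SPLIT_DELIM_CAPTURE);
        if (count($parts) < 2) {
            continue; // no keyword in this text node
        }
        $fragment = $dom->createDocumentFragment();
        foreach ($parts as $i => $part) {
            if ($part === '') {
                continue;
            }
            if ($i % 2 === 1) {
                $span = $dom->createElement('span');
                $span->setAttribute('class', $class);
                $span->appendChild($dom->createTextNode($part));
                $fragment->appendChild($span);
            } else {
                $fragment->appendChild($dom->createTextNode($part));
            }
        }
        $node->parentNode->replaceChild($fragment, $node);
    }

    // Serialize only the children of <body>, not the whole document.
    $out = '';
    foreach ($dom->getElementsByTagName('body')->item(0)->childNodes as $child) {
        $out .= $dom->saveHTML($child);
    }
    return $out;
}

echo highlight_keywords('<p>foo <b>bar</b> Foo</p><p>none</p>', ['foo']), PHP_EOL;
```

Because every keyword is matched in one regular expression, text that has already been wrapped in a `<span>` is not searched again by a later keyword.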

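Answer 1: (score: 0)

The solution above does not work. Here is a very hacky but reliable workaround that avoids highlighting inside, and thereby breaking, the HTML markup.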

function highlight_fancy($string, $keywords=array()) {
    $dom = new DOMDocument();
    $dom->loadHtml($string);

    // Search for all text blocks containing the keyword
    $xpath = new DOMXpath($dom);
    foreach($keywords as $keyword){
        $textNodes = $xpath->query('//*[contains(.,"'.$keyword.'")]/text()');

        foreach ($textNodes as $textNode) {
            $fragment = $dom->createDocumentFragment();
            $text = $textNode->nodeValue;
            $stubs = array();

            while (($pos = stripos($text, $keyword)) !== false) {
                $fragment->appendChild(new DOMText(substr($text, 0, $pos)));
                $word = substr($text, $pos, strlen($keyword));

                $highlight = $dom->createElement('span');
                $highlight->appendChild(new DOMText($word));
                $highlight->setAttribute('class', 'hl');
                $fragment->appendChild($highlight);

                $text = substr($text, $pos + strlen($keyword));
            }

            // empty('0') is true in PHP, so compare against '' to keep a trailing "0"
            if ($text !== '')
                $fragment->appendChild(new DOMText($text));

            $textNode->parentNode->replaceChild($fragment, $textNode);
        }
    }
    $html= $dom->saveHTML();
    $e=explode("<body><p>",$html);
    $e=explode("</p></body>",$e[1]);
    return $e[0];
}
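The `explode()` calls above only strip the wrapper when the serialized document contains exactly `<body><p>` and `</p></body>`, and they also remove the outer `<p>` tags. A less string-dependent way to get the markup back, sketched here under the assumption that PHP 5.3.6 or later is available (where `saveHTML()` accepts a node argument), is to serialize each child of `<body>`. The helper name `body_inner_html()` is hypothetical. The same sketch shows why the question's `->firstChild` call returned only the first paragraph.

```php
<?php
// Sketch only: body_inner_html() is a hypothetical helper name.
function body_inner_html(DOMDocument $dom): string
{
    $body = $dom->getElementsByTagName('body')->item(0);
    if ($body === null) {
        return '';
    }
    $html = '';
    foreach ($body->childNodes as $child) {
        // saveHTML() with a node argument serializes just that node.
        $html .= $dom->saveHTML($child);
    }
    return $html;
}

$dom = new DOMDocument();
$dom->loadHTML('<p>first</p><p>second</p>');

// Serializing only firstChild keeps the first paragraph and drops the rest.
$first = $dom->saveHTML($dom->getElementsByTagName('body')->item(0)->firstChild);
$all = body_inner_html($dom);

echo $first, PHP_EOL;
echo $all, PHP_EOL;
```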