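I have a string that contains HTML tags. I'm looking for code that truncates this string to a given length while removing any image tags (<img />). For example, the string is: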
<img>Something</img><b>Just an Example</b> Plain Text <br><a href="#">stackoverflow</a>
So the result should be:
Just an Example Plain Text stackoverflow (where "stackoverflow" is a link).
That is 35 characters, not counting whitespace.
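To check the figure of 35, a small helper can strip the markup and whitespace from the expected result and count what is left. This is a sketch assuming UTF-8 input; the name visibleCharCount is hypothetical and not part of the question or the answer.

```php
<?php
// Count the characters a reader actually sees: drop the tags with
// strip_tags(), remove every whitespace character, then count
// characters (not bytes) with mb_strlen(). Assumes UTF-8 input.
function visibleCharCount(string $html): int
{
    $text = strip_tags($html);
    $text = preg_replace('/\s+/u', '', $text);
    return mb_strlen($text, 'UTF-8');
}

$expected = '<b>Just an Example</b> Plain Text <br><a href="#">stackoverflow</a>';
echo visibleCharCount($expected), "\n"; // 35
```

Note that the function in the answer counts whitespace as well, so its $maxLength is not quite the same measure as this count.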
I tried the solution from this question, but it didn't give the required result. Any help would be appreciated.
Answer (score: 5)
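How about a function? Here's mine, AbstractHTMLContents. It takes two parameters: the HTML string to truncate ($html) and the maximum number of visible characters to keep ($maxLength, default 100). It removes <img> tags, cuts the text without leaving half a word at the end, and closes any tags that the cut leaves open.
Here is the code: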
function AbstractHTMLContents($html, $maxLength = 100)
{
    mb_internal_encoding("UTF-8");
    $printedLength = 0;   // visible characters emitted so far
    $position = 0;        // byte offset into $html (preg_match reports bytes)
    $tags = array();      // stack of currently open tag names
    $newContent = '';
    // Void elements never take a closing tag, even when written as <br> rather than <br />.
    $voidTags = array('area', 'base', 'br', 'col', 'embed', 'hr', 'img',
                      'input', 'link', 'meta', 'source', 'track', 'wbr');

    // Remove image tags: <img>, <img ... />, and stray </img>.
    $html = preg_replace('~</?img\b[^>]*>~i', '', $html);

    while ($printedLength < $maxLength
        && preg_match('~</?([a-z][a-z0-9]*)[^>]*>|&#?[a-z0-9]+;~i', $html, $match, PREG_OFFSET_CAPTURE, $position))
    {
        list($tag, $tagPosition) = $match[0];

        // Print text leading up to the tag (slice by bytes, count by characters).
        $str = substr($html, $position, $tagPosition - $position);
        if ($printedLength + mb_strlen($str) > $maxLength) {
            $newstr = mb_substr($str, 0, $maxLength - $printedLength);
            $newstr = preg_replace('~\s+\S+$~', '', $newstr); // drop the partial last word
            $newContent .= $newstr;
            $printedLength = $maxLength;
            break;
        }
        $newContent .= $str;
        $printedLength += mb_strlen($str);

        if ($tag[0] == '&') {
            // Handle the entity: it renders as a single character.
            $newContent .= $tag;
            $printedLength++;
        } else {
            // Handle the tag.
            $tagName = strtolower($match[1][0]);
            if ($tag[1] == '/') {
                // Closing tag: keep it only if it closes the innermost open tag.
                if (end($tags) === $tagName) {
                    array_pop($tags);
                    $newContent .= $tag;
                }
            } elseif (substr($tag, -2) == '/>' || in_array($tagName, $voidTags, true)) {
                // Self-closing or void tag: nothing to close later.
                $newContent .= $tag;
            } else {
                // Opening tag.
                $newContent .= $tag;
                $tags[] = $tagName;
            }
        }
        // Continue after the tag (byte offset).
        $position = $tagPosition + strlen($tag);
    }

    // Print any remaining text, trimming a partial word only if the text was cut.
    if ($printedLength < $maxLength && $position < strlen($html)) {
        $rest = substr($html, $position);
        if (mb_strlen($rest) > $maxLength - $printedLength) {
            $rest = mb_substr($rest, 0, $maxLength - $printedLength);
            $rest = preg_replace('~\s+\S+$~', '', $rest);
        }
        $newContent .= $rest;
    }

    // Close any tags that are still open.
    while (!empty($tags)) {
        $newContent .= sprintf('</%s>', array_pop($tags));
    }
    return $newContent;
}
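One detail to watch in this kind of code is that two kinds of offsets are involved: preg_match() with PREG_OFFSET_CAPTURE reports offsets in bytes, while mb_strlen() and mb_substr() count characters (and mb_strcut() takes its start and length in bytes). For ASCII-only input they coincide, but for multibyte UTF-8 text they diverge, so byte offsets should only be passed to byte-based functions such as substr(). The sample string below is an arbitrary illustration.

```php
<?php
// "Привет" is 6 Cyrillic letters of 2 bytes each in UTF-8, so the
// byte offset and the character offset of "<b>" differ.
$html = 'Привет <b>мир</b>';

preg_match('~<b>~', $html, $m, PREG_OFFSET_CAPTURE);
$byteOffset = $m[0][1];                                     // offset in bytes
$charOffset = mb_strlen(substr($html, 0, $byteOffset), 'UTF-8'); // same point in characters

echo $byteOffset, "\n"; // 13
echo $charOffset, "\n"; // 7

// A byte offset is correct for substr()...
echo substr($html, 0, $byteOffset), "|\n";              // "Привет |"
// ...but used as a character count in mb_substr() it overshoots into the tag.
echo mb_substr($html, 0, $byteOffset, 'UTF-8'), "|\n";  // "Привет <b>мир|"
```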
It appears to give the result you expect.
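One step worth testing on its own is how the function avoids ending on half a word: after cutting the text to the remaining length, it removes the trailing partial word with preg_replace('~\s+\S+$~', '', ...). A minimal standalone sketch of just that step follows; truncateWords is a hypothetical helper name, not part of the answer.

```php
<?php
// Cut plain text to at most $max characters, then drop the trailing
// partial word with the same pattern the answer uses. Text that
// already fits is returned unchanged.
function truncateWords(string $text, int $max): string
{
    if (mb_strlen($text, 'UTF-8') <= $max) {
        return $text;
    }
    $cut = mb_substr($text, 0, $max, 'UTF-8');
    return preg_replace('~\s+\S+$~', '', $cut);
}

echo truncateWords('Just an Example Plain Text', 12), "\n"; // "Just an"
echo truncateWords('Just an Example', 100), "\n";           // "Just an Example"
```

This pattern is conservative: if the cut lands exactly at the end of a word, that complete word is dropped too, and a cut inside a single long word with no whitespace before it is left as is.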