I have a string like this:
I love @kevinrose 's new website <a href="http://kevinrose.com">Link</a>
And I have this function:
function short($string, $max = 255) {
    if (strlen($string) >= $max) {
        $string = mb_substr($string, 0, $max - 5, 'utf-8') . '...';
    }
    return $string;
}
If I cut the string at 50 characters, it ends up as:
I love @kevinrose 's new website <a href="http://kevinr...
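which of course kills the HTML.
Is there a simple way to truncate the string without breaking the HTML, that is, without cutting in the middle of the href tag (cutting before or after it instead)?
I do need to keep my tags.
Thanks.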
Answer 0 (score: 6)
See "PHP: Truncate String While Preserving HTML Tags and Whole Words" by Alan Whipple: http://alanwhipple.com/2011/05/25/php-truncate-string-preserving-html-tags-words/
<?php
/**
 * truncateHtml can truncate a string up to a number of characters while preserving whole words and HTML tags
 *
 * @param string $text String to truncate.
 * @param integer $length Length of returned string, including ellipsis.
 * @param string $ending Ending to be appended to the trimmed string.
 * @param boolean $exact If false, $text will not be cut mid-word
 * @param boolean $considerHtml If true, HTML tags would be handled correctly
 *
 * @return string Trimmed string.
 */
function truncateHtml($text, $length = 100, $ending = '...', $exact = false, $considerHtml = true) {
    if ($considerHtml) {
        // if the plain text is shorter than the maximum length, return the whole text
        if (strlen(preg_replace('/<.*?>/', '', $text)) <= $length) {
            return $text;
        }
        // split the text into scannable pieces: an optional html tag followed by plain text
        preg_match_all('/(<.+?>)?([^<>]*)/s', $text, $lines, PREG_SET_ORDER);
        $total_length = strlen($ending);
        $open_tags = array();
        $truncate = '';
        foreach ($lines as $line_matchings) {
            // if there is any html-tag in this line, handle it and add it (uncounted) to the output
            if (!empty($line_matchings[1])) {
                // if it's an "empty element" with or without xhtml-conform closing slash
                if (preg_match('/^<(\s*.+?\/\s*|\s*(img|br|input|hr|area|base|basefont|col|frame|isindex|link|meta|param)(\s.+?)?)>$/is', $line_matchings[1])) {
                    // do nothing
                // if tag is a closing tag
                } else if (preg_match('/^<\s*\/([^\s]+?)\s*>$/s', $line_matchings[1], $tag_matchings)) {
                    // delete tag from $open_tags list (names are stored in lower case)
                    $pos = array_search(strtolower($tag_matchings[1]), $open_tags);
                    if ($pos !== false) {
                        unset($open_tags[$pos]);
                    }
                // if tag is an opening tag
                } else if (preg_match('/^<\s*([^\s>!]+).*?>$/s', $line_matchings[1], $tag_matchings)) {
                    // add tag to the beginning of $open_tags list
                    array_unshift($open_tags, strtolower($tag_matchings[1]));
                }
                // add html-tag to $truncate'd text
                $truncate .= $line_matchings[1];
            }
            // calculate the length of the plain text part of the line; handle entities as one character
            $content_length = strlen(preg_replace('/&[0-9a-z]{2,8};|&#[0-9]{1,7};|&#x[0-9a-f]{1,6};/i', ' ', $line_matchings[2]));
            if ($total_length + $content_length > $length) {
                // the number of characters which are left
                $left = $length - $total_length;
                $entities_length = 0;
                // search for html entities
                if (preg_match_all('/&[0-9a-z]{2,8};|&#[0-9]{1,7};|&#x[0-9a-f]{1,6};/i', $line_matchings[2], $entities, PREG_OFFSET_CAPTURE)) {
                    // calculate the real length of all entities in the legal range
                    foreach ($entities[0] as $entity) {
                        if ($entity[1] + 1 - $entities_length <= $left) {
                            $left--;
                            $entities_length += strlen($entity[0]);
                        } else {
                            // no more characters left
                            break;
                        }
                    }
                }
                $truncate .= substr($line_matchings[2], 0, $left + $entities_length);
                // maximum length is reached, so get off the loop
                break;
            } else {
                $truncate .= $line_matchings[2];
                $total_length += $content_length;
            }
            // if the maximum length is reached, get off the loop
            if ($total_length >= $length) {
                break;
            }
        }
    } else {
        if (strlen($text) <= $length) {
            return $text;
        } else {
            $truncate = substr($text, 0, $length - strlen($ending));
        }
    }
    // if the words shouldn't be cut in the middle...
    if (!$exact) {
        // ...search the last occurrence of a space...
        $spacepos = strrpos($truncate, ' ');
        // strrpos() returns false when there is no space at all
        if ($spacepos !== false) {
            // ...and cut the text in this position
            $truncate = substr($truncate, 0, $spacepos);
        }
    }
    // add the defined ending to the text
    $truncate .= $ending;
    if ($considerHtml) {
        // close all unclosed html-tags
        foreach ($open_tags as $tag) {
            $truncate .= '</' . $tag . '>';
        }
    }
    return $truncate;
}
?>
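The bookkeeping that lets truncateHtml close unclosed tags can be looked at in isolation. What follows is only a minimal sketch of that idea, not code from the linked article: `closersFor` is a hypothetical helper name, and both the void-element list and the tag regex are simplifying assumptions.

```php
<?php
// Hypothetical helper (not part of the answer above): returns the closing
// tags needed to balance an HTML fragment, innermost element first.
function closersFor(string $html): string {
    // Assumed list of void elements, which never take a closing tag.
    $void = ['area', 'base', 'br', 'col', 'hr', 'img', 'input', 'link', 'meta', 'param'];
    $open = [];
    // Group 1: "/" for a closing tag, group 2: tag name, group 3: "/" for <x/>.
    preg_match_all('/<\s*(\/?)\s*([a-z][a-z0-9]*)[^>]*?(\/?)>/i', $html, $tags, PREG_SET_ORDER);
    foreach ($tags as $t) {
        $name = strtolower($t[2]);
        if ($t[3] === '/' || in_array($name, $void, true)) {
            continue; // self-closing or void element: nothing to track
        }
        if ($t[1] === '/') {
            // Closing tag: forget the most recently opened element of that name.
            $pos = array_search($name, $open, true);
            if ($pos !== false) {
                array_splice($open, $pos, 1);
            }
        } else {
            array_unshift($open, $name); // newest opener goes first
        }
    }
    $closers = '';
    foreach ($open as $name) {
        $closers .= '</' . $name . '>';
    }
    return $closers;
}
```

For the cut-off fragment `<p>I love <b>this <a href="http://kevinrose.com">Li`, this returns `</a></b></p>`, which is the suffix a truncation routine would append after the ellipsis.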
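Answer 1 (score: 1)
Here is a shorter approach. It does not traverse the DOM tree, but it works in almost all cases.
This method first strips all HTML tags from the content (so the tags do not count toward the string length). Then, if the string needs to be truncated, it truncates it and re-inserts all the HTML tags.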
<?php
function short($string, $max = 255) {
    // Save every tag together with its byte offset in the original string
    preg_match_all('/<[^>]+>/', $string, $tags, PREG_OFFSET_CAPTURE);
    $stripped = preg_replace('/<[^>]+>/', '', $string); // Strip html tags
    // Truncate the string if needed
    if (mb_strlen($stripped, 'utf-8') > $max) {
        $truncated = mb_substr($stripped, 0, $max, 'utf-8');
        // Re-insert the tags in document order; tags located past the cut are appended at the end
        foreach ($tags[0] as $match) {
            list($tag, $pos) = $match;
            $truncated = substr_replace($truncated, $tag, min($pos, strlen($truncated)), 0);
        }
        $string = $truncated . '…';
    }
    return $string;
}
echo short('I love @kevinrose\'s new website <a href="http://kevinrose.com">Link</a>. Here is a bit of additional text after the link.<a></a>', 50);
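
Since the method above deliberately skips DOM traversal, a DOM-based variant may be worth having for the cases it misses. The following is a rough sketch under stated assumptions, not a drop-in replacement: `shortDom` is a hypothetical name, the dom and mbstring extensions are assumed to be loaded, the input is assumed to be a reasonably well-formed UTF-8 fragment, and the output is re-serialized by libxml, so it may differ cosmetically from the input.

```php
<?php
// Hypothetical helper: truncates the visible text of an HTML fragment to
// $max characters by walking the DOM instead of matching tags with regexes.
function shortDom(string $html, int $max = 255, string $ending = '...'): string {
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // keep parser warnings quiet
    // Assumption: the meta tag makes libxml read the input as UTF-8, and the
    // wrapper div gives the fragment a single known root element.
    $doc->loadHTML('<meta http-equiv="Content-Type" content="text/html; charset=utf-8"><div>' . $html . '</div>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    $root = $doc->getElementsByTagName('div')->item(0);

    $left = $max;   // characters of text still allowed
    $done = false;  // becomes true once the cut has been made
    $walk = function (DOMNode $node) use (&$walk, &$left, &$done, $ending): void {
        // Iterate over a snapshot, because nodes are removed while walking.
        foreach (iterator_to_array($node->childNodes) as $child) {
            if ($done) {
                $node->removeChild($child); // everything after the cut is dropped
            } elseif ($child instanceof DOMText) {
                $len = mb_strlen($child->data, 'utf-8');
                if ($len > $left) {
                    $child->data = mb_substr($child->data, 0, $left, 'utf-8') . $ending;
                    $done = true;
                } else {
                    $left -= $len;
                }
            } else {
                $walk($child); // descend into elements; their tags stay intact
            }
        }
    };
    $walk($root);

    // Serialize only the children of the wrapper div.
    $out = '';
    foreach ($root->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}
```

Because the cut happens inside a text node, an enclosing element such as the link keeps both its opening and its closing tag, and entities like `&amp;` count as one character since the parser has already decoded them.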