Shorten text tweet-style without cutting links inside

Time: 2012-01-28 04:44:47

Tags: php html

I have a string like this:

I love @kevinrose 's new website <a href="http://kevinrose.com">Link</a>

I have this function:

function short($string, $max = 255) {
    if (strlen($string) >= $max) {
        $string = mb_substr($string, 0, $max - 5, 'utf-8') . '...';
    }
    return $string;
}

If I cut the string at 50, it ends up as:

I love @kevinrose 's new website <a href="http://kevinr...

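Which of course kills the HTML.

Is there a simple way to avoid cutting inside the href tag (cutting before or after it instead), so that the HTML is not broken?

I do of course need to keep my tags.

Thanks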

2 Answers:

Answer 0: (score: 6)

See this, from "PHP: Truncate string while preserving HTML tags and whole words" by Alan Whipple -> http://alanwhipple.com/2011/05/25/php-truncate-string-preserving-html-tags-words/

<?php
/**
 * truncateHtml can truncate a string up to a number of characters while preserving whole words and HTML tags
 *
 * @param string $text String to truncate.
 * @param integer $length Length of returned string, including ellipsis.
 * @param string $ending Ending to be appended to the trimmed string.
 * @param boolean $exact If false, $text will not be cut mid-word
 * @param boolean $considerHtml If true, HTML tags would be handled correctly
 *
 * @return string Trimmed string.
 */
function truncateHtml($text, $length = 100, $ending = '...', $exact = false, $considerHtml = true) {
    if ($considerHtml) {
        // if the plain text is shorter than the maximum length, return the whole text
        if (strlen(preg_replace('/<.*?>/', '', $text)) <= $length) {
            return $text;
        }
        // splits all html-tags to scanable lines
        preg_match_all('/(<.+?>)?([^<>]*)/s', $text, $lines, PREG_SET_ORDER);
        $total_length = strlen($ending);
        $open_tags = array();
        $truncate = '';
        foreach ($lines as $line_matchings) {
            // if there is any html-tag in this line, handle it and add it (uncounted) to the output
            if (!empty($line_matchings[1])) {
                // if it's an "empty element" with or without xhtml-conform closing slash
                if (preg_match('/^<(\s*.+?\/\s*|\s*(img|br|input|hr|area|base|basefont|col|frame|isindex|link|meta|param)(\s.+?)?)>$/is', $line_matchings[1])) {
                    // do nothing
                // if tag is a closing tag
                } else if (preg_match('/^<\s*\/([^\s]+?)\s*>$/s', $line_matchings[1], $tag_matchings)) {
                    // delete tag from $open_tags list
                    $pos = array_search($tag_matchings[1], $open_tags);
                    if ($pos !== false) {
                        unset($open_tags[$pos]);
                    }
                // if tag is an opening tag
                } else if (preg_match('/^<\s*([^\s>!]+).*?>$/s', $line_matchings[1], $tag_matchings)) {
                    // add tag to the beginning of $open_tags list
                    array_unshift($open_tags, strtolower($tag_matchings[1]));
                }
                // add html-tag to $truncate'd text
                $truncate .= $line_matchings[1];
            }
            // calculate the length of the plain text part of the line; handle entities as one character
            $content_length = strlen(preg_replace('/&[0-9a-z]{2,8};|&#[0-9]{1,7};|&#x[0-9a-f]{1,6};/i', ' ', $line_matchings[2]));
            if ($total_length+$content_length> $length) {
                // the number of characters which are left
                $left = $length - $total_length;
                $entities_length = 0;
                // search for html entities
                if (preg_match_all('/&[0-9a-z]{2,8};|&#[0-9]{1,7};|&#x[0-9a-f]{1,6};/i', $line_matchings[2], $entities, PREG_OFFSET_CAPTURE)) {
                    // calculate the real length of all entities in the legal range
                    foreach ($entities[0] as $entity) {
                        if ($entity[1]+1-$entities_length <= $left) {
                            $left--;
                            $entities_length += strlen($entity[0]);
                        } else {
                            // no more characters left
                            break;
                        }
                    }
                }
                $truncate .= substr($line_matchings[2], 0, $left+$entities_length);
                // maximum length is reached, so leave the loop
                break;
            } else {
                $truncate .= $line_matchings[2];
                $total_length += $content_length;
            }
            // if the maximum length is reached, get off the loop
            if($total_length>= $length) {
                break;
            }
        }
    } else {
        if (strlen($text) <= $length) {
            return $text;
        } else {
            $truncate = substr($text, 0, $length - strlen($ending));
        }
    }
    // if the words shouldn't be cut in the middle...
    if (!$exact) {
        // ...search the last occurrence of a space...
        $spacepos = strrpos($truncate, ' ');
        // strrpos() returns false when there is no space; isset() would still be true
        if ($spacepos !== false) {
            // ...and cut the text in this position
            $truncate = substr($truncate, 0, $spacepos);
        }
    }
    // add the defined ending to the text
    $truncate .= $ending;
    if($considerHtml) {
        // close all unclosed html-tags
        foreach ($open_tags as $tag) {
            $truncate .= '</' . $tag . '>';
        }
    }
    return $truncate;
}

?>
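
The "handle entities as one character" step in the function above can be tried in isolation. The following is only a small sketch: `ENTITY_PATTERN` and `visibleLength` are hypothetical names introduced here, and the pattern assumes that the third alternative is meant to match hexadecimal references such as `&#x2014;`.

```php
<?php
// Named (&amp;), decimal (&#233;) and hexadecimal (&#x2014;) entities.
// Assumption: the hex form is what the third alternative is meant to cover.
const ENTITY_PATTERN = '/&[0-9a-z]{2,8};|&#[0-9]{1,7};|&#x[0-9a-f]{1,6};/i';

// Hypothetical helper: length of a text fragment where each entity
// counts as a single character, as it would once rendered.
function visibleLength(string $text): int
{
    // Swap every entity for one placeholder space, then measure.
    return strlen(preg_replace(ENTITY_PATTERN, ' ', $text));
}

echo visibleLength('Fish &amp; chips'), "\n";        // 12
echo visibleLength('caf&#233; &#x2014; bar'), "\n";  // 10
```

Like the function above, this measures with `strlen()`, so it counts bytes; text with raw multi-byte characters would need `mb_strlen()` instead.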

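Answer 1: (score: 1)

Here is a shorter approach. It does not walk the DOM tree, but it works in almost all cases.

This method first strips all HTML tags from the content (so the tags do not count toward the string length). Then, if the string needs truncating, it truncates it and reinserts all the HTML tags.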

<?php
function short($string, $max = 255) {
    preg_match_all('/<[^>]+>/', $string, $tags); // Save tag information for later
    $stripped = preg_replace('/<[^>]+>/', '', $string); // Strip html tags

    // Truncate the string if needed
    if (mb_strlen($stripped, 'utf-8') > $max) { // Count characters, matching mb_substr() below
        $truncated = mb_substr($stripped, 0, $max, 'utf-8');

        // Insert html tags, if any
        if (sizeof($tags[0]) > 0) {
            $offset = 0; // Where to resume searching in the original string
            foreach ($tags[0] as $tag) {
                $pos = strpos($string, $tag, $offset); // Absolute position of this tag in the original string
                $offset = $pos + strlen($tag); // Resume after this tag to avoid issues with duplicate tags
                $truncated = substr_replace($truncated, $tag, $pos, 0); // Insert the tag (appended if $pos is past the end)
            }
        }

        $string = $truncated . '&hellip;';
    }

    return $string;
}

echo short('I love @kevinrose\'s new website <a href="http://kevinrose.com">Link</a>. Here is a bit of additional text after the link.<a></a>', 50);
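
The reinsertion step can also be written without searching for each tag a second time, by recording every tag's byte offset while matching. This is only a sketch of that variant; `shortWithOffsets` is a hypothetical name, and like the function above it assumes well-formed tags and simply appends any tag that originally sat beyond the cut.

```php
<?php
// Hypothetical variant of short(): tag positions come from
// PREG_OFFSET_CAPTURE instead of a second strpos() search.
function shortWithOffsets(string $html, int $max = 255): string
{
    // $tags[0] is a list of [tag, byte offset in $html] pairs.
    preg_match_all('/<[^>]+>/', $html, $tags, PREG_OFFSET_CAPTURE);
    $stripped = preg_replace('/<[^>]+>/', '', $html);

    if (mb_strlen($stripped, 'UTF-8') <= $max) {
        return $html; // Short enough: return the input untouched
    }

    $truncated = mb_substr($stripped, 0, $max, 'UTF-8');
    foreach ($tags[0] as [$tag, $offset]) {
        // Earlier tags are already back in place, so the original offset
        // is valid here too; offsets past the end mean "append".
        $at = min($offset, strlen($truncated));
        $truncated = substr_replace($truncated, $tag, $at, 0);
    }

    return $truncated . '&hellip;';
}

echo shortWithOffsets('I love @kevinrose\'s new website <a href="http://kevinrose.com">Link</a>. Here is a bit of additional text after the link.<a></a>', 50);
// I love @kevinrose's new website <a href="http://kevinrose.com">Link</a>. Here is a bi<a></a>&hellip;
```

The link survives intact because only the stripped text is measured and cut. The trailing `<a></a>` from the input ends up as an empty element just before the ellipsis, which is harmless but shows why this is not a full DOM-aware solution.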