I need to shorten a given text (which may use different encodings!), e.g. to 140 characters, without touching the links in it.
Example:
Lorem ipsum dolor sit amet: http://bit.ly/111111 Consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat. http://bit.ly/222222 Sed diam voluptua. At vero eos et accusam et justo duo dolores. http://bit.ly/111111
should end up as:
Lorem ipsum dolor sit amet: http://bit.ly/111111 Consetetur sadipscing elitr, sed diam nonumy... http://bit.ly/222222 http://bit.ly/111111
My actual code, with examples, is here: http://phpfiddle.org/lite/code/er7-sty
function shortenMessage($message,$limit=140,$encoding='utf-8') {
if (mb_strlen($message,$encoding) <= $limit) return $message;
echo '<pre><h3>Original message:<br />'.$message.'<hr>';
# search positions of links
$reg_exUrl = "/(http|https)\:\/\/[a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,3}(\/\S*)?/";
preg_match_all ($reg_exUrl, $message, $links,PREG_OFFSET_CAPTURE);
echo 'Links found:<br />';
var_dump($links[0]);
echo '<hr>';
$position = array();
$len = 0;
# search utf-8 position of links
foreach ($links[0] as $values) {
$url = $values[0];
$offset = $values[1];
#$pos = mb_strpos($message, $url, $offset, $encoding); # doesnt work
$pos = mb_strpos($message, $url, 0, $encoding);
$position[$pos] = $url;
# delete url from string
$message = str_replace($url, '', $message);
$len += mb_strlen($url,$encoding); # sum the length of the urls to subtract from $maxlenght
}
echo 'UTF-8 Positions:<br />';
var_dump($position);
echo '<hr>';
# shorten text
$maxlenght = $limit - $len - 7; # 7 is a security buffer
while ($maxlenght < 0) { # too many urls? then cut some...
array_shift($position);
$len -= mb_strlen($position[0],$encoding);
$maxlenght = $limit - $len - 6;
}
echo 'UTF-8 Positions shortened:<br />';
var_dump($position);
echo '<hr>';
$message = mb_substr($message,0,$maxlenght,$encoding).'... ';
echo 'Shortened message without urls:<br />';
var_dump($message);
echo '<hr>';
# re-insert urls at right positions
$addpos = 0;
foreach ($position as $pos => $url) {
$pos += $addpos;
if ($pos < mb_strlen($message,$encoding)) {
$message = mb_substr($message,0,$pos,$encoding).$url.mb_substr($message,$pos,mb_strlen($message),$encoding);
} else {
$message .= ' '.$url;
}
$addpos += mb_strlen($url,$encoding);
}
echo 'Shortened message:<br />';
var_dump($message);
echo '<hr>';
return $message;
}
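This works as long as the text contains only distinct links, but it fails when a link is repeated.
I have already tried passing the positions from preg_match_all to mb_strpos as the offset, but I think that fails because of the preg_match UTF-8 problem: PREG_OFFSET_CAPTURE reports offsets in bytes, while mb_strpos expects an offset in characters.
I have also looked at "Shortening text tweet-like without cutting links inside", but the answers there do not handle encodings, and they deal with HTML tags...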
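The byte/character mismatch described above can be worked around by converting each byte offset into a character offset before handing it to the mb_* functions. This is only a minimal sketch, assuming UTF-8 input; `byteToCharOffset` is a hypothetical helper name, and the sample text and the simplified URL pattern are made up for illustration.

```php
<?php
// Hypothetical helper: convert a byte offset (as returned by
// PREG_OFFSET_CAPTURE) into a character offset for the mb_* functions.
function byteToCharOffset(string $text, int $byteOffset, string $encoding = 'UTF-8'): int
{
    // substr() cuts by bytes; mb_strlen() then counts the characters in that prefix.
    return mb_strlen(substr($text, 0, $byteOffset), $encoding);
}

// Sample text with multibyte characters before the links (assumption).
$message = 'Größe: http://bit.ly/111111 und http://bit.ly/111111';
preg_match_all('~https?://\S+~', $message, $links, PREG_OFFSET_CAPTURE);

foreach ($links[0] as [$url, $byteOffset]) {
    $charOffset = byteToCharOffset($message, $byteOffset);
    // Reading the URL back with mb_substr() shows the character offset is usable.
    echo $byteOffset, ' -> ', $charOffset, ' ',
         mb_substr($message, $charOffset, mb_strlen($url, 'UTF-8'), 'UTF-8'), PHP_EOL;
}
```

Because every match carries its own offset, a repeated link gets a separate position for each occurrence without a second search.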
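Answer 0 (score: 0)
I think I found a solution; maybe it helps someone. When a link is used twice, I simply pass its last known position to mb_strpos as the offset, so I no longer have a problem with byte counts...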
function shortenMessage($message,$limit=140,$encoding='utf-8') {
if (mb_strlen($message,$encoding) <= $limit) return $message;
# search positions of links
$reg_exUrl = "/(http|https)\:\/\/[a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,3}(\/\S*)?/";
preg_match_all ($reg_exUrl, $message, $links);
$position = array();
$len = 0;
# get position of links depending on encoding
foreach ($links[0] as $url) {
$offset = 0;
$keys = array_keys($position, $url);
if ($keys) { # url was already used - take offset in advance
$lastpos = end($keys);
$offset = $lastpos + 1;
}
$pos = mb_strpos($message, $url, $offset, $encoding);
$position[$pos] = $url;
}
# delete urls from string
foreach ($position as $url) {
$message = str_replace($url, '', $message);
$len += mb_strlen($url,$encoding); # sum the length of the urls to subtract from $maxlenght
}
# shorten text
$maxlenght = $limit - $len - 7; # 7 is a security buffer
while ($maxlenght < 0) { # too many urls? then cut some...
$key = min(array_keys($position));
$len -= mb_strlen($position[$key],$encoding);
$maxlenght = $limit - $len - 6;
unset($position[$key]);
}
$message = mb_substr($message,0,$maxlenght,$encoding).'... ';
# re-insert urls at right positions
$lasturl = '';
foreach ($position as $pos => $url) {
if ($pos < mb_strlen($message,$encoding)) {
$message = mb_substr($message,0,$pos,$encoding).$url.mb_substr($message,$pos,mb_strlen($message,$encoding),$encoding);
} elseif ($url != $lasturl) { # avoid adding the same url at the end
$message .= ' '.$url;
}
$lasturl = $url;
}
return $message;
}
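The idea of continuing the search behind the previous hit can also be written with a single running offset. The following is a sketch of that variant, not the code above: `findUrlPositions` is a hypothetical helper, it assumes the URLs are passed in the order in which they occur in the message, and the sample text and simplified URL pattern are made up.

```php
<?php
// Hypothetical helper: character positions of every URL occurrence,
// including repeated URLs, as a list of [position, url] pairs.
function findUrlPositions(string $message, array $urls, string $encoding = 'UTF-8'): array
{
    $found = [];
    $offset = 0;
    foreach ($urls as $url) {
        $pos = mb_strpos($message, $url, $offset, $encoding);
        if ($pos === false) {
            continue; // URL not found behind the previous match; skip it
        }
        $found[] = [$pos, $url];
        // Continue searching after this match so that a repeated URL
        // resolves to its next occurrence, not to the first one again.
        $offset = $pos + mb_strlen($url, $encoding);
    }
    return $found;
}

$message = 'ä http://bit.ly/111111 b http://bit.ly/222222 c http://bit.ly/111111';
preg_match_all('~https?://\S+~', $message, $m);
print_r(findUrlPositions($message, $m[0]));
```

Returning pairs instead of an array keyed by position keeps the URL order explicit and avoids looking up earlier entries with array_keys.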
Answer 1 (score: -1)
Try this code:
$string = 'Lorem ipsum dolor sit amet: http://bit.ly/111111 Consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat. http://bit.ly/222222 Sed diam voluptua. At vero eos et accusam et justo duo dolores. http://bit.ly/222222';
$regex = '/https?\:\/\/[^\" ]+/i';
preg_match_all($regex, $string, $matches);
print_r($matches[0]);
Updated answer
<?php
$string = 'Lorem ipsum dolor sit amet: http://bit.ly/111111 Consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat. http://bit.ly/222222 Sed diam voluptua. At vero eos et accusam et justo duo dolores. http://bit.ly/222222';
echo "Original String";
echo "<hr>";
echo $string;
$matched_string = preg_split('/https?\:\/\/[^\" ]+/i', $string);
echo "<br />";
echo "<br />";
echo "<br />";
echo "<br />";
echo "Shorten String";
echo "<hr>";
preg_match_all('/(https?\:\/\/[^\" ]+)/i', $string, $matched_url);
$urls = $matched_url[0];
$formatted_str = '';
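for($i=0; $i< count($urls); $i++){
# count characters, not bytes, so multibyte text is not cut in the middle of a character
if(mb_strlen($matched_string[$i], 'UTF-8') > 40){
$formatted_str .= mb_substr($matched_string[$i], 0, 40, 'UTF-8').'...'.$urls[$i];
} else {
$formatted_str .= $matched_string[$i].$urls[$i];
}
}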
echo $formatted_str;
?>
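The per-segment approach can be sketched more compactly with preg_split and the PREG_SPLIT_DELIM_CAPTURE flag, which returns the text pieces and the URLs in one alternating list, so text after the last URL is kept as well. This is a hedged sketch assuming UTF-8 input; `shortenSegments` is a hypothetical helper name. Like the code above, it limits each text segment and does not enforce a total length such as 140 characters.

```php
<?php
// Hypothetical helper: shorten each text segment between URLs to $max characters.
function shortenSegments(string $text, int $max = 40, string $encoding = 'UTF-8'): string
{
    // With PREG_SPLIT_DELIM_CAPTURE the captured URLs are returned too,
    // so the parts alternate: text, URL, text, URL, ..., text.
    $parts = preg_split('~(https?://[^" ]+)~i', $text, -1, PREG_SPLIT_DELIM_CAPTURE);
    $out = '';
    foreach ($parts as $i => $part) {
        $isUrl = ($i % 2 === 1); // odd indexes hold the captured URLs
        if (!$isUrl && mb_strlen($part, $encoding) > $max) {
            $part = mb_substr($part, 0, $max, $encoding) . '... ';
        }
        $out .= $part;
    }
    return $out;
}

echo shortenSegments('Lorem ipsum dolor sit amet: http://bit.ly/111111 Sed diam voluptua.', 10);
```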
Another solution [shortening the visible text with CSS]
<?php
$string = 'Lorem ipsum dolor sit amet: http://bit.ly/111111 Consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat. http://bit.ly/222222 Sed diam voluptua. At vero eos et accusam et justo duo dolores. http://bit.ly/222222';
echo "Original String";
echo "<hr>";
echo $string;
echo "<br />";
echo "<br />";
echo "<br />";
echo "<br />";
echo "Shorten String";
echo "<hr>";
$formatted_str = preg_replace('/(https?\:\/\/[^\" ]+)/i', "</span><span>$1</span></div><div><span class=\"shorten\">", $string);
?>
<html>
<head>
<style type="text/css">
.shorten{
background-color: #f00;
text-overflow: ellipsis;
width:300px;
overflow: hidden;
white-space:nowrap;
float: left;
}
span{float: left}
</style>
</head>
<body>
<div><span class="shorten"><?php echo $formatted_str; ?></span></div>
</body>
</html>