我正在尝试在字符串过滤器中使用此方法:
public function truncate($string, $chars = 50, $terminator = ' …');
我希望这个
$in = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWYXZ1234567890";
$out = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUV …";
还有这个
$in = "âãäåæçèéêëìíîïðñòóôõö÷øùúûüýþÿĀāĂ㥹ĆćĈĉĊċČčĎďĐđĒēĔĕĖėĘęĚěĜĝ";
$out = "âãäåæçèéêëìíîïðñòóôõö÷øùúûüýþÿĀāĂ㥹ĆćĈĉĊċČčĎďĐđ …";
即$chars
减去$terminator
字符串的字符。
此外,过滤器应该在$chars
限制以下的第一个字边界切割,例如
$in = "Answer to the Ultimate Question of Life, the Universe, and Everything.";
$out = "Answer to the Ultimate Question of Life, the …";
我很确定这应该适用于这些步骤
但是,我现在尝试了str*
和mb_*
函数的各种组合,但都产生了错误的结果。这不是那么困难,所以我显然缺少一些东西。有人会分享这个或的工作实现,将我指向一个我最终可以理解如何去做的资源。
由于
P.S。是的,我之前检查了https://stackoverflow.com/search?q=truncate+string+php:)
答案 0 :(得分:5)
答案 1 :(得分:3)
试试这个:
function truncate($string, $chars = 50, $terminator = ' …') {
$cutPos = $chars - mb_strlen($terminator);
$boundaryPos = mb_strrpos(mb_substr($string, 0, mb_strpos($string, ' ', $cutPos)), ' ');
return mb_substr($string, 0, $boundaryPos === false ? $cutPos : $boundaryPos) . $terminator;
}
但您需要确保正确设置内部编码。
答案 2 :(得分:0)
我通常不喜欢只编写这样一个问题的完整答案。但是我也醒了,我想也许你的问题可以让我心情愉快去参加今天余下的活动。
我没有试图运行它,但它应该可以工作,或至少让你在那里90%。
function truncate( $string, $chars = 50, $terminate = ' ...' )
{
$chars -= mb_strlen($terminate);
if ( $chars <= 0 )
return $terminate;
$string = mb_substr($string, 0, $chars);
$space = mb_strrpos($string, ' ');
if ($space < mb_strlen($string) / 2)
return $string . $terminate;
else
return mb_substr($string, 0, $space) . $terminate;
}
答案 3 :(得分:0)
tldr;
我认为关于这个问题和当前的答案有一些重要的事情要指出。我将根据戈登的样本数据和一些其他案例演示答案与我的正则表达式答案的比较,以揭示一些不同的结果。
首先,要澄清输入值的质量。戈登说,该功能必须是多字节安全的,并遵守字边界。样本数据在确定截断位置时并未暴露对非空格,非单词字符(例如标点符号)的期望处理,因此我们必须假设以空格字符为目标已经足够了,并且明智的做法是,因为大多数“阅读更多内容”字符串在截断时不必担心遵守标点符号。
第二,在相当普遍的情况下,必须对包含换行符的大量文本应用省略号。
第三,我们随便同意一些基本的数据标准化,例如:
$chars
的值将始终大于mb_strlen()
的{{1}} (Demo)
功能:
$terminator
测试用例:
function truncateGumbo($string, $chars = 50, $terminator = ' …') {
$cutPos = $chars - mb_strlen($terminator);
$boundaryPos = mb_strrpos(mb_substr($string, 0, mb_strpos($string, ' ', $cutPos)), ' ');
return mb_substr($string, 0, $boundaryPos === false ? $cutPos : $boundaryPos) . $terminator;
}
function truncateGordon($string, $chars = 50, $terminator = ' …') {
return mb_strimwidth($string, 0, $chars, $terminator);
}
function truncateSoapBox($string, $chars = 50, $terminate = ' …')
{
$chars -= mb_strlen($terminate);
if ( $chars <= 0 )
return $terminate;
$string = mb_substr($string, 0, $chars);
$space = mb_strrpos($string, ' ');
if ($space < mb_strlen($string) / 2)
return $string . $terminate;
else
return mb_substr($string, 0, $space) . $terminate;
}
function truncateMickmackusa($string, $max = 50, $terminator = ' …') {
$trunc = $max - mb_strlen($terminator, 'UTF-8');
return preg_replace("~(?=.{{$max}})(?:\S{{$trunc}}|.{0,$trunc}(?=\s))\K.+~us", $terminator, $string);
}
执行:
$tests = [
[
'testCase' => "Answer to the Ultimate Question of Life, the Universe, and Everything.",
// 50th char ---------------------------------------------------^
'expected' => "Answer to the Ultimate Question of Life, the …",
],
[
'testCase' => "A single line of text to be followed by another\nline of text",
// 50th char ----------------------------------------------------^
'expected' => "A single line of text to be followed by another …",
],
[
'testCase' => "âãäåæçèéêëìíîïðñòóôõö÷øùúûüýþÿĀāĂ㥹ĆćĈĉĊċČčĎďĐđĒēĔĕĖėĘęĚěĜĝ",
// 50th char ---------------------------------------------------^
'expected' => "âãäåæçèéêëìíîïðñòóôõö÷øùúûüýþÿĀāĂ㥹ĆćĈĉĊċČčĎďĐđ …",
],
[
'testCase' => "123456789 123456789 123456789 123456789 123456789",
// 50th char doesn't exist -------------------------------------^
'expected' => "1234567890123456789012345678901234567890123456789",
],
[
'testCase' => "Hello worldly world",
// 50th char doesn't exist -------------------------------------^
'expected' => "Hello worldly world",
],
[
'testCase' => "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWYXZ1234567890",
// 50th char ---------------------------------------------------^
'expected' => "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUV …",
],
];
输出:
foreach ($tests as ['testCase' => $testCase, 'expected' => $expected]) {
echo "\tSample Input:\t\t$testCase\n";
echo "\n\ttruncateGumbo:\t\t" , truncateGumbo($testCase);
echo "\n\ttruncateGordon:\t\t" , truncateGordon($testCase);
echo "\n\ttruncateSoapBox:\t" , truncateSoapBox($testCase);
echo "\n\ttruncateMickmackusa:\t" , truncateMickmackusa($testCase);
echo "\n\tExpected Result:\t{$expected}";
echo "\n-----------------------------------------------------\n";
}
我的模式说明:
尽管看起来确实很难看,但是大多数乱码模式语法都是将数字值插入为动态量词的问题。
我也可以这样写:
Sample Input: Answer to the Ultimate Question of Life, the Universe, and Everything.
truncateGumbo: Answer to the Ultimate Question of Life, the …
truncateGordon: Answer to the Ultimate Question of Life, the Uni …
truncateSoapBox: Answer to the Ultimate Question of Life, the …
truncateMickmackusa: Answer to the Ultimate Question of Life, the …
Expected Result: Answer to the Ultimate Question of Life, the …
-----------------------------------------------------
Sample Input: A single line of text to be followed by another
line of text
truncateGumbo: A single line of text to be followed by …
truncateGordon: A single line of text to be followed by another
…
truncateSoapBox: A single line of text to be followed by …
truncateMickmackusa: A single line of text to be followed by another …
Expected Result: A single line of text to be followed by another …
-----------------------------------------------------
Sample Input: âãäåæçèéêëìíîïðñòóôõö÷øùúûüýþÿĀāĂ㥹ĆćĈĉĊċČčĎďĐđĒēĔĕĖėĘęĚěĜĝ
truncateGumbo: âãäåæçèéêëìíîïðñòóôõö÷øùúûüýþÿĀāĂ㥹ĆćĈĉĊċČčĎďĐđ …
truncateGordon: âãäåæçèéêëìíîïðñòóôõö÷øùúûüýþÿĀāĂ㥹ĆćĈĉĊċČčĎďĐđ …
truncateSoapBox: âãäåæçèéêëìíîïðñòóôõö÷øùúûüýþÿĀāĂ㥹ĆćĈĉĊċČčĎďĐđ …
truncateMickmackusa: âãäåæçèéêëìíîïðñòóôõö÷øùúûüýþÿĀāĂ㥹ĆćĈĉĊċČčĎďĐđ …
Expected Result: âãäåæçèéêëìíîïðñòóôõö÷øùúûüýþÿĀāĂ㥹ĆćĈĉĊċČčĎďĐđ …
-----------------------------------------------------
Sample Input: 123456789 123456789 123456789 123456789 123456789
truncateGumbo: 123456789 123456789 123456789 123456789 12345678 …
truncateGordon: 123456789 123456789 123456789 123456789 123456789
truncateSoapBox: 123456789 123456789 123456789 123456789 …
truncateMickmackusa: 123456789 123456789 123456789 123456789 123456789
Expected Result: 1234567890123456789012345678901234567890123456789
-----------------------------------------------------
Sample Input: Hello worldly world
truncateGumbo:
Warning: mb_strpos(): Offset not contained in string in /in/ibFH5 on line 4
Hello worldly world …
truncateGordon: Hello worldly world
truncateSoapBox: Hello worldly …
truncateMickmackusa: Hello worldly world
Expected Result: Hello worldly world
-----------------------------------------------------
Sample Input: abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWYXZ1234567890
truncateGumbo: abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUV …
truncateGordon: abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUV …
truncateSoapBox: abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUV …
truncateMickmackusa: abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUV …
Expected Result: abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUV …
-----------------------------------------------------
为简单起见,我将'~(?:\S{' . $trunc . '}|(?=.{' . $max . '}).{0,' . $trunc . '}(?=\s))\K.+~us'
替换为$trunc
,将48
替换为$max
。
50