Question

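Sorry for the title, I really don't know how to phrase it...

I often have a string that needs to be cut after X characters. My problem is that this string often contains special characters such as: &amp; &egrave;

So I'm wondering: is there a way in PHP to know, without transforming my string, whether I am in the middle of a special character when I cut it?

Example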

This is my string with a special char : &egrave; - and I want it to cut in the middle of the "&egrave;" but still keeping the string intact

So right now the result of my substring would be:

This is my string with a special char : &egra

But I would like to get something like this:

This is my string with a special char : &egrave;

Answer 1

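The best thing to do here is to store your string as UTF-8 without any HTML entities, and use the mb_* family of functions with UTF-8 as the encoding.

However, if your string is ASCII or iso-8859-1 / win1252, you can use the special HTML-ENTITIES encoding of the mbstring library:

$s = 'This is my string with a special char : &egrave; - and I want it to cut in the middle of the "&egrave;" but still keeping the string intact';
echo mb_substr($s, 0, 40, 'HTML-ENTITIES');
echo mb_substr($s, 0, 41, 'HTML-ENTITIES');

However, if your underlying string is UTF-8 or some other multibyte encoding, using HTML-ENTITIES is not safe! This is because HTML-ENTITIES really means "win1252 with high-bit characters as HTML entities". Here is an example of how it can go wrong:

// Assuming that é is in utf8:
mb_substr('é ', 0, 2, 'HTML-ENTITIES') === '&Atilde;&copy;'
// should be '&eacute; '

When your string is in a multibyte encoding, you must instead convert all HTML entities to a common encoding before you split.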
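For a multibyte string, the decode-first step this answer recommends might look like the minimal sketch below. It assumes the string's real encoding is UTF-8 and that the mbstring extension is loaded; the variable names are illustrative only. It also sidesteps mbstring's HTML-ENTITIES pseudo-encoding, which has been deprecated since PHP 8.2.

```php
<?php
// Assumption: the string's actual encoding is UTF-8.
$encoding = 'UTF-8';
$s = 'This is my string with a special char : &egrave; - and I want it to cut';

// 1. Turn every entity into the real character in the target encoding.
$decoded = html_entity_decode($s, ENT_QUOTES, $encoding);

// 2. Cut by characters rather than bytes, so "è" is never split.
$truncated = mb_substr($decoded, 0, 41, $encoding);

echo $truncated, "\n"; // This is my string with a special char : è
```

The result no longer contains entities, so it has to be escaped again (for example with htmlentities()) before being written into HTML.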

Answer 2

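You can first decode all HTML entities with html_entity_decode(), then split your string, and then re-encode the entities with htmlentities().

$decoded_string = html_entity_decode($original_string, ENT_QUOTES, 'UTF-8');

// implement logic to split string here, for example (with $length as a
// placeholder for the number of characters to keep):
$split_string_part = mb_substr($decoded_string, 0, $length, 'UTF-8');

// then for each string part do the following:
$encoded_string_part = htmlentities($split_string_part, ENT_QUOTES, 'UTF-8');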

Answer 3

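The longest HTML entity is 10 characters long, including the ampersand and the semicolon (this holds for the HTML 4 entity set; HTML5 defines longer named entities). If you intend to cut the string at X bytes, check bytes X-9 to X-1 for an ampersand. If the corresponding semicolon appears at byte X or later, cut the string after the semicolon instead of after byte X.

However, if you are willing to pre-process the string, Mike's solution is more accurate, because it cuts the string at X characters rather than bytes.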
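The byte-window check described above could be sketched roughly as follows. The function name and the $maxEntityLen parameter are hypothetical, and the default of 10 relies on this answer's assumption about the longest entity; treat it as an illustration rather than a hardened implementation.

```php
<?php
// Hypothetical helper: keep $x bytes, but never stop inside an entity.
// Assumes no entity is longer than $maxEntityLen bytes.
function cut_without_splitting_entity(string $text, int $x, int $maxEntityLen = 10): string
{
    if (strlen($text) <= $x) {
        return $text; // nothing to cut
    }

    // Only an '&' in the last ($maxEntityLen - 1) kept bytes can start
    // an entity that crosses the cut position.
    $windowStart = max(0, $x - ($maxEntityLen - 1));
    $window = substr($text, $windowStart, $x - $windowStart);
    $amp = strrpos($window, '&');
    if ($amp === false) {
        return substr($text, 0, $x);
    }
    $amp += $windowStart; // position in the full string

    // Find the semicolon that would close that entity.
    $semi = strpos($text, ';', $amp);
    if ($semi === false || $semi < $x || $semi - $amp + 1 > $maxEntityLen) {
        // Unterminated, closed before the cut, or too long to be an entity.
        return substr($text, 0, $x);
    }

    // The entity straddles the cut: keep it whole.
    return substr($text, 0, $semi + 1);
}

$s = 'This is my string with a special char : &egrave; - and I want it to cut';
echo cut_without_splitting_entity($s, 45), "\n";
// This is my string with a special char : &egrave;
```

Because it works on bytes, the result can be up to one entity longer than X, and a multibyte character at the cut position is not protected.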

Answer 4

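The best solution would be to store your texts as UTF-8 instead of storing them as HTML entities. Apart from that, if you don't mind the count being off (&egrave; counts as one character instead of 8), the following snippet should work:

<?php
$string = 'This is my string with a special char : &egrave; - and I want it to cut in the middle of the "&egrave;" but still keeping the string intact';
// Explicit flags instead of NULL: passing NULL for the flags argument is deprecated since PHP 8.1
$cut_string = htmlentities(mb_substr(html_entity_decode($string, ENT_QUOTES, 'UTF-8'), 0, 45, 'UTF-8'), ENT_QUOTES, 'UTF-8')."<br><br>";

Note: if you encoded the text with a different function (e.g. htmlspecialchars()), use that function instead of htmlentities(). If you used a custom function, use a custom function that reverses it instead of html_entity_decode() (and your custom function instead of htmlentities()).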
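To illustrate the note about pairing the encoder with its matching decoder, here is a small sketch using htmlspecialchars() and htmlspecialchars_decode(). The helper name truncate_encoded() and its callable parameters are made up for this example, and the text is assumed to be UTF-8.

```php
<?php
// Hypothetical helper: $decode and $encode must be inverse operations.
function truncate_encoded(string $text, int $length, callable $decode, callable $encode): string
{
    // Decode, cut by characters, then encode again with the same scheme.
    return $encode(mb_substr($decode($text), 0, $length, 'UTF-8'));
}

$s = 'Tom &amp; Jerry &lt;b&gt;';

$cut = truncate_encoded(
    $s,
    5,
    fn ($t) => htmlspecialchars_decode($t, ENT_QUOTES),
    fn ($t) => htmlspecialchars($t, ENT_QUOTES, 'UTF-8')
);

echo $cut, "\n"; // Tom &amp;
```

Passing the two functions in as a pair makes it harder to mix, say, htmlentities() for encoding with htmlspecialchars_decode() for decoding.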

Answer 5

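A small brute-force solution (I'm not really happy with the PCRE expression), assuming you want to keep 80 characters and the longest HTML entity is 7 characters long: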

$regex = '~^(.{73}([^&]{7}|.{0,7}$|[^&]{0,6}&[^;]+;))(.*)~mx';
// Note, this could return a slightly shorter text
return preg_replace($regex, '$1', $text);

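Explanation:

.{73} - 73 characters
[^&]{7} - okay, we can fill the rest with anything that does not contain &
.{0,7}$ - keep the possible end of the text in mind (this should not be necessary, because shorter text would not match at all)
[^&]{0,6}&[^;]+; - up to 6 characters (you would be at position 79), then & and let the entity finish

Something that seems better, but requires playing with the numbers a bit: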

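// Wrapped in a function so the returns are valid; the name is only illustrative.
// $N is the number of bytes to keep.
function cut_keeping_entity($text, $N)
{
    // Text is not longer than $N: nothing to cut :)
    if (strlen($text) <= $N) {
        return $text;
    }

    $head = substr($text, 0, $N);

    // Get the last & before the cut
    $pos = strrpos($head, '&');

    // We're not young anymore, we have to check this too (no entities at all) :)
    if ($pos === false) {
        return $head;
    }

    // Get the last ; before the cut
    $end = strrpos($head, ';');

    // false would not compare as smaller than $pos (entity open at the beginning)
    if ($end === false) {
        $end = -1;
    }

    // Okay, entity is closed (; is after &)
    if ($end > $pos) {
        return $head;
    }

    // Now we need to find the first ; at or after the cut
    $end = strpos($text, ';', $N);
    if ($end === false) {
        // Not valid HTML, unclosed entity: do whatever you want; here the dangling part is dropped
        return substr($text, 0, $pos);
    }

    // + 1 so that the closing ; is included
    return substr($text, 0, $end + 1);
}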

(Check the numbers, there may be a +/- 1 somewhere in the indexes...)

Answer 6

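I think you will have to use a combination of strpos and strrpos to find the next and previous spaces, parse the text between those spaces, check it against a list of known special characters, and, if it matches, extend your "cut" to the position of the next space. If you have an existing code sample, we could give you a better answer.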
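A rough sketch of this whitespace-based idea is shown below. Note that it swaps the "list of known special characters" for a regular expression that merely looks for something shaped like an entity, and the function name is hypothetical.

```php
<?php
// Hypothetical helper: if the word at the cut contains an entity,
// extend the cut to the next space.
function cut_at_word_if_entity(string $text, int $n): string
{
    if (strlen($text) <= $n) {
        return $text;
    }

    // Previous space before the cut, next space at or after it.
    $prev  = strrpos(substr($text, 0, $n), ' ');
    $start = ($prev === false) ? 0 : $prev + 1;
    $next  = strpos($text, ' ', $n);
    $stop  = ($next === false) ? strlen($text) : $next;

    // The cut sits exactly on a word boundary: nothing can be split.
    if ($start >= $n) {
        return substr($text, 0, $n);
    }

    // The word that the cut falls into.
    $word = substr($text, $start, $stop - $start);

    // Assumption: anything matching this pattern is treated as an entity.
    if (preg_match('/&(#\d+|#x[0-9a-f]+|[a-z][a-z0-9]*);/i', $word)) {
        return substr($text, 0, $stop);
    }

    return substr($text, 0, $n);
}

$s = 'This is my string with a special char : &egrave; - and I want it to cut';
echo cut_at_word_if_entity($s, 45), "\n";
// This is my string with a special char : &egrave;
```

This keeps the whole word whenever it contains an entity, so the result may run past the entity itself when other characters follow it before the next space.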

PHP - Substring after X characters with special characters
