htmlentities and html_entity_decode behave differently

Date: 2015-03-16 15:05:37

Tags: php decode html-entities truncate htmlspecialchars

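I want to truncate a string to a given number of characters. The string contains HTML entities; note that I have already stripped all HTML tags from it. If the cut point falls on a special character, the string should not be cut in the middle of the HTML entity, but just before or just after it. These examples don't work: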

//example 1
$str = "French for French is fran&ccedil;ais";
$str = substr($str, 0, 27);
//$str contains "French for French is fran&c";

//example 2
$str = "the en dash looks like &#8211;";
$str = substr($str, 0, 25);
//$str contains "the en dash looks like &#";
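As an aside, one alternative (not the approach taken in the answer) is to keep working on the encoded string: cut it, and if the cut lands inside an entity, back off to just before that entity's "&". This is a minimal sketch assuming entities of the forms &name;, &#123; and &#x1F; and ASCII-only text outside the entities, as in the question; the function name truncate_keep_entities is hypothetical. $length counts bytes of the encoded text, exactly like substr().

```php
<?php
// Hypothetical helper: truncate text that still contains HTML entities
// without cutting an entity such as "&ccedil;" or "&#8211;" in half.
// $length is a byte count on the *encoded* string, as with substr().
function truncate_keep_entities(string $str, int $length): string
{
    $cut = substr($str, 0, $length);

    // Find the last "&" inside the cut; if no ";" follows it there,
    // the cut may have landed in the middle of an entity.
    $amp = strrpos($cut, '&');
    if ($amp === false || strpos($cut, ';', $amp) !== false) {
        return $cut;
    }

    // Back off only if the full text at that "&" really is an entity
    // (named, decimal or hex); a literal "&" is left alone.
    $entity = '/^&(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#[xX][0-9A-Fa-f]+);/';
    if (preg_match($entity, substr($str, $amp)) === 1) {
        return substr($cut, 0, $amp);
    }
    return $cut;
}

echo truncate_keep_entities("French for French is fran&ccedil;ais", 27), "\n"; // French for French is fran
echo truncate_keep_entities("the en dash looks like &#8211;", 25), "\n";       // "the en dash looks like " (trailing space)
```

This always cuts before the entity; rounding up to after it instead would mean returning substr($str, 0, $amp) plus the matched entity.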

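So I figured I should first decode the entities into single characters, do the truncation, and then encode those characters back into entities. This seems to work for the first example, but not for the second.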

//example 1
$str = "French for French is fran&ccedil;ais";
$str = html_entity_decode($str);
$str = substr($str, 0, 27);
$str = htmlentities($str);
//$str contains "French for French is fran&ccedil;a";

//example 2
$str = "the en dash looks like &#8211;";
$str = html_entity_decode($str);
$str = substr($str, 0, 25);
$str = htmlentities($str);
//$str contains "the en dash looks like &#";
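These two results fit one plausible explanation, assuming (the question does not say so) that html_entity_decode ran with ISO-8859-1, which was the default charset of these functions before PHP 5.4. The sketch below, which needs PHP 7+ for the \u{...} escape, decodes both entities from the question under that charset and under UTF-8.

```php
<?php
// Assumption: the asker's decode step ran with ISO-8859-1.
// Compare how the two entities from the question decode per charset.

// ç (U+00E7) exists in ISO-8859-1, so it decodes to one byte, 0xE7;
// that is why the first example appeared to work with substr().
$cedilla = html_entity_decode('&ccedil;', ENT_QUOTES, 'ISO-8859-1');
var_dump($cedilla === "\xE7", strlen($cedilla)); // bool(true), int(1)

// The en dash (U+2013) does not exist in ISO-8859-1, so the entity is
// returned undecoded and substr() still cuts it to "&#".
$dashLatin1 = html_entity_decode('&#8211;', ENT_QUOTES, 'ISO-8859-1');
var_dump($dashLatin1 === '&#8211;'); // bool(true)

// In UTF-8 the same entity decodes to a single 3-byte character.
$dashUtf8 = html_entity_decode('&#8211;', ENT_QUOTES, 'UTF-8');
var_dump($dashUtf8 === "\u{2013}", strlen($dashUtf8)); // bool(true), int(3)
```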

What should I change so that both examples behave the way I expect?

1 answer:

Answer 0 (score: 2)

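By default, htmlentities (and likewise html_entity_decode) uses the default_charset value from your php.ini as its encoding; this is the behavior since PHP 5.6, while PHP 5.4 and 5.5 defaulted to UTF-8 and earlier versions to ISO-8859-1. If that charset cannot represent the characters behind the entities you are converting, the functions may not behave as expected. Pass the charset explicitly, and truncate with mb_substr rather than substr: in UTF-8 a character such as ç or – takes several bytes, and substr counts bytes, not characters. Try this and see whether you get different results: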

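$str = html_entity_decode($str, ENT_QUOTES, 'UTF-8'); // entities -> UTF-8 characters
$str = mb_substr($str, 0, 25, 'UTF-8');               // cut after 25 characters, not bytes
$str = htmlentities($str, ENT_QUOTES, 'UTF-8');       // characters -> entities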
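The answer's calls can be combined into a single decode, cut, encode function. This is a minimal sketch, assuming PHP 7+ with the mbstring extension enabled; the name truncate_entities_utf8 is hypothetical and not part of the answer. Note that $length now counts decoded characters, so the re-encoded result can be longer than $length bytes.

```php
<?php
// Hypothetical helper: decode entities to UTF-8, cut by characters, re-encode.
function truncate_entities_utf8(string $str, int $length): string
{
    // 1. "&ccedil;" / "&#8211;" become single UTF-8 characters.
    $decoded = html_entity_decode($str, ENT_QUOTES, 'UTF-8');

    // 2. Cut by characters; substr() would count bytes and could split a
    //    multi-byte character such as "ç" (2 bytes) or "–" (3 bytes).
    $cut = mb_substr($decoded, 0, $length, 'UTF-8');

    // 3. Turn the characters back into entities.
    return htmlentities($cut, ENT_QUOTES, 'UTF-8');
}

echo truncate_entities_utf8("French for French is fran&ccedil;ais", 27), "\n"; // French for French is fran&ccedil;a
echo truncate_entities_utf8("the en dash looks like &#8211;", 25), "\n";       // the en dash looks like &ndash;
```

For the second example the decoded string is only 24 characters long, so nothing is cut, and htmlentities re-encodes the en dash as the named entity &ndash; rather than the original &#8211;. If step 2 used substr instead, the 25-byte cut would keep only 2 of the en dash's 3 bytes, and htmlentities with ENT_QUOTES and no ENT_SUBSTITUTE returns an empty string for such invalid UTF-8.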

http://php.net/htmlentities

http://php.net/html_entity_decode

http://php.net/manual/en/function.mb-substr.php