Question

I'm trying to use html_entity_decode and strip_tags to detect whether a string contains only empty HTML.

However, trim does not seem to work: the result is always string ' ' (length = 2).

My debugging attempts to work this out:

$description = '<p>&nbsp;</p>';

$output = trim(strip_tags(html_entity_decode($description, ENT_QUOTES, 'UTF-8')));

var_dump($output);

Writing the output to debug.txt:

$description = '<p>&nbsp;</p>';

$test = mb_detect_encoding($description);
$test .= "\n";
$test .= trim(strip_tags(html_entity_decode($description, ENT_QUOTES, 'UTF-8')));
$test .= "\n";
$test .= html_entity_decode($description, ENT_QUOTES, 'UTF-8');

file_put_contents('debug.txt', $test);

Answer 1

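If you use var_dump(urlencode($output)), you will see that it outputs string(6) "%C2%A0", so the string consists of the bytes 0xC2 and 0xA0. These two bytes are the UTF-8 encoding of the Unicode non-breaking space (U+00A0), which trim does not strip by default. Make sure your file is saved as UTF-8 and that your HTTP headers declare UTF-8.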
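To confirm which bytes are left over, a minimal sketch like the following can help. It reuses the $description value from the question and assumes the mbstring extension is available (the question already calls mb_detect_encoding); the expected values in the comments follow from the "%C2%A0" output described above.

```php
<?php

$description = '<p>&nbsp;</p>';

// Same pipeline as in the question: decode entities, strip tags, trim.
$output = trim(strip_tags(html_entity_decode($description, ENT_QUOTES, 'UTF-8')));

// trim() only strips ASCII whitespace by default, so the
// UTF-8 encoded non-breaking space (bytes C2 A0) survives.
var_dump(strlen($output));             // int(2): two bytes
var_dump(bin2hex($output));            // string(4) "c2a0"
var_dump(mb_strlen($output, 'UTF-8')); // int(1): one character
var_dump($output === "\xC2\xA0");      // bool(true)
```

bin2hex shows the raw bytes directly, which avoids any confusion about how a terminal or browser renders the invisible character.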

That said, to trim this character you can use a regular expression with the Unicode modifier (instead of trim):

DEMO：

<?php

$description = '<p>&nbsp;</p>';

$output = trim(strip_tags(html_entity_decode($description, ENT_QUOTES, 'UTF-8')));

var_dump(urlencode($output)); // string(6) "%C2%A0"

// -------

$output = preg_replace('~^\s+|\s+$~', '', strip_tags(html_entity_decode($description, ENT_QUOTES, 'UTF-8')));

var_dump(urlencode($output)); // string(6) "%C2%A0"

// -------

$output = preg_replace('~^\s+|\s+$~u', '', strip_tags(html_entity_decode($description, ENT_QUOTES, 'UTF-8')));
// Unicode! -----------------------^

var_dump(urlencode($output)); // string(0) ""

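Regex autopsy:

~ - regex delimiter - must appear before the regex and again before the modifiers
^\s+ - one or more whitespace characters at the start of the string (^ means start of string, \s means a whitespace character, + means "match 1 to infinity times")
| - or
\s+$ - one or more whitespace characters immediately followed by the end of the string
~ - closing regex delimiter
u - regex modifier - the Unicode modifier (PCRE_UTF8) is used here so that Unicode whitespace characters are matched and replaced as well.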
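Putting the pieces together for the original goal (detecting empty HTML), one possible sketch is shown below. The function name is_blank_html is hypothetical and not part of PHP or the original answer, and the scalar type declarations assume PHP 7 or later.

```php
<?php

// Hypothetical helper (name is an assumption): returns true when the
// HTML contains no visible text after decoding entities and stripping tags.
function is_blank_html(string $html): bool
{
    $text = strip_tags(html_entity_decode($html, ENT_QUOTES, 'UTF-8'));

    // Unicode-aware trim: with the u modifier, \s also matches
    // the non-breaking space (U+00A0), as in the demo above.
    $trimmed = preg_replace('~^\s+|\s+$~u', '', $text);

    // preg_replace() returns null on error (for example, invalid UTF-8
    // input); such input is treated as "not blank" here.
    return $trimmed === '';
}

var_dump(is_blank_html('<p>&nbsp;</p>'));       // bool(true)
var_dump(is_blank_html('<p>&nbsp;text</p>'));   // bool(false)
```

Treating a preg_replace failure as "not blank" is a design choice for this sketch; depending on the application, rejecting invalid UTF-8 input outright may be more appropriate.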

PHP html_entity_decode and trim confusion
