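I'm trying to validate input against a set of allowed characters (a-z, A-Z, α-ω, Α-Ω). This is what I've come up with so far, because regular expressions etc. are prone to XSS and second-order SQL injection.
However, the code below reports errors because it treats each Greek character (such as "α") as a 2-byte character.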
<?php
validate_string_chars("aaαα");
function validate_string_chars($string) {
    //$valid_chars = array('A','B','C','D','E','F','G','H','I','J','K','L','M','N','O','P','Q','R','S','T','U','V','W','X','Y','Z');
    //$valid_chars = range('a', 'z');
    $english_low = range('a', 'z');
    $english_cap = range('A', 'Z');
    $greek_low = array('α', 'β');
    $greek_cap = array('Α', 'Β');
    $valid_chars = array_merge($english_low, $english_cap, $greek_low, $greek_cap);
    $errors = 0;
    for ($i = 0; $i < strlen($string); $i++) {
        $char = substr($string, $i, 1);
        if (!in_array($char, $valid_chars)) { $errors++; }
    }
    echo "\n\r".$errors."\n\r";
}
?>
// Results: 4 (2 errors for each "α")
Here is the var_dump of $char:
string(1) "a"
string(1) "a"
string(1) "�"
string(1) "�"
string(1) "�"
string(1) "�"
Answer 0 (score: 1)
You need to use mb_strlen and mb_substr with the UTF-8 encoding so that Unicode characters, rather than bytes, are counted and extracted:
for ($i = 0; $i < mb_strlen($string, 'UTF8'); $i++) { // <--- HERE
    $char = mb_substr($string, $i, 1, 'UTF8'); // AND HERE --->
    if (!in_array($char, $valid_chars)) { $errors++; }
}
See the IDEONE demo.
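For reference, the loop above can be dropped into a complete function along these lines. This is only a sketch: the name count_invalid_chars is made up for illustration, it returns the count instead of echoing it, and it assumes the mbstring extension is loaded and that the source file itself is saved as UTF-8.

```php
<?php
// Hypothetical helper (not from the original post): counts the characters
// of $string that are not in the allow-list, iterating character by
// character instead of byte by byte.
function count_invalid_chars(string $string): int {
    // Allow-list: a-z, A-Z, plus the sample Greek letters from the question.
    $valid_chars = array_merge(
        range('a', 'z'),
        range('A', 'Z'),
        array('α', 'β'),
        array('Α', 'Β')
    );
    $errors = 0;
    // Length in characters, computed once rather than on every iteration.
    $length = mb_strlen($string, 'UTF-8');
    for ($i = 0; $i < $length; $i++) {
        // Extract one whole character, however many bytes it occupies.
        $char = mb_substr($string, $i, 1, 'UTF-8');
        if (!in_array($char, $valid_chars, true)) {
            $errors++;
        }
    }
    return $errors;
}

echo count_invalid_chars("aaαα"), "\n"; // prints 0
echo count_invalid_chars("aα1?"), "\n"; // prints 2 ("1" and "?" are rejected)
```

The third argument to in_array turns on strict comparison, so each character is compared as a string without type juggling.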
In fact, you can also match Unicode graphemes with preg_match_all('/\X/u', $str, $matches): \X is a shorthand class that matches a Unicode grapheme.
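A small sketch of that \X approach, for illustration. The function name split_graphemes is hypothetical, and the example assumes PHP's PCRE was built with Unicode support, which is the usual case.

```php
<?php
// Hypothetical helper (not from the original post): splits a UTF-8 string
// into an array of grapheme clusters using \X.
function split_graphemes(string $str): array {
    // The /u modifier makes PCRE treat both pattern and subject as UTF-8.
    if (preg_match_all('/\X/u', $str, $matches) === false) {
        return array(); // matching failed, e.g. the input is not valid UTF-8
    }
    return $matches[0];
}

$chars = split_graphemes("aaαα");
echo count($chars), "\n";        // prints 4
echo implode(',', $chars), "\n"; // prints a,a,α,α
```

Each element of the returned array can then be checked with in_array against $valid_chars, just as in the loop version.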