Validating a PHP string that contains Greek letters

Time: 2015-09-18 07:48:00

Tags: php regex

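I am trying to validate input (aA-zZ and αΑ-ωΩ). This is what I have come up with so far, because of regex etc. and concerns about XSS and second-order SQL injection.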

However, the code below reports errors, because it treats a Greek character ("α") as a 2-byte character.

<?php

validate_string_chars("aaαα");

function validate_string_chars($string) {

    //$valid_chars = array('A','B','C','D','E','F','G','H','I','J','K','L','M','N','O','P','Q','R','S','T','U','V','W','X','Y','Z');
    //$valid_chars = range('a', 'z');
    $english_low    = range('a', 'z');
    $english_cap    = range('A', 'Z');
    $greek_low      = array('α', 'β');
    $greek_cap      = array('Α', 'Β');
    $valid_chars    = array_merge($english_low, $english_cap, $greek_low, $greek_cap);

    $errors = 0;

    for($i = 0; $i < strlen($string); $i++ ) {
        $char = substr($string, $i, 1);
        if (!in_array($char, $valid_chars)) { $errors++; }
    }

    echo "\n\r".$errors."\n\r";
}
?>

// Results: 4 (2 errors for each "α")

Here is the var_dump of $char:

string(1) "a"
string(1) "a"
string(1) "�"
string(1) "�"
string(1) "�"
string(1) "�"
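
The six one-byte strings in the dump match the byte length of the input rather than its character count. A minimal check, assuming the source file is saved as UTF-8 and the mbstring extension is loaded (the variable names are only illustrative):

```php
<?php
$string = "aaαα";

$bytes = strlen($string);              // counts bytes
$chars = mb_strlen($string, 'UTF-8');  // counts Unicode code points
$hex   = bin2hex("α");                 // the two bytes that encode "α" in UTF-8

echo $bytes, "\n"; // 6
echo $chars, "\n"; // 4
echo $hex, "\n";   // ceb1
```

Because substr() cuts on byte boundaries, each "α" is split into two invalid single-byte strings, which is why neither half is found in $valid_chars.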

1 answer:

Answer 0 (score: 1)

You need to use mb_strlen and mb_substr with the UTF8 encoding to count Unicode characters correctly:

for($i = 0; $i < mb_strlen($string, 'UTF8'); $i++ ) {  // <--- HERE
    $char = mb_substr($string, $i, 1, 'UTF8');         //    AND HERE --->
    if (!in_array($char, $valid_chars)) { $errors++; }
}

See the IDEONE demo.
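
For context, the loop above can be dropped into a complete function. The following is only a sketch, assuming PHP 7+, a UTF-8 source file, and the mbstring extension. The name count_invalid_chars and the choice to return the count instead of echoing it are illustrative and not part of the original code.

```php
<?php
// Hypothetical variant of validate_string_chars(): returns the number of
// characters that are not in the whitelist instead of printing it.
function count_invalid_chars(string $string, array $valid_chars): int
{
    $errors = 0;
    $length = mb_strlen($string, 'UTF-8'); // character count, computed once

    for ($i = 0; $i < $length; $i++) {
        $char = mb_substr($string, $i, 1, 'UTF-8'); // one full character
        if (!in_array($char, $valid_chars, true)) {
            $errors++;
        }
    }

    return $errors;
}

// Same whitelist as in the question (only α/β and Α/Β from the Greek alphabet).
$valid_chars = array_merge(range('a', 'z'), range('A', 'Z'), ['α', 'β'], ['Α', 'Β']);

echo count_invalid_chars("aaαα", $valid_chars), "\n"; // 0
echo count_invalid_chars("aαγ1", $valid_chars), "\n"; // 2 ("γ" and "1" are not listed)
```

Computing the length once outside the loop avoids re-scanning the string on every iteration, and the strict flag on in_array() keeps the comparison to exact string matches.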

In fact, you can also match Unicode graphemes with preg_match_all('/\X/u', $str, $matches); \X is a shorthand class for a Unicode grapheme.
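
A rough sketch of the \X approach, assuming PHP 7+ and UTF-8 input. The helper name split_graphemes and the array_diff() check are illustrative additions, not part of the original answer.

```php
<?php
// Hypothetical helper: split a UTF-8 string into grapheme clusters using \X.
function split_graphemes(string $str): array
{
    // With the /u modifier, preg_match_all() returns false on invalid UTF-8.
    if (preg_match_all('/\X/u', $str, $matches) === false) {
        return [];
    }
    return $matches[0];
}

// Same whitelist as in the question.
$valid_chars = array_merge(range('a', 'z'), range('A', 'Z'), ['α', 'β'], ['Α', 'Β']);

$graphemes = split_graphemes("aaαγ");
$invalid   = array_diff($graphemes, $valid_chars); // graphemes not in the whitelist

echo count($graphemes), "\n"; // 4
echo count($invalid), "\n";   // 1 ("γ")

// A base letter followed by a combining mark is a single grapheme.
echo count(split_graphemes("e\u{0301}")), "\n"; // 1
```

Unlike mb_substr(), which steps through code points, \X keeps a base letter and its combining marks together, which matters for accented input typed in decomposed form.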