php preg_replace wrong charset or encoding

Time: 2015-10-29 15:54:51

Tags: php regex character-encoding

I have this simple code:

function getCleanText($rawText) //removes doublespace and punctuation
{
    return strtolower(preg_replace("/[\s\t]+/u", " ", 
        preg_replace("/[^a-zA-Z1-9àèéìòù]+/u", " ", $rawText)));
}

echo getCleanText("uscì"). " uscì <br>";

The function just removes punctuation and double spaces. Why do I get this output?

usc�� uscì 

I mean, "uscì" doesn't contain any punctuation, so the function is supposed to return it unchanged. Yet I have this problem with all accented letters. The web page is encoded in UTF-8. If I try utf8_encode like this:

return utf8_encode(strtolower(preg_replace("/[\s\t]+/u", " ", 
        preg_replace("/[^a-zA-Z1-9àèéìòù]+/u", " ", $rawText))));

the output is

usc㬠uscì 

Any ideas? Where can I find documentation to understand my error?

1 answer:

Answer 0 (score: 1)

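Using mb_strtolower instead of plain strtolower solved the problem in my tests. I think this is a php.ini configuration issue, which would explain why it works for some people and not for others.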
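
The suggestion above can be sketched roughly as follows. This is only a minimal illustration, assuming the mbstring extension is loaded, the script file itself is saved as UTF-8, and the input is valid UTF-8. The name getCleanTextMb is hypothetical and not from the question. A likely reason the original fails is that strtolower works on single bytes, and in PHP versions before 8.2 its result depended on the current locale, so it could alter one byte of a multibyte character such as "ì".

```php
<?php
// Sketch only: same cleaning steps as in the question, but lowercasing
// with mb_strtolower. getCleanTextMb is a hypothetical name.
function getCleanTextMb($rawText)
{
    // Replace every run of characters outside the allowed set with one space.
    // The character class is kept as in the question (note that 1-9 excludes 0).
    $text = preg_replace("/[^a-zA-Z1-9àèéìòù]+/u", " ", $rawText);

    // Collapse runs of whitespace; \s already matches tabs.
    $text = preg_replace("/\s+/u", " ", $text);

    // Multibyte-aware lowercasing. The encoding is passed explicitly so the
    // result does not depend on the configured default encoding.
    return mb_strtolower($text, "UTF-8");
}

echo getCleanTextMb("Uscì,  fuori!"), "\n";
```

Passing "UTF-8" explicitly avoids relying on the server's internal encoding setting, which may be why the behaviour differs between machines. Also note that utf8_encode converts from ISO-8859-1 to UTF-8, so applying it to text that is already UTF-8 encodes it a second time rather than repairing it.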