How do I split a multibyte string into words in PHP? Here is what I have so far, but I would like to improve the code:
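mb_internal_encoding('UTF-8');
mb_regex_encoding('UTF-8');
// The hyphen goes last in the class; written as ":-_" it would be read as a range that also covers A-Z.
$arr = mb_split('[\s\[\]().,;:_-]', $str);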
Is there a way to say that a word is a sequence of "alpha" characters (without writing a-z, because I want to include non-Latin characters)?
Answer 0 (score: 6)
Try this one:
preg_match_all('/[\p{L}\p{M}]+/u', $subject, $result, PREG_PATTERN_ORDER);
foreach ($result[0] as $word) {
    // $word holds one matched word
}
This matches every possible letter together with its accents (combining marks):
"
[\p{L}\p{M}] # Match a single character present in the list below
# A character with the Unicode property “letter” (any kind of letter from any language)
# A character with the Unicode property “mark” (a character intended to be combined with another character (e.g. accents, umlauts, enclosing boxes, etc.))
+ # Between one and unlimited times, as many times as possible, giving back as needed (greedy)
"
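To illustrate the pattern above on mixed-script input, here is a minimal sketch. The function name `split_words` and the sample strings are assumptions made for this example and are not part of the original answer.

```php
<?php
// Hypothetical helper wrapping the pattern from the answer above.
function split_words(string $text): array
{
    // \p{L} = any letter, \p{M} = any combining mark.
    // The u modifier makes PCRE treat pattern and subject as UTF-8.
    if (preg_match_all('/[\p{L}\p{M}]+/u', $text, $matches) === false) {
        // preg_match_all() returns false on failure, e.g. invalid UTF-8 input.
        return [];
    }
    return $matches[0];
}

// Punctuation, digits, and underscores all act as separators here.
print_r(split_words('Привет, мир! foo_bar 42'));
```

Because the character class only accepts letters and marks, digits and underscores are dropped rather than kept as part of a word. If numbers should count as words, adding `\p{N}` to the class is one option.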
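Answer 1 (score: 0)
Many languages (Chinese, for example) do not separate words with delimiters. Should the function return the whole string in that case? In PHP, explode() is binary-safe, so if you only need a single delimiter, using it may be faster.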
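As a rough sketch of the single-delimiter case, explode() compares raw bytes, so a multibyte delimiter works as long as the delimiter and the subject use the same encoding (UTF-8 is assumed here). The sample string is made up for illustration.

```php
<?php
// The delimiter is the ideographic comma (U+3001), three bytes in UTF-8.
// explode() matches it byte for byte, and no regex engine is involved.
$parts = explode('、', 'りんご、みかん、ぶどう');

print_r($parts);
```

This only helps when there is exactly one known separator; it does not answer the question of where word boundaries fall in text that has no separators at all.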
Answer 2 (score: 0)
Maybe you should use \w?
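A small sketch of what that suggestion looks like with preg_match_all(), assuming UTF-8 input and the u modifier, which in PHP also makes \w Unicode-aware. Note that \w accepts digits and the underscore as well as letters, so it is broader than the "alpha only" definition asked for in the question. The sample string is made up for illustration.

```php
<?php
$text = 'snake_case 2024 слово';

// With the u modifier, \w covers Unicode letters and digits plus the underscore.
preg_match_all('/\w+/u', $text, $m);

print_r($m[0]);
```

Here `snake_case` and `2024` each come back as a single "word", which may or may not be what is wanted.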