Multibyte-safe counting of distinct characters in a string

Time: 2011-10-11 17:32:11

Tags: php string utf-8

I'm looking for a smart, efficient way to count how many different alphabetic characters a string contains. For example:

$str = "APPLE";
echo char_count($str); // should return 4, because APPLE has 4 different chars 'A', 'P', 'L' and 'E'

$str = "BOB AND BOB"; // should return 5 ('B', 'O', 'A', 'N', 'D'). 

$str = 'PLÁTANO'; // should return 7 ('P', 'L', 'Á', 'T', 'A', 'N', 'O')

It should support UTF-8 strings!

5 answers:

Answer 0 (score: 11)

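If you are dealing with UTF-8 (which you really should consider, imho), none of the posted solutions (using strlen, str_split or count_chars) will work, because they all treat one byte as one character, which is obviously not true for UTF-8.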

<?php

$treat_spaces_as_chars = true;
// contains hälöwrd and a space, being 8 distinct characters (7 without the space)
$string = "hällö wörld"; 
// remove spaces if we don't want to count them
if (!$treat_spaces_as_chars) {
  $string = preg_replace('/\s+/u', '', $string);
}
// split into characters (not bytes, like explode() or str_split() would)
$characters = preg_split('//u', $string, -1, PREG_SPLIT_NO_EMPTY);
// throw out the duplicates
$unique_characters = array_unique($characters);
// count what's left
$number_of_characters = count($unique_characters);
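
The steps above can be wrapped into a reusable function. This is a minimal sketch, assuming PHP 7.0 or later (for the type declarations and the `\u{...}` escape). The name `char_count` comes from the question; the `$count_spaces` parameter and the exception on invalid UTF-8 are hypothetical additions for illustration.

```php
<?php

// Hypothetical helper: counts the distinct UTF-8 characters in $string.
function char_count(string $string, bool $count_spaces = true): int
{
    if (!$count_spaces) {
        // strip all whitespace before counting
        $string = preg_replace('/\s+/u', '', $string);
    }
    // preg_replace() yields null and preg_split() yields false on invalid UTF-8
    $characters = is_string($string)
        ? preg_split('//u', $string, -1, PREG_SPLIT_NO_EMPTY)
        : false;
    if ($characters === false) {
        throw new InvalidArgumentException('Input is not valid UTF-8.');
    }
    return count(array_unique($characters));
}

echo char_count('APPLE'), "\n";               // 4
echo char_count('BOB AND BOB', false), "\n";  // 5
echo char_count("PL\u{00C1}TANO"), "\n";      // 7 ("\u{00C1}" is 'Á')
```

Passing `false` as the second argument reproduces the `$treat_spaces_as_chars = false` case from the snippet above.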

If you want to discard all non-word characters:

<?php

$ignore_non_word_characters = true;
// contains hälöwrd and π (pi), which is treated as a word character (Greek letter)
$string = "h,ä*+l•π‘°’lö wörld"; 
// remove non-word characters (including spaces) if we want to ignore them
if ($ignore_non_word_characters) {
  $string = preg_replace('/\W+/u', '', $string);
}
// split into characters (not bytes, like explode() or str_split() would)
$characters = preg_split('//u', $string, -1, PREG_SPLIT_NO_EMPTY);
// throw out the duplicates
$unique_characters = array_unique($characters);
// count what's left
$number_of_characters = count($unique_characters);

var_dump($characters, $unique_characters, $number_of_characters);
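
Note that `\W` leaves digits and the underscore in place, since those count as word characters. If only letters should be counted, as the question asks, the Unicode property `\p{L}` can be matched directly instead. This is a rough sketch, assuming PHP 7.0 or later and precomposed (NFC) input; `count_distinct_letters` is a hypothetical name, not part of the answer.

```php
<?php

// Hypothetical helper: counts distinct Unicode letters and ignores everything else.
function count_distinct_letters(string $string): int
{
    // \p{L} matches any Unicode letter; the /u modifier enables UTF-8 mode
    if (preg_match_all('/\p{L}/u', $string, $matches) === false) {
        throw new InvalidArgumentException('Input is not valid UTF-8.');
    }
    // $matches[0] holds every matched letter, duplicates included
    return count(array_unique($matches[0]));
}

echo count_distinct_letters('BOB AND BOB'), "\n";           // 5
echo count_distinct_letters("PL\u{00C1}TANO 2011_"), "\n";  // 7
```

With decomposed input (a base letter followed by a combining accent), the accent is not a letter and would be skipped, so normalizing first may be needed.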

Answer 1 (score: 5)

Just use count_chars:

echo count(array_filter(count_chars($str)));

The array returned by count_chars() also tells you how many times each character occurs in the string.
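
One caveat: count_chars() works on bytes, so the result only equals the number of distinct characters for single-byte input. A small sketch of the difference, assuming PHP 7.0 or later (for the `\u{...}` escape); `distinct_bytes` is a hypothetical wrapper around the one-liner above.

```php
<?php

// Hypothetical wrapper: counts distinct *bytes*, exactly as the one-liner does.
function distinct_bytes(string $str): int
{
    // count_chars() returns byte value => frequency; array_filter() drops the zeros
    return count(array_filter(count_chars($str)));
}

echo distinct_bytes('APPLE'), "\n";           // 4
// 'Á' (U+00C1) is encoded in UTF-8 as the two bytes 0xC3 0x81
echo distinct_bytes("PL\u{00C1}TANO"), "\n";  // 8, although there are only 7 characters
```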

Answer 2 (score: 1)

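count_chars returns a map of all byte values (ASCII codes) telling you how many times each one occurs in the string. Here is a starting point for your own implementation: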

function countchars($str, $ignoreSpaces = false) {
  // map of byte => number of occurrences
  $map = array();
  $len = strlen($str);
  for ($i = 0; $i < $len; $i++) {
    // $str[$i] replaces the $str{$i} syntax, which was removed in PHP 8
    if (!isset($map[$str[$i]])) {
      $map[$str[$i]] = 1;
    } else {
      $map[$str[$i]]++;
    }
  }

  if ($ignoreSpaces) {
    unset($map[' ']);
  }

  return $map;
}

print_r(countchars('Hello World'));

Answer 3 (score: 0)

Here is a function that uses the magic of associative arrays. It runs in linear time, O(n).

function uniques($string){
   $arr = array();
   $parts = str_split($string);
   foreach($parts as $part)
      $arr["$part"] = "yup";
   return count($arr);
}

$str = "APPLE";
echo uniques($str);  // outputs 4

Answer 4 (score: 0)

My take on it:

$chars = array_count_values(str_split($input));

This gives you an associative array with the unique letters as keys and the number of occurrences as values.

If you are not interested in the number of occurrences:

$chars = array_unique(str_split($input));
$numChars = count($chars);
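
str_split() also splits by byte, so both snippets miscount multibyte input. On PHP 7.4 or later with the mbstring extension loaded (an assumption about the environment), mb_str_split() can take its place. A sketch along those lines, where `$letters` is a hypothetical intermediate variable:

```php
<?php

// "hällö wörld", written with escapes so the file encoding does not matter
$input = "h\u{00E4}ll\u{00F6} w\u{00F6}rld";

// split into UTF-8 characters rather than bytes
$letters = mb_str_split($input, 1, 'UTF-8');

// map of character => number of occurrences
$chars = array_count_values($letters);

// number of distinct characters
$numChars = count(array_unique($letters));

echo $numChars, "\n";  // 8 (the space is counted as a character)
```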