For example, for
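// illustrative input: three distinct flags separated by spaces, with repeats
let n = count_unique_grapheme_clusters("\u{1F1E7}\u{1F1F7} \u{1F1F7}\u{1F1FA} \u{1F1E7}\u{1F1F7} \u{1F1FA}\u{1F1F8} \u{1F1E7}\u{1F1F7}");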
println!("{}", n);
the expected output is (a space and three distinct flags):
4
Answer 0 (score: 5)
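We can use the graphemes method from the unicode-segmentation crate to iterate over the grapheme clusters and collect them into a HashSet<&str>, which filters out duplicates. Then we take the .len() of the set.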
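extern crate unicode_segmentation; // 1.2.1

use std::collections::HashSet;
use unicode_segmentation::UnicodeSegmentation;

fn count_unique_grapheme_clusters(s: &str) -> usize {
    // `true` selects extended grapheme clusters, as opposed to legacy ones.
    let is_extended = true;
    // Collecting into a set drops duplicate clusters; its length is the answer.
    s.graphemes(is_extended).collect::<HashSet<_>>().len()
}

fn main() {
    assert_eq!(count_unique_grapheme_clusters(""), 0);
    assert_eq!(count_unique_grapheme_clusters("a"), 1);
    // Illustrative inputs, written as escapes: a flag is two regional-indicator code points.
    assert_eq!(count_unique_grapheme_clusters("\u{1F1FA}\u{1F1F8}"), 1);
    assert_eq!(count_unique_grapheme_clusters("\u{1F1F7}\u{1F1FA}e\u{301}"), 2);
    assert_eq!(count_unique_grapheme_clusters("\u{1F1F7}\u{1F1FA} \u{1F1FA}\u{1F1F8} \u{1F1F7}\u{1F1FA}"), 3);
}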
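
For contrast, here is a minimal std-only sketch of the same HashSet technique applied to chars instead of grapheme clusters. It is not part of the answer above, and the helper name count_unique_chars is hypothetical. It is only meant to show why the segmentation crate is needed: a char is a single Unicode scalar value, so anything built from several code points is counted piece by piece.

```rust
use std::collections::HashSet;

/// Hypothetical helper for comparison: counts distinct Unicode scalar
/// values (`char`s), NOT distinct grapheme clusters.
fn count_unique_chars(s: &str) -> usize {
    // Same idea as the grapheme version: collect into a set, take its length.
    s.chars().collect::<HashSet<char>>().len()
}
```

With this helper, "e\u{301}" (an e followed by a combining acute accent) counts as 2 and a single flag such as "\u{1F1FA}\u{1F1F8}" also counts as 2, whereas the grapheme-based function treats each of them as one cluster.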