How do I count the unique grapheme clusters in a string in Rust?

Time: 2018-08-13 08:50:57

Tags: unicode rust

For example, for

let n = count_unique_grapheme_clusters("🇧🇷 🇷🇺 🇧🇷 🇺🇸 🇧🇷");
println!("{}", n);

the expected output is (a space and three flags: " ", "🇧🇷", "🇷🇺", "🇺🇸"):

4

1 answer:

Answer 0 (score: 5)

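We can use the graphemes method from the unicode-segmentation crate to iterate over the grapheme clusters and collect them into a HashSet<&str>, which filters out the duplicates. The number of unique clusters is then the .len() of that set.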

extern crate unicode_segmentation; // 1.2.1

use std::collections::HashSet;

use unicode_segmentation::UnicodeSegmentation;

fn count_unique_grapheme_clusters(s: &str) -> usize {
    let is_extended = true;
    s.graphemes(is_extended).collect::<HashSet<_>>().len()
}

fn main() {
    assert_eq!(count_unique_grapheme_clusters(""), 0);
    assert_eq!(count_unique_grapheme_clusters("a"), 1);
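    // A flag is two regional-indicator code points but a single cluster.
    assert_eq!(count_unique_grapheme_clusters("🇺🇸"), 1);
    assert_eq!(count_unique_grapheme_clusters("🇷🇺é"), 2);
    // Five flags with no separators, three of them distinct.
    assert_eq!(count_unique_grapheme_clusters("🇧🇷🇷🇺🇧🇷🇺🇸🇧🇷"), 3);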
}
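
For contrast, here is a minimal sketch of what happens if you deduplicate `char`s (Unicode scalar values) with the standard library alone instead of grapheme clusters. The helper name `count_unique_chars` is made up for this illustration and is not part of the answer above or of any crate; it only shows why a segmentation crate is needed.

```rust
use std::collections::HashSet;

/// Hypothetical helper for comparison only: counts distinct Unicode scalar
/// values (`char`s), NOT grapheme clusters.
fn count_unique_chars(s: &str) -> usize {
    // `chars()` yields one item per code point, so a flag made of two
    // regional-indicator symbols contributes two separate entries.
    s.chars().collect::<HashSet<char>>().len()
}
```

With this helper, a single flag such as "\u{1F1E7}\u{1F1F7}" counts as 2, and "e\u{301}" (an e followed by a combining acute accent) also counts as 2, even though each is displayed as one user-perceived character. Collecting the items produced by `graphemes(true)` instead treats each of them as one element.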
