Question

The expected approach of String.truncate(usize) fails because it doesn't consider Unicode characters (which is baffling, considering Rust treats strings as Unicode).

let mut s = "ボルテックス".to_string();
s.truncate(4);

thread '' panicked at 'assertion failed: self.is_char_boundary(new_len)'

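Additionally, truncate modifies the original string, which is not always desired.

The best I've come up with is to convert to chars and collect into a String.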

fn truncate(s: String, max_width: usize) -> String {
    s.chars().take(max_width).collect()
}

e.g.

fn main() {
    assert_eq!(truncate("ボルテックス".to_string(), 0), "");
    assert_eq!(truncate("ボルテックス".to_string(), 4), "ボルテッ");
    assert_eq!(truncate("ボルテックス".to_string(), 100), "ボルテックス");
    assert_eq!(truncate("hello".to_string(), 4), "hell");
}

However, this feels very heavy-handed.

Answer 1

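Make sure you read and understand delnan's point:

Unicode is very complicated. Are you sure you want char (which corresponds to code points) as the unit, and not grapheme clusters?

The rest of this answer assumes you have a good reason for using char and not graphemes.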
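
To make that caveat concrete, here is a minimal standard-library sketch. The letter "é" can be written as "e" followed by U+0301 (combining acute accent): that is one grapheme cluster but two chars, so truncating by char can silently drop the accent. The helper names take_chars and char_count are made up for this illustration; splitting on grapheme clusters would need an external crate (unicode-segmentation is one commonly used option), which is not shown here.

```rust
/// Hypothetical helper: keeps at most `max_chars` code points (`char`s),
/// the same unit the rest of this answer uses.
fn take_chars(s: &str, max_chars: usize) -> String {
    s.chars().take(max_chars).collect()
}

/// Hypothetical helper: counts code points, not user-perceived characters.
fn char_count(s: &str) -> usize {
    s.chars().count()
}
```

With these, char_count("e\u{301}") is 2 even though a reader sees a single accented letter, and take_chars("e\u{301}", 1) returns a plain "e".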

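which is baffling considering Rust treats strings as Unicode

That's not correct; Rust treats strings as UTF-8. In UTF-8, every code point is mapped to a variable number of bytes. There's no O(1) algorithm to convert "6 characters" to "N bytes", so the standard library doesn't hide that from you.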
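
A short sketch of what that means for the string in the question, using only standard-library methods (chars, len, is_char_boundary). The function names char_and_byte_len and char_boundaries are invented for this example. Each katakana character occupies 3 bytes in UTF-8, so byte offset 4 falls in the middle of a character, which is why s.truncate(4) panics.

```rust
/// Hypothetical helper: returns (number of chars, number of UTF-8 bytes).
fn char_and_byte_len(s: &str) -> (usize, usize) {
    (s.chars().count(), s.len())
}

/// Hypothetical helper: lists every byte offset in 0..=s.len() at which
/// the string may legally be sliced or truncated.
fn char_boundaries(s: &str) -> Vec<usize> {
    (0..=s.len()).filter(|&i| s.is_char_boundary(i)).collect()
}
```

For "ボルテックス" the first helper gives (6, 18), while for "hello" it gives (5, 5); the boundaries of "ボル" are 0, 3 and 6.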

You can use char_indices to step through the string character by character and get the byte index of each character:

fn truncate(s: &str, max_chars: usize) -> &str {
    match s.char_indices().nth(max_chars) {
        None => s,
        Some((idx, _)) => &s[..idx],
    }
}

fn main() {
    assert_eq!(truncate("ボルテックス", 0), "");
    assert_eq!(truncate("ボルテックス", 4), "ボルテッ");
    assert_eq!(truncate("ボルテックス", 100), "ボルテックス");
    assert_eq!(truncate("hello", 4), "hell");
}

This also returns a slice, which you can choose to move into a new allocation if you need to, or use to mutate a String in place:

// May not be as efficient as inlining the code...
fn truncate_in_place(s: &mut String, max_chars: usize) {
    let bytes = truncate(&s, max_chars).len();
    s.truncate(bytes);
}

fn main() {
    let mut s = "ボルテックス".to_string();
    truncate_in_place(&mut s, 0);
    assert_eq!(s, "");
}
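
The other two options mentioned above could look roughly like this. This is a sketch, not part of the original answer: truncated_copy and truncate_in_place_inlined are hypothetical names, and truncate is repeated only so the block compiles on its own. The copy variant leaves the input untouched, and the inlined variant avoids building the intermediate slice.

```rust
/// Same slice-returning helper as in the answer, repeated for completeness.
fn truncate(s: &str, max_chars: usize) -> &str {
    match s.char_indices().nth(max_chars) {
        None => s,
        Some((idx, _)) => &s[..idx],
    }
}

/// Hypothetical helper: copies the truncated prefix into a new allocation,
/// leaving the original string unchanged.
fn truncated_copy(s: &str, max_chars: usize) -> String {
    truncate(s, max_chars).to_owned()
}

/// Hypothetical inlined variant of `truncate_in_place`: finds the byte
/// offset of the cut point first, then truncates at that char boundary.
fn truncate_in_place_inlined(s: &mut String, max_chars: usize) {
    // The immutable borrow ends before `s.truncate` is called.
    let cut = s.char_indices().nth(max_chars).map(|(idx, _)| idx);
    if let Some(idx) = cut {
        s.truncate(idx);
    }
}
```

Both still walk the string once to find the cut point, so the cost stays linear in max_chars.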

How to truncate a string to have at most N characters?
