Question

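Rust provides a trim method for strings: str.trim() removes leading and trailing whitespace. I would like a method that does the same for bytestrings. It should take a Vec<u8> and remove leading and trailing whitespace (only space, 0x20, and htab, 0x09).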

Writing trim_left() is easy; you can just use an iterator with skip_while():

fn main() {
    let a: &[u8] = b"     fo o ";
    let b: Vec<u8> = a.iter().copied().skip_while(|&x| x == 0x20 || x == 0x09).collect();
    println!("{:?}", b);
}

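But to trim characters on the right, I would need to look ahead once whitespace is found, to check whether any other letters follow it in the list.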

Answer 1

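Here is an implementation that returns a slice rather than a new Vec<u8>, just as str::trim() does. It is also implemented on [u8], since that is more general than Vec<u8> (you can obtain a slice from a vector cheaply, but creating a vector from a slice is more costly, since it involves a heap allocation and a copy).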

trait SliceExt {
    fn trim(&self) -> &Self;
}

impl SliceExt for [u8] {
    fn trim(&self) -> &[u8] {
        fn is_whitespace(c: &u8) -> bool {
            *c == b'\t' || *c == b' '
        }

        fn is_not_whitespace(c: &u8) -> bool {
            !is_whitespace(c)
        }

        if let Some(first) = self.iter().position(is_not_whitespace) {
            if let Some(last) = self.iter().rposition(is_not_whitespace) {
                &self[first..last + 1]
            } else {
                unreachable!();
            }
        } else {
            &[]
        }
    }
}

fn main() {
    let a = b"     fo o ";
    let b = a.trim();
    println!("{:?}", b);
}

If you really need a Vec<u8> after the trim(), you can call into() on the slice to turn it into a Vec<u8>.

fn main() {
    let a = b"     fo o ";
    let b: Vec<u8> = a.trim().into();
    println!("{:?}", b);
}
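If you already own the Vec<u8> and want to avoid a second allocation, trimming in place is another option. The following is only a sketch: trim_in_place and is_whitespace are hypothetical helper names, not part of the answer above, and the whitespace set is the space/tab pair from the question.

```rust
/// Hypothetical helper: true for the two bytes the question treats as whitespace.
fn is_whitespace(b: &u8) -> bool {
    *b == b' ' || *b == b'\t'
}

/// Hypothetical helper: trims space and tab from both ends of `v`
/// without allocating a new vector.
fn trim_in_place(v: &mut Vec<u8>) {
    // Cut the tail first so the later front removal shifts fewer bytes.
    let end = v
        .iter()
        .rposition(|b| !is_whitespace(b))
        .map_or(0, |i| i + 1);
    v.truncate(end);

    // Remove the leading run; this shifts the remaining bytes to the front.
    let start = v.iter().position(|b| !is_whitespace(b)).unwrap_or(0);
    v.drain(..start);
}
```

Calling trim_in_place(&mut v) on a vector holding b"     fo o " should leave it holding b"fo o". Note that removing the front still moves the remaining bytes, so this saves the allocation, not the copy.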

Answer 2

Here is a much simpler version than the other answers.

pub fn trim_ascii_whitespace(x: &[u8]) -> &[u8] {
    let from = match x.iter().position(|x| !x.is_ascii_whitespace()) {
        Some(i) => i,
        None => return &x[0..0],
    };
    let to = x.iter().rposition(|x| !x.is_ascii_whitespace()).unwrap();
    &x[from..=to]
}
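One thing to be aware of: u8::is_ascii_whitespace() matches space, tab, line feed, form feed and carriage return, which is a wider set than the space/tab pair the question asks for. A sketch of the same function with the predicate passed in makes the difference visible; trim_by and is_space_or_tab are hypothetical names introduced here for illustration.

```rust
/// Hypothetical variant of the function above: trims leading and trailing
/// bytes for which `is_ws` returns true.
fn trim_by(x: &[u8], is_ws: impl Fn(u8) -> bool) -> &[u8] {
    let from = match x.iter().position(|&b| !is_ws(b)) {
        Some(i) => i,
        // Nothing but whitespace (or empty input): return an empty slice.
        None => return &x[0..0],
    };
    // A non-whitespace byte exists, so searching from the end cannot fail.
    let to = x.iter().rposition(|&b| !is_ws(b)).unwrap();
    &x[from..=to]
}

/// Hypothetical predicate: only the two bytes named in the question.
fn is_space_or_tab(b: u8) -> bool {
    b == b' ' || b == b'\t'
}
```

With the input b" \tfoo\r\n", trim_by(input, is_space_or_tab) keeps the trailing "\r\n", while trim_by(input, |b| b.is_ascii_whitespace()) strips it as well.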

Oddly, this is not in the standard library. I would have thought it was a common task.

Anyway, here is a complete file/trait (with tests!) that you can copy and paste.

use std::ops::Deref;

/// Trait to allow trimming ascii whitespace from a &[u8].
pub trait TrimAsciiWhitespace {
    /// Trim ascii whitespace (based on `is_ascii_whitespace()`) from the
    /// start and end of a slice.
    fn trim_ascii_whitespace(&self) -> &[u8];
}

impl<T: Deref<Target=[u8]>> TrimAsciiWhitespace for T {
    fn trim_ascii_whitespace(&self) -> &[u8] {
        let from = match self.iter().position(|x| !x.is_ascii_whitespace()) {
            Some(i) => i,
            None => return &self[0..0],
        };
        let to = self.iter().rposition(|x| !x.is_ascii_whitespace()).unwrap();
        &self[from..=to]
    }
}

#[cfg(test)]
mod test {
    use super::TrimAsciiWhitespace;

    #[test]
    fn basic_trimming() {
        assert_eq!(" A ".as_bytes().trim_ascii_whitespace(), "A".as_bytes());
        assert_eq!(" AB ".as_bytes().trim_ascii_whitespace(), "AB".as_bytes());
        assert_eq!("A ".as_bytes().trim_ascii_whitespace(), "A".as_bytes());
        assert_eq!("AB ".as_bytes().trim_ascii_whitespace(), "AB".as_bytes());
        assert_eq!(" A".as_bytes().trim_ascii_whitespace(), "A".as_bytes());
        assert_eq!(" AB".as_bytes().trim_ascii_whitespace(), "AB".as_bytes());
        assert_eq!(" A B ".as_bytes().trim_ascii_whitespace(), "A B".as_bytes());
        assert_eq!("A B ".as_bytes().trim_ascii_whitespace(), "A B".as_bytes());
        assert_eq!(" A B".as_bytes().trim_ascii_whitespace(), "A B".as_bytes());
        assert_eq!(" ".as_bytes().trim_ascii_whitespace(), "".as_bytes());
        assert_eq!("  ".as_bytes().trim_ascii_whitespace(), "".as_bytes());
    }
}
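Because the impl above is written for any T that derefs to [u8], it should be usable directly on a Vec<u8> as well as on a &[u8]. The sketch below repeats the trait so that it is self-contained and adds a hypothetical helper, trimmed_copy, that is not part of the answer.

```rust
use std::ops::Deref;

pub trait TrimAsciiWhitespace {
    fn trim_ascii_whitespace(&self) -> &[u8];
}

impl<T: Deref<Target = [u8]>> TrimAsciiWhitespace for T {
    fn trim_ascii_whitespace(&self) -> &[u8] {
        let from = match self.iter().position(|x| !x.is_ascii_whitespace()) {
            Some(i) => i,
            None => return &self[0..0],
        };
        let to = self.iter().rposition(|x| !x.is_ascii_whitespace()).unwrap();
        &self[from..=to]
    }
}

/// Hypothetical helper: borrows a Vec<u8>, trims it through the trait,
/// and copies the result into a new vector.
fn trimmed_copy(v: &Vec<u8>) -> Vec<u8> {
    // Vec<u8> implements Deref<Target = [u8]>, so the blanket impl applies.
    v.trim_ascii_whitespace().to_vec()
}
```

As far as I know, newer toolchains (Rust 1.80 and later) also provide trim_ascii(), trim_ascii_start() and trim_ascii_end() on byte slices, which use the same is_ascii_whitespace() definition, so a custom trait like this is mainly useful on older compilers.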

Answer 3

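All we have to do is find the index of the first non-whitespace character twice: once counting forward from the start, and once counting backward from the end.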

fn is_not_whitespace(e: &u8) -> bool {
    *e != 0x20 && *e != 0x09
}

fn main() {
    let a: &[u8] = b"     fo o ";

    // find the index of first non-whitespace char
    let begin = a.iter()
        .position(is_not_whitespace);

    // find the index of the last non-whitespace char
    let end = a.iter()
        .rev()
        .position(is_not_whitespace)
        .map(|j| a.len() - j);

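    // build it, copying the bytes so the result is a Vec<u8> rather than a Vec<&u8>;
    // an all-whitespace input yields an empty vector
    let vec: Vec<u8> = begin.and_then(|i| end.map(|j| a[i..j].to_vec()))
        .unwrap_or_default();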

    println!("{:?}", vec);
}
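The same idea of walking in from both ends can also be written without any index arithmetic, using slice patterns. This is a different technique from the position()-based code above, shown only as a sketch; trim_space_tab is a hypothetical name, and it needs a compiler recent enough to support nested or-patterns.

```rust
/// Hypothetical helper: trims space (0x20) and tab (0x09) from both ends
/// by repeatedly peeling one matching byte off the slice.
fn trim_space_tab(mut s: &[u8]) -> &[u8] {
    // Drop leading whitespace bytes one at a time.
    while let [b' ' | b'\t', rest @ ..] = s {
        s = rest;
    }
    // Drop trailing whitespace bytes one at a time.
    while let [rest @ .., b' ' | b'\t'] = s {
        s = rest;
    }
    s
}
```

The result is a borrowed sub-slice; call to_vec() on it if an owned Vec<u8> is needed.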

How can I implement trim for Vec<u8>?

3 answers: