Question

Is there an idiomatic way to process a file one character at a time in Rust?

This seems to be roughly what I'm after:

let mut f = io::BufReader::new(try!(fs::File::open("input.txt")));

for c in f.chars() {
    println!("Character: {}", c.unwrap());
}

However, Read::chars is still unstable as of Rust v1.6.0.

I considered using Read::read_to_string, but the file may be large and I don't want to read it all into memory.

Answer 1

Let's compare 4 approaches.

1. Read::chars

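You could copy the Read::chars implementation, but it is marked unstable, with the note that

the semantics of a partial read/write of where errors happen is currently unclear and may change

so some care must be taken. Anyway, this seems to be the best approach.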
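For illustration, here is a minimal sketch of what a hand-rolled replacement could look like. It is not the standard library's implementation: the Utf8Chars type and the utf8_len helper are hypothetical names made up for this example, and the error handling (InvalidData for bad bytes, UnexpectedEof for a truncated sequence) is an assumption about what a caller would want.

```rust
use std::io::{self, Read};

/// Hypothetical helper (not part of std): decodes UTF-8 from any `Read`,
/// holding at most 4 bytes of the input at a time.
pub struct Utf8Chars<R: Read> {
    inner: R,
}

impl<R: Read> Utf8Chars<R> {
    pub fn new(inner: R) -> Self {
        Utf8Chars { inner }
    }
}

/// Length of a UTF-8 sequence given its first byte, or None if that byte
/// cannot start a sequence.
fn utf8_len(first: u8) -> Option<usize> {
    match first {
        0x00..=0x7F => Some(1),
        0xC2..=0xDF => Some(2),
        0xE0..=0xEF => Some(3),
        0xF0..=0xF4 => Some(4),
        _ => None,
    }
}

fn invalid() -> io::Error {
    io::Error::new(io::ErrorKind::InvalidData, "invalid UTF-8")
}

impl<R: Read> Iterator for Utf8Chars<R> {
    type Item = io::Result<char>;

    fn next(&mut self) -> Option<Self::Item> {
        let mut buf = [0u8; 4];
        // Read the first byte; a clean end of input ends the iteration.
        loop {
            match self.inner.read(&mut buf[..1]) {
                Ok(0) => return None,
                Ok(_) => break,
                Err(e) if e.kind() == io::ErrorKind::Interrupted => continue,
                Err(e) => return Some(Err(e)),
            }
        }
        let len = match utf8_len(buf[0]) {
            Some(n) => n,
            None => return Some(Err(invalid())),
        };
        // Read the continuation bytes; EOF here yields an UnexpectedEof error.
        if let Err(e) = self.inner.read_exact(&mut buf[1..len]) {
            return Some(Err(e));
        }
        // Let the standard library validate the sequence.
        match std::str::from_utf8(&buf[..len]) {
            Ok(s) => Some(Ok(s.chars().next().expect("one char decoded"))),
            Err(_) => Some(Err(invalid())),
        }
    }
}
```

Because this sketch issues very small reads, it should be given a buffered reader, for example Utf8Chars::new(BufReader::new(File::open("input.txt")?)). After an invalid byte it simply continues with the next byte, which may or may not be the recovery behaviour you want.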

2. flat_map

The flat_map alternative does not compile:

use std::io::{BufRead, BufReader};
use std::fs::File;

pub fn main() {
    let f = BufReader::new(File::open("input.txt").expect("open failed"));

    for c in f.lines().flat_map(|l| l.expect("lines failed").chars()) {
        println!("Character: {}", c);
    }
}

The problem is that chars borrows from the string, but l.expect("lines failed") lives only inside the closure, so the compiler gives the error borrowed value does not live long enough.
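One way to make a flat_map version compile, sketched here as an assumption rather than as part of the original comparison, is to have the closure return owned characters instead of a borrowing iterator. The function name chars_via_flat_map is hypothetical. The cost is an extra Vec<char> per line on top of the String that lines() already allocates, and lines() strips the line terminators, so no newline characters are yielded.

```rust
use std::io::BufRead;

/// Hypothetical helper: yields every char of every line of `reader`.
/// Each line's chars are collected into an owned Vec so that nothing
/// borrowed from the temporary String escapes the closure.
pub fn chars_via_flat_map<R: BufRead>(reader: R) -> impl Iterator<Item = char> {
    reader.lines().flat_map(|l| {
        l.expect("lines failed").chars().collect::<Vec<char>>()
    })
}
```

It can be called with any BufRead, for example chars_via_flat_map(BufReader::new(File::open("input.txt").expect("open failed"))).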

3. Nested for

This code

use std::io::{BufRead, BufReader};
use std::fs::File;

pub fn main() {
    let f = BufReader::new(File::open("input.txt").expect("open failed"));

    for line in f.lines() {
        for c in line.expect("lines failed").chars() {
            println!("Character: {}", c);
        }
    }
}

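works, but it allocates a String for each line. Besides, if the input file contains no line break, the whole file is loaded into memory.

4. BufRead::read_until

A memory-efficient alternative to approach 3 is to use BufRead::read_until and reuse a single buffer for every line: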

use std::io::{BufRead, BufReader};
use std::fs::File;

pub fn main() {
    let mut f = BufReader::new(File::open("input.txt").expect("open failed"));

    let mut buf = Vec::<u8>::new();
    while f.read_until(b'\n', &mut buf).expect("read_until failed") != 0 {
        // this moves the ownership of the read data to s
        // there is no allocation
        let s = String::from_utf8(buf).expect("from_utf8 failed");
        for c in s.chars() {
            println!("Character: {}", c);
        }
        // this returns the ownership of the read data to buf
        // there is no allocation
        buf = s.into_bytes();
        buf.clear();
    }
}
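A similar effect can probably be had with BufRead::read_line, which appends to a String that is cleared and reused between lines. The sketch below is an alternative under that assumption, not the code from this answer; the function name for_each_char is hypothetical. Like read_until, it keeps the trailing newline in the buffer and still reads a whole line at once, so a file without line breaks ends up in memory in full.

```rust
use std::io::{self, BufRead};

/// Hypothetical helper: calls `f` on every char of `reader`, reusing one
/// String as the line buffer so that its capacity is kept across lines.
pub fn for_each_char<R: BufRead, F: FnMut(char)>(mut reader: R, mut f: F) -> io::Result<()> {
    let mut line = String::new();
    // read_line returns Ok(0) at end of input and an error on invalid UTF-8.
    while reader.read_line(&mut line)? != 0 {
        for c in line.chars() {
            f(c);
        }
        // Empty the buffer but keep its allocation for the next line.
        line.clear();
    }
    Ok(())
}
```

A call such as for_each_char(BufReader::new(File::open("input.txt")?), |c| println!("Character: {}", c)) would print each character in turn.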

Answer 2

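I cannot use lines() because my file could be a single line that is gigabytes in size. This is an improvement on @malbarbo's recommendation of copying Read::chars from an old version of Rust. The utf8-chars crate already adds .chars() to BufRead for you.

Inspecting their repository, it does not look like they load more than 4 bytes at a time.

Your code will look the same as it did before Rust removed Read::chars: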

use std::io::stdin;
use utf8_chars::BufReadCharsExt;

fn main() {
    for c in stdin().lock().chars().map(|x| x.unwrap()) {
        println!("{}", c);
    }
}

Add the following to your Cargo.toml:

[dependencies]
utf8-chars = "1.0.0"

Answer 3

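There are two solutions that make sense here.

First, you could copy the implementation of Read::chars() and use it; that would make it trivial to move your code over to the standard library implementation if/when it stabilizes.

On the other hand, you could simply iterate line by line (using f.lines()) and then use line.chars() on each line to get the characters. This is a little hacky, but it will definitely work.

If you only want one loop, you could use flat_map() with a lambda like |line| line.chars(), although chars() borrows from the line, so the closure has to return owned characters (for example by collecting them) for this to compile.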
