Consider this file:
extern crate csv;

use std::io;
use std::io::Write;

fn main() {
    // Build a tab-delimited reader over stdin; the input has no header row.
    let mut rdr = csv::ReaderBuilder::new()
        .has_headers(false)
        .delimiter(b'\t')
        .from_reader(io::stdin());
    // Buffer stdout so field writes don't hit the OS one at a time.
    let mut stdout_buf = io::BufWriter::new(io::stdout());
    for result in rdr.records() {
        let record = result.unwrap();
        // Echo every field back out, with no delimiter between fields.
        for item in record.iter() {
            write!(stdout_buf, "{}", item).unwrap();
        }
    }
}
In Cargo.toml, the only dependency is csv = "1.0.0-beta.3".
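For completeness, the whole manifest is minimal, roughly as follows (the version under [package] is a placeholder; the binary name csv_mapper matches the perf output below):

[package]
name = "csv_mapper"
version = "0.1.0"

[dependencies]
csv = "1.0.0-beta.3"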
If I feed this program a large tab-separated file, parsing initially runs at about 100 MiB/s, but after a few seconds it drops to less than half a megabyte per second. What could be going on here? According to perf:
Samples: 91K of event 'cycles:pp', Event count (approx.): 70889999629
Overhead  Command     Shared Object      Symbol
 46.63%   csv_mapper  libc-2.17.so       [.] __memcpy_ssse3_back
 34.68%   csv_mapper  [kernel.kallsyms]  [k] clear_page_c_e
  5.66%   csv_mapper  csv_mapper         [.] csv_core::reader::Reader::read_record::hc2c3417e5570c6ac
  3.38%   csv_mapper  csv_mapper         [.] csv::byte_record::validate::hf69692f1aec60002
  0.82%   csv_mapper  csv_mapper         [.] csv_mapper::main::h84d39cb33f062b66
  0.79%   csv_mapper  csv_mapper         [.] core::fmt::write::h3f842b6303ea2a70
  0.78%   csv_mapper  csv_mapper         [.] _$LT$std..io..Write..write_fmt..Adaptor$LT$$u27$a$C$$u20$T$GT$$u20$as$u20$core..fmt..Write$GT$::write_str::h0002300cbe68df34
  0.69%   csv_mapper  csv_mapper         [.] core::str::from_utf8::h1297230116307e46
  0.42%   csv_mapper  csv_mapper         [.] mallocx
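(The profile was captured along the lines of perf record -e cycles:pp ./csv_mapper < input.tsv and then viewed with perf report; input.tsv here stands in for the real file.)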
Tested on x86_64, CentOS 7.3.
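One experiment I can think of to narrow this down is sketched below: reuse a single ByteRecord across rows and write raw bytes, which takes per-record allocation, UTF-8 validation, and the write! formatting machinery out of the picture. Note that read_byte_record and ByteRecord are the csv 1.x API; I have not verified that they exist under these exact names in 1.0.0-beta.3.

extern crate csv;

use std::io::{self, Write};

fn main() {
    let mut rdr = csv::ReaderBuilder::new()
        .has_headers(false)
        .delimiter(b'\t')
        .from_reader(io::stdin());
    let stdout = io::stdout();
    let mut out = io::BufWriter::new(stdout.lock());
    // One ByteRecord reused for every row, so no per-record allocation.
    let mut record = csv::ByteRecord::new();
    // read_byte_record returns Ok(false) at end of input and skips the
    // UTF-8 validation that records() performs.
    while rdr.read_byte_record(&mut record).unwrap() {
        for field in record.iter() {
            // Raw byte output; write_all avoids write!'s formatting path.
            out.write_all(field).unwrap();
        }
    }
}

If this variant holds its speed, the slowdown presumably lives in per-record allocation or validation; if it still degrades, the reader's internal buffering would be the more likely suspect.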