How to read only certain, previously known lines with ifstream (C++)

Time: 2016-09-04 13:51:32

Tags: c++ file input io

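By preprocessing a file, I have found some lines that need further processing, so I know exactly which lines I want to read. Is there a faster solution than reading the file line by line with ifstream::getline(...)?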

For example, I know I only want the lines whose numbers are multiples of 4 (0-4-8-12-16-...), or special line numbers stored in a vector...

Right now I'm doing this:

string line;
int counter = 0;
while( getline(ifstr,line) ){
   if(counter%4 == 0){
      // some code working with line
   }
   ++counter;
}

But I'd like something like this (if it is faster):

while(getline(ifstr,line)){
  // some code working with line
  while(++counter%4 !=0){ // or checking against the index vector
     skipline(ifstr);     // hypothetical: discard the next line without storing it
  }
}
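A minimal sketch of the skipline idea. It assumes that "skipping" a line means discarding characters up to the next '\n' with the standard std::istream::ignore member. skipline and read_every_nth are hypothetical names used only for illustration. The skipped bytes are still read from the file. What this saves is building a std::string for every unwanted line.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <limits>
#include <sstream>  // std::istringstream, handy for trying the functions out
#include <string>
#include <vector>

// Hypothetical helper: discard everything up to and including the next '\n'
// without copying it into a std::string.
void skipline(std::istream& in) {
    in.ignore(std::numeric_limits<std::streamsize>::max(), '\n');
}

// Hypothetical helper: collect lines 0, n, 2n, ... (n >= 1) from `in`.
std::vector<std::string> read_every_nth(std::istream& in, std::size_t n) {
    assert(n >= 1);
    std::vector<std::string> out;
    std::string line;
    while (std::getline(in, line)) {
        out.push_back(line);                       // this line is wanted
        for (std::size_t i = 1; i < n && in; ++i)  // skip the n-1 lines after it
            skipline(in);
    }
    return out;
}
```

With this sketch, read_every_nth(ifstr, 4) corresponds to the multiples-of-4 example above.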

Let me mention again that I have a list of line indices (sorted, but not regularly spaced). I only used the multiples-of-4 example for simplicity.

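Edit: I also want to jump ahead at the start. For example, if I know I need to read from line 2000 onwards, how can I quickly skip the first 1999 lines? Thanks, everyone.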
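The same std::istream::ignore call can handle both the sorted-but-irregular indices and the jump to line 2000: apply it once per unwanted line. Below is a minimal sketch. It assumes 0-based line numbers sorted in ascending order, and read_lines_at is a hypothetical name.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <istream>
#include <limits>
#include <sstream>  // std::istringstream, handy for trying the function out
#include <string>
#include <vector>

// Hypothetical helper: read the lines whose 0-based numbers appear in `wanted`
// (sorted ascending). Unwanted lines are discarded with istream::ignore, so no
// std::string is built for them. Their bytes are still read, because line
// boundaries are only known by finding each '\n'.
std::vector<std::string> read_lines_at(std::istream& in,
                                       const std::vector<std::size_t>& wanted) {
    assert(std::is_sorted(wanted.begin(), wanted.end()));
    std::vector<std::string> out;
    std::size_t current = 0;  // number of the line the stream is positioned at
    std::string line;
    for (std::size_t target : wanted) {
        if (target < current) continue;  // duplicate index: line already consumed
        // skip forward until the stream sits at the start of line `target`
        for (; current < target && in; ++current)
            in.ignore(std::numeric_limits<std::streamsize>::max(), '\n');
        if (!std::getline(in, line)) break;  // fewer lines than requested
        out.push_back(line);
        ++current;
    }
    return out;
}
```

In 1-based terms, starting at line 2000 means passing a first index of 1999. The function then performs 1999 ignore calls before the first getline. The file is still scanned byte by byte for newlines. Only an index of line offsets, built once, avoids that scan.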

3 answers:

Answer 0 (score: 2)

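Store the std::streampos values (the pos_type of std::ifstream) that mark the start of each line of the file in a std::vector. You can then access a specific line by its index in this vector. A possible implementation follows: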

#include <cstddef>
#include <fstream>
#include <stdexcept>
#include <string>
#include <vector>

class file_reader {
public:
    // load the start offset of every line during construction
    explicit file_reader(const std::string& filename)
        : fs(filename) {
        if (!fs)
            throw std::runtime_error("cannot open " + filename);
        cache_file_streampos();
    }
    std::size_t lines() const noexcept { return line_streampos_vec.size(); }
    // get a std::string representation of a specific (0-based) line in the file
    std::string read_line(std::size_t n) {
        if (n >= line_streampos_vec.size())
            throw std::out_of_range("out of bounds");
        navigate_to_line(n);
        std::string rtn_str;
        std::getline(fs, rtn_str);
        return rtn_str;
    }
private:
    std::ifstream fs;  // read-only: std::fstream would also demand write access
    std::vector<std::streampos> line_streampos_vec;
    static constexpr std::size_t max_line_length = 256; // assumed typical length; tune for your data
    // store the position at which each line starts
    void cache_file_streampos() {
        std::string s;
        s.reserve(max_line_length);
        std::streampos pos = fs.tellg();   // start of line 0
        while (std::getline(fs, s)) {
            line_streampos_vec.push_back(pos);
            pos = fs.tellg();              // start of the next line
        }
    }
    // go to a specific line in the file stream
    void navigate_to_line(std::size_t n) {
        fs.clear();                        // reset eof/fail flags left by the caching pass
        fs.seekg(line_streampos_vec[n]);
    }
};

You can then read specific lines of the file like this:
file_reader fr("filename.ext");
for (std::size_t i = 0; i < fr.lines(); ++i) {
    if (i % 4 == 0) {
        std::string line_contents = fr.read_line(i); // then do something with the string
    }
}

Answer 1 (score: 2)

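Since @caps said this left him feeling that nothing in the standard library helps with this kind of task, I felt compelled to demonstrate otherwise :)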

Live On Coliru

#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <numeric>
#include <regex>
#include <string>
#include <vector>

template <typename It, typename Out, typename Filter = std::vector<int> >
Out retrieve_lines(It begin, It const end, Filter lines, Out out, char const* delim = "\\n") {
    if (lines.empty())
        return out;

    // make sure input is orderly
    assert(std::is_sorted(lines.begin(), lines.end()));
    assert(lines.front() >= 0);

    std::regex re(delim);
    std::regex_token_iterator<It> line(begin, end, re, -1), eof;

    // make lines into incremental offsets
    std::adjacent_difference(lines.begin(), lines.end(), lines.begin());

    // iterate advancing by each offset requested
    auto advanced = [&line, eof](std::size_t n) { while (line!=eof && n--) ++line; return line; };

    for (auto offset = lines.begin(); offset != lines.end() && advanced(*offset) != eof; ++offset) {
        *out++ = *line;
    }

    return out;
}

This is clearly more generic. The trade-off (for now) is that the tokenizing iterator (std::regex_token_iterator) needs at least bidirectional iterators, so a plain std::istreambuf_iterator over a stream cannot be used. I find this a good trade-off, because "random access" on files really calls for memory-mapped files anyway.

Live Demo 1: from a string to a vector<string>

Live On Coliru

#include <iostream>

int main() {
    std::vector<std::string> output_lines;
    std::string is(" a b c d e\nf g hijklmnop\nqrstuvw\nxyz");

    retrieve_lines(is.begin(), is.end(), {0,3,999}, back_inserter(output_lines));

    // for debug purposes
    for (auto& line : output_lines)
        std::cout << line << "\n";
}

Prints

 a b c d e
xyz

Live Demo 2: from a file to cout

Live On Coliru

#include <boost/iostreams/device/mapped_file.hpp>

int main() {
    boost::iostreams::mapped_file_source is("/etc/dictionaries-common/words");
    retrieve_lines(is.begin(), is.end(), {13,784, 9996}, std::ostream_iterator<std::string>(std::cout, "\n"));
}

This prints the words on lines 13, 784 and 9996 of that dictionary file, one per line.

The use of boost::iostreams::mapped_file_source could easily be replaced with a direct ::mmap call, but I found that uglier in a demo sample.
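Below is a minimal sketch of the ::mmap route. It assumes a POSIX system, with open, fstat, mmap and munmap coming from <fcntl.h>, <sys/stat.h>, <sys/mman.h> and <unistd.h>. It does not reuse the regex-based retrieve_lines. Instead it scans the mapped bytes for '\n' with std::find. mmap_retrieve_lines is a hypothetical name, and line numbers are assumed to be 0-based and sorted ascending.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>
#include <fcntl.h>     // ::open, O_RDONLY
#include <sys/mman.h>  // ::mmap, ::munmap
#include <sys/stat.h>  // ::fstat
#include <unistd.h>    // ::close

// Hypothetical helper: return the lines whose 0-based numbers appear in
// `wanted` (sorted ascending), reading the file through a read-only mapping.
// Requested numbers past the end of the file are silently ignored.
std::vector<std::string> mmap_retrieve_lines(const std::string& path,
                                             const std::vector<std::size_t>& wanted) {
    assert(std::is_sorted(wanted.begin(), wanted.end()));
    std::vector<std::string> out;

    const int fd = ::open(path.c_str(), O_RDONLY);
    if (fd == -1) throw std::runtime_error("open failed: " + path);
    struct stat st;
    if (::fstat(fd, &st) == -1) { ::close(fd); throw std::runtime_error("fstat failed"); }
    const std::size_t size = static_cast<std::size_t>(st.st_size);
    if (size == 0) { ::close(fd); return out; }  // mmap rejects a zero length

    void* addr = ::mmap(nullptr, size, PROT_READ, MAP_PRIVATE, fd, 0);
    ::close(fd);  // the mapping stays valid after the descriptor is closed
    if (addr == MAP_FAILED) throw std::runtime_error("mmap failed");
    // unmap on every exit path, including exceptions thrown by emplace_back
    struct Unmapper { void* a; std::size_t n; ~Unmapper() { ::munmap(a, n); } } guard{addr, size};

    const char* p = static_cast<const char*>(addr);
    const char* const end = p + size;
    std::size_t current = 0;  // number of the line that starts at p
    auto next = wanted.begin();
    while (p != end && next != wanted.end()) {
        const char* eol = std::find(p, end, '\n');
        for (; next != wanted.end() && *next == current; ++next)  // duplicates allowed
            out.emplace_back(p, eol);
        p = (eol == end) ? end : eol + 1;  // step past the '\n'
        ++current;
    }
    return out;
}
```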

Answer 2 (score: 1)

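ArchbishopOfBanterbury's answer is good. I agree with him that if you simply store the character position of the start of each line while you are preprocessing, you get cleaner code and better efficiency.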

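However, suppose that isn't possible (perhaps the preprocessing is done by some other API, or the line numbers come from user input). There is still a solution that should do only the minimum amount of work needed to read the specified lines.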

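The fundamental problem is that, in a file with variable line lengths, you cannot know where each line begins and ends, because a line is defined as a sequence of characters terminated by '\n'. Therefore you have to examine every character to check whether it is '\n'. If it is, advance your line counter, and whenever the line counter matches one of your requested line numbers, read that line.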

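#include <fstream>
#include <iterator>
#include <string>
#include <utility>
#include <vector>

// line_numbers_to_read must be 0-based and strictly ascending
auto retrieve_lines(std::ifstream& file_to_read, std::vector<int> line_numbers_to_read) -> std::vector<std::string>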
{
    auto begin = std::istreambuf_iterator<char>(file_to_read);
    auto end = std::istreambuf_iterator<char>();

    auto current_line = 0;
    auto next_line_num = std::begin(line_numbers_to_read);

    auto output_lines = std::vector<std::string>();
    output_lines.reserve(line_numbers_to_read.size());  //this may be a silly "optimization," since all the strings are still separate unreserved buffers

    //we can bail if we've reached the end of the lines we want to read, even if there are lines remaining in the stream
    //we *must* bail if we've reached the end of the stream, even if there are supposedly lines left to read; that input must have been incorrect
    while(begin != end && next_line_num != std::end(line_numbers_to_read))
    {
        if(current_line == *next_line_num)
        {
            auto matching_line = std::string();
            if(*begin != '\n')
            {
                //potential optimization: reserve matching_line to something that you expect will fit most/all of your input lines
                while(begin != end && *begin != '\n')
                {
                    matching_line.push_back(*begin++);
                }
            }
            output_lines.emplace_back(std::move(matching_line));
            ++next_line_num;
        }
        else 
        {
            //skip this "line" by finding the next '\n'
            while(begin != end && *begin != '\n')
            {
                ++begin;
            }
        }

        //either code path in the previous if/else leaves us staring at the '\n' at the end of a line,
        //which is not the right state for the next iteration of the loop.
        //So skip this '\n' to get to the beginning of the next line
        if (begin != end && *begin == '\n')
        {
            ++begin;
        }

        ++current_line;
    }

    return output_lines;
}

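It is shown here on Coliru, along with the input I tested it with. As you can see, it correctly handles empty lines, and it also copes with being asked for more lines than the file contains.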
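If the per-character std::istreambuf_iterator loop turns out to be a bottleneck, the same '\n' scan can be run over large blocks read with std::istream::read, with std::find searching each block. This is a variation, not part of the answer above. retrieve_lines_chunked and the 64 KiB default block size are assumptions for illustration, and line numbers are again 0-based and sorted ascending. Whether it is actually faster depends on the standard library implementation and should be measured.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>  // std::istringstream, handy for trying the function out
#include <string>
#include <vector>

// Hypothetical variant: same contract as the istreambuf_iterator version,
// but the input is read in blocks of `block_size` bytes.
std::vector<std::string> retrieve_lines_chunked(std::istream& in,
                                                const std::vector<std::size_t>& wanted,
                                                std::size_t block_size = 64 * 1024) {
    assert(block_size > 0);
    std::vector<std::string> out;
    std::vector<char> buf(block_size);
    auto next = wanted.begin();
    std::size_t current = 0;  // number of the line being scanned
    std::string partial;      // wanted line collected so far (may span blocks)

    while (next != wanted.end() && in) {
        in.read(buf.data(), static_cast<std::streamsize>(buf.size()));
        const std::streamsize got = in.gcount();  // may be short at end of file
        if (got <= 0) break;
        const char* p = buf.data();
        const char* const end = p + got;
        while (p != end && next != wanted.end()) {
            const char* eol = std::find(p, end, '\n');
            const bool want = (*next == current);
            if (want) partial.append(p, eol);
            if (eol == end) break;  // the line continues in the next block
            if (want) {
                for (; next != wanted.end() && *next == current; ++next)
                    out.push_back(partial);
                partial.clear();
            }
            ++current;
            p = eol + 1;
        }
    }
    // a wanted last line that has no trailing '\n'
    if (!partial.empty())
        for (; next != wanted.end() && *next == current; ++next)
            out.push_back(partial);
    return out;
}
```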