Read words paragraph by paragraph, line by line (C++)

Time: 2019-05-29 14:46:49

Tags: c++

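I am looking for a way to read a file and locate words in it (by line number and paragraph number).

For example, I want to track every occurrence of the word "you" in the file. Each time I find the word on a line, I push the line number and the paragraph number onto two vectors.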

ifstream file;
file.open("input.txt");
vector<int> paragraph_number;
vector<int> line_number; 

What is the best way to read the file paragraph by paragraph and line by line? Thanks!

3 answers:

Answer 0 (score: 4)

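Line numbers are easy, because you can read one line at a time with getline or something similar. Just keep track of how many times you have read a line from the file. Alternatively, you can count the newline characters (\n) you encounter.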
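As a minimal sketch of the getline approach, the helper below counts lines by incrementing a counter on every successful read. The names `count_lines` and `count_lines_in_text` are hypothetical, chosen only for this illustration.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>

// Counts the lines in a stream: one increment per successful std::getline.
// A final line without a trailing '\n' is still counted.
std::size_t count_lines(std::istream& in) {
    std::string line;
    std::size_t n = 0;
    while (std::getline(in, line)) {
        ++n;  // n is the 1-based number of the line just read
    }
    return n;
}

// Convenience wrapper (hypothetical) for text that is already in memory.
std::size_t count_lines_in_text(const std::string& text) {
    std::istringstream in(text);
    return count_lines(in);
}
```

Inside the loop, the counter is exactly the line number you would push onto `line_number` when the word is found on that line.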

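Paragraphs are a bit trickier, because there is no standardized way to mark a paragraph in a file. You may need some kind of delimiter at the end of each paragraph. You could interpret two consecutive newlines as the start of a new paragraph, but that part is up to you.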
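One possible convention, sketched here under the assumption that one or more empty lines separate paragraphs, is to start a new paragraph at the first non-empty line that follows an empty one. The function name `paragraph_of_each_line` is hypothetical.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// For each line of `text`, returns the 1-based number of the paragraph it
// belongs to, or 0 for an empty line.
// Assumption: a paragraph is a run of non-empty lines, and one or more
// empty lines separate paragraphs.
std::vector<std::size_t> paragraph_of_each_line(const std::string& text) {
    std::istringstream in(text);
    std::vector<std::size_t> result;
    std::string line;
    std::size_t paragraph = 0;
    bool in_paragraph = false;
    while (std::getline(in, line)) {
        if (line.empty()) {
            in_paragraph = false;  // an empty line closes the paragraph
            result.push_back(0);
        } else {
            if (!in_paragraph) {   // first non-empty line opens a new one
                in_paragraph = true;
                ++paragraph;
            }
            result.push_back(paragraph);
        }
    }
    return result;
}
```

With this convention, several empty lines in a row still count as a single separator.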

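Answer 1 (score: 1)

Assumptions:

  • Paragraphs are separated by at least one empty line, that is, a line containing only the newline character

  • A line containing only spaces is not considered an empty line; that does not really make sense, but I leave it to you to change it ;-)

  • The program records the paragraph, line and column numbers where the word appears. All of these numbers start at 1, and the line number is global to the file rather than the rank of the line within its paragraph

  • A word contains only alphanumeric characters, so every other character is treated as a separator. The program can therefore find the word "isn" or "t" in "isn't", or "jean" in "jean-luc", even though they are not separated by spaces

  • The program does not check that the word given as input is valid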
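If you would rather treat whitespace-only lines as paragraph separators too, one way to relax the second assumption is a small predicate used in place of `line.empty()`. This is a sketch; the name `is_blank` is hypothetical.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Returns true when `line` is empty or contains only whitespace.
// std::isspace also accepts '\r', so a line read from a file with
// Windows line endings ("\r\n") is treated as blank as well.
bool is_blank(const std::string& line) {
    return std::all_of(line.begin(), line.end(), [](unsigned char c) {
        return std::isspace(c) != 0;
    });
}
```

The lambda takes `unsigned char` because passing a negative `char` value to `std::isspace` is undefined behavior.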

Proposal:

#include <cctype>   // isalnum
#include <iostream>
#include <fstream>
#include <vector>
#include <string>

int main(int argc, char ** argv)
{
  if (argc != 3)
      std::cerr << "Usage: " << *argv << " <file path> <word>" << std::endl;
  else {
    std::ifstream f(argv[1]);

    if (! f.is_open())
      std::cerr << "Cannot open '" << argv[1] << '\'' << std::endl;
    else {
      std::string word = argv[2];
      std::string line;
      size_t line_num = 0;
      size_t paragraph_num = 0;
      std::vector<size_t> paragraph_number; 
      std::vector<size_t> line_number;
      std::vector<size_t> column_number;
      bool afterEmptyLine = true;

      while (std::getline(f, line)) {
        line_num += 1;
        if (!line.empty()) {
          if (afterEmptyLine) {
            afterEmptyLine = false;
            paragraph_num += 1;
          }

          std::size_t p = 0;

          while ((p = line.find(word, p)) != std::string::npos) {
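            // check it is not a subword, suppose a word is only alphanum
            // (cast to unsigned char: isalnum is undefined for negative char values)
            if (((p == 0) || !isalnum(static_cast<unsigned char>(line[p - 1]))) &&
                ((line.length() == (p + word.length())) ||
                 !isalnum(static_cast<unsigned char>(line[p + word.length()])))) {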
              paragraph_number.push_back(paragraph_num);
              line_number.push_back(line_num);
              column_number.push_back(p + 1);
            }

            p += word.length();
          }
        }
        else
          afterEmptyLine = true;
      }

      /* debug */
      std::cout << '\'' << word << "' found " << paragraph_number.size() << " times :" << std::endl;

      for (size_t i = 0; i != paragraph_number.size(); ++i)
        std::cout << "\t paragraph " << paragraph_number[i] 
          << " line " << line_number[i]
            << " column " << column_number[i] << std::endl;
    }
  }

  return 0;
}

Compilation and execution:

bruno@bruno-XPS-8300:/tmp$ g++ -pedantic -Wextra -Wall c.cc
bruno@bruno-XPS-8300:/tmp$ cat fw
is it you or not you?
this is your decision and you are right

you and me



you
bruno@bruno-XPS-8300:/tmp$ ./a.out
Usage: ./a.out <file path> <word>
bruno@bruno-XPS-8300:/tmp$ ./a.out fw you
'you' found 5 times :
     paragraph 1 line 1 column 7
     paragraph 1 line 1 column 18
     paragraph 1 line 2 column 27
     paragraph 2 line 4 column 1
     paragraph 3 line 8 column 1
bruno@bruno-XPS-8300:/tmp$ 

(The empty lines in the file really are empty.)
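The whole-word test at the heart of that program can also be isolated in a function, which makes it easier to try on single lines. The following is a sketch under the same assumption (a word consists only of alphanumeric characters); the name `whole_word_columns` is hypothetical, and unlike the program above it returns early on an empty word, since searching for an empty string would otherwise never advance.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Returns the 1-based columns at which `word` occurs in `line` as a whole
// word, i.e. not preceded or followed by an alphanumeric character.
std::vector<std::size_t> whole_word_columns(const std::string& line,
                                            const std::string& word) {
    std::vector<std::size_t> columns;
    if (word.empty()) {
        return columns;  // an empty needle would loop forever below
    }
    auto is_alnum = [](char c) {
        return std::isalnum(static_cast<unsigned char>(c)) != 0;
    };
    std::size_t p = 0;
    while ((p = line.find(word, p)) != std::string::npos) {
        const std::size_t end = p + word.length();
        const bool left_ok = (p == 0) || !is_alnum(line[p - 1]);
        const bool right_ok = (end == line.length()) || !is_alnum(line[end]);
        if (left_ok && right_ok) {
            columns.push_back(p + 1);
        }
        p = end;  // continue searching after this occurrence
    }
    return columns;
}
```

For the first line of the sample file, "is it you or not you?", this yields columns 7 and 18, in line with the output shown above; on the second line "your" is rejected and only column 27 is reported.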

Answer 2 (score: 0)

Try something like this:

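#include <fstream>
#include <sstream>
#include <string>
#include <vector>
using namespace std;

int main()
{
    ifstream file("input.txt");

    vector<int> paragraph_number;
    vector<int> line_number;
    string line, word;
    int curr_paragraph_num = 0;
    int curr_line_num = 0;
    bool in_paragraph = false;

    while (getline(file, line))
    {
        ++curr_line_num;
        if (line.empty())
        {
            // an empty line ends the current paragraph
            in_paragraph = false;
        }
        else
        {
            if (!in_paragraph)
            {
                // first non-empty line after an empty one starts a new paragraph
                in_paragraph = true;
                ++curr_paragraph_num;
            }

            // split the line on whitespace; punctuation stays attached,
            // so a token such as "you?" does not compare equal to "you"
            istringstream iss(line);
            while (iss >> word)
            {
                if (word == "you")
                {
                    paragraph_number.push_back(curr_paragraph_num);
                    line_number.push_back(curr_line_num);
                }
            }
        }
    }
}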
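Because `iss >> word` splits on whitespace only, punctuation stays attached to the token, so "you?" or "you," would not match "you". One possible refinement, sketched here with hypothetical helper names (`trim_non_alnum`, `count_word`), is to strip leading and trailing non-alphanumeric characters before comparing.

```cpp
#include <cctype>
#include <cstddef>
#include <sstream>
#include <string>

// Returns `token` without its leading and trailing non-alphanumeric
// characters; characters in the middle (as in "you're") are kept.
std::string trim_non_alnum(const std::string& token) {
    auto is_alnum = [](char c) {
        return std::isalnum(static_cast<unsigned char>(c)) != 0;
    };
    std::size_t first = 0;
    std::size_t last = token.size();
    while (first < last && !is_alnum(token[first])) {
        ++first;
    }
    while (last > first && !is_alnum(token[last - 1])) {
        --last;
    }
    return token.substr(first, last - first);
}

// Counts the whitespace-separated tokens of `line` that equal `target`
// once surrounding punctuation has been trimmed.
std::size_t count_word(const std::string& line, const std::string& target) {
    std::istringstream iss(line);
    std::string token;
    std::size_t n = 0;
    while (iss >> token) {
        if (trim_non_alnum(token) == target) {
            ++n;
        }
    }
    return n;
}
```

In the loop above, the comparison would then become `trim_non_alnum(word) == "you"`. Note that this still treats "you're" as a different word, which may or may not be what you want.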