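I am looking for a way to read a file and locate words in it by line number and paragraph number.
For example, I want to track every occurrence of the word "you" in the file. Whenever I find the word on a line, I push the line number and the paragraph number onto two vectors: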
ifstream file;
file.open("input.txt");
vector<int> paragraph_number;
vector<int> line_number;
What is the best way to read the file paragraph by paragraph and line by line? Thanks!
Answer 0 (score: 4)
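Line numbers are straightforward, because you can read one line at a time with getline or a similar function. Just keep track of how many lines you have read from the file. Alternatively, you can count the newline characters (\n) you encounter.
Paragraphs are trickier, because there is no standardized way to identify a paragraph in a file. You may need some kind of delimiter at the end of each paragraph. You could interpret two consecutive newlines as the start of a new paragraph, but that part is up to you.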
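Answer 1 (score: 1)
Assumptions:
paragraphs are separated by at least one empty line, that is, a line containing only the newline character
a line containing only spaces is not considered empty, even though that has no real meaning; I'll let you change that ;-)
the program records the paragraph, line, and column numbers where the word occurs; all of these numbers start at 1, and the line number is global (counted from the start of the file), not the rank of the line within its paragraph
a word contains only alphanumeric characters, so every other character is treated as a separator. This means the words "isn" or "t" can be found in "this isn't possible", or "jean" in "jean-luc", even though they are not separated by spaces
the program does not check that the word given as input is valid
Proposal: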
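Two of these assumptions can be isolated as small helpers. The sketch below is not part of the proposal that follows. is_blank and is_whole_word_at are hypothetical names, and treating whitespace-only lines as empty is the change the answer leaves to the reader, not its actual behavior.

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper: true when the line is empty or holds only whitespace.
// Since '\r' counts as whitespace, this also covers lines from CRLF files.
bool is_blank(const std::string& line)
{
    return std::all_of(line.begin(), line.end(),
                       [](unsigned char ch) { return std::isspace(ch) != 0; });
}

// Hypothetical helper: true when the len characters starting at pos are not
// glued to an alphanumeric character on either side.
bool is_whole_word_at(const std::string& line, std::size_t pos, std::size_t len)
{
    // std::isalnum requires a value representable as unsigned char.
    auto alnum = [](char ch) {
        return std::isalnum(static_cast<unsigned char>(ch)) != 0;
    };
    const bool left_ok = (pos == 0) || !alnum(line[pos - 1]);
    const bool right_ok = (pos + len >= line.size()) || !alnum(line[pos + len]);
    return left_ok && right_ok;
}
```

Replacing a plain empty() test with is_blank(line) would make lines containing only spaces or tabs act as paragraph separators.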
#include <cctype>
#include <iostream>
#include <fstream>
#include <vector>
#include <string>

int main(int argc, char ** argv)
{
  if (argc != 3)
    std::cerr << "Usage: " << *argv << " <file path> <word>" << std::endl;
  else {
    std::ifstream f(argv[1]);

    if (! f.is_open())
      std::cerr << "Cannot open '" << argv[1] << '\'' << std::endl;
    else {
      std::string word = argv[2];
      std::string line;
      size_t line_num = 0;       // global line number, starts at 1
      size_t paragraph_num = 0;  // paragraph number, starts at 1
      std::vector<size_t> paragraph_number;
      std::vector<size_t> line_number;
      std::vector<size_t> column_number;
      bool afterEmptyLine = true;  // true until the first line of a paragraph is seen

      while (std::getline(f, line)) {
        line_num += 1;

        if (!line.empty()) {
          if (afterEmptyLine) {
            // first non-empty line after an empty one: a new paragraph starts
            afterEmptyLine = false;
            paragraph_num += 1;
          }

          std::size_t p = 0;

          while ((p = line.find(word, p)) != std::string::npos) {
            // check it is not a subword, suppose a word is only alphanum
            // (the casts avoid undefined behavior of isalnum on negative char values)
            if (((p == 0) || !std::isalnum(static_cast<unsigned char>(line[p - 1]))) &&
                ((line.length() == (p + word.length())) ||
                 !std::isalnum(static_cast<unsigned char>(line[p + word.length()])))) {
              paragraph_number.push_back(paragraph_num);
              line_number.push_back(line_num);
              column_number.push_back(p + 1);
            }
            p += word.length();
          }
        }
        else
          afterEmptyLine = true;
      }

      /* debug */
      std::cout << '\'' << word << "' found " << paragraph_number.size() << " times :" << std::endl;
      for (size_t i = 0; i != paragraph_number.size(); ++i)
        std::cout << "\t paragraph " << paragraph_number[i]
                  << " line " << line_number[i]
                  << " column " << column_number[i] << std::endl;
    }
  }

  return 0;
}
Compilation and execution:
bruno@bruno-XPS-8300:/tmp$ g++ -pedantic -Wextra -Wall c.cc
bruno@bruno-XPS-8300:/tmp$ cat fw
is it you or not you?
this is your decision and you are right

you and me



you
bruno@bruno-XPS-8300:/tmp$ ./a.out
Usage: ./a.out <file path> <word>
bruno@bruno-XPS-8300:/tmp$ ./a.out fw you
'you' found 5 times :
paragraph 1 line 1 column 7
paragraph 1 line 1 column 18
paragraph 1 line 2 column 27
paragraph 2 line 4 column 1
paragraph 3 line 8 column 1
bruno@bruno-XPS-8300:/tmp$
(The empty lines in the file really are empty.)
Answer 2 (score: 0)
Try something like this:
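#include <fstream>
#include <sstream>
#include <string>
#include <vector>

using namespace std;

int main()
{
    ifstream file("input.txt");
    vector<int> paragraph_number;
    vector<int> line_number;

    string line, word;
    int curr_paragraph_num = 0;
    int curr_line_num = 0;
    bool in_paragraph = false;

    while (getline(file, line))
    {
        ++curr_line_num;
        if (line.empty())
        {
            // an empty line ends the current paragraph
            in_paragraph = false;
        }
        else
        {
            if (!in_paragraph)
            {
                // first non-empty line after a gap starts a new paragraph
                in_paragraph = true;
                ++curr_paragraph_num;
            }
            // split the line on whitespace and compare each token
            istringstream iss(line);
            while (iss >> word)
            {
                if (word == "you")
                {
                    paragraph_number.push_back(curr_paragraph_num);
                    line_number.push_back(curr_line_num);
                }
            }
        }
    }
    return 0;
}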
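One thing to keep in mind with whitespace tokenizing is that a token such as "you?" or "you," compares unequal to "you". A minimal sketch of one way to handle that, assuming it is acceptable to strip non-alphanumeric characters from both ends of each token before comparing, is shown below. trim_punctuation is a hypothetical helper, not part of the answer above.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper: removes non-alphanumeric characters from both ends
// of a token and leaves characters inside the token untouched.
std::string trim_punctuation(const std::string& token)
{
    // std::isalnum requires a value representable as unsigned char.
    auto alnum = [](char ch) {
        return std::isalnum(static_cast<unsigned char>(ch)) != 0;
    };

    std::size_t first = 0;
    while (first < token.size() && !alnum(token[first]))
        ++first;

    std::size_t last = token.size();
    while (last > first && !alnum(token[last - 1]))
        --last;

    return token.substr(first, last - first);
}
```

With this helper, the comparison in the inner loop could become trim_punctuation(word) == "you". Whether punctuation should be ignored at all depends on what counts as a word for your input.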