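Catching errors in lines of a text file, skipping and reporting them

Time: 2013-04-13 14:43:05

Tags: c++ string whitespace split

I have a text file that I need to read. Each line contains an IP, a URL, and finally a date, all separated by spaces. I want to assign each piece of information to the corresponding variable of my class object "Visitors", and I have an array to store them in. My problem is that when I go through the text file line by line, I keep getting blank entries between the real records.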

class Visitors{

public:
string IP;
string URL;
string dateAccessed;

};

int main(){

Visitors hits[N];

string filename, theLine;
ifstream infile;

cout << "Enter file name (with extension):" << flush;

while(true){

    string infilename;
    getline(cin, infilename);
    infile.open(infilename.c_str());
    if(infile) break;

    cout << "Invalid file. Please enter valid file name: " << flush;

}

cout << "\n";

while(!infile.eof()){

    getline(infile, theLine);

    istringstream iss(theLine);

    do{

        string ip;
        string url;
        string date;

        iss >> ip;
        iss >> url;
        iss >> date;

        if(ip != "\n"){

             cout << "The IP: " << ip << endl;

        }

        if(url != "\n"){

             cout << "The URL: " << url << endl;

        }

        if(date != "\n"){

            cout << "The DA: " << date << endl;

        }


    }while(iss);

}

return 0;

}

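I tried using if statements to catch the strings that are just newlines and ignore them, but that didn't work, so I'm not sure how to ignore them. I'd also like to add checks to see whether any of the information is wrong (date not 2/2/4 characters long, "www." missing from the URL, etc.).

Here is some sample output to better show my problem...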

Enter file name (with extension):hits.txt

The IP: 192.168.1.101
The URL: www.cs.stonybrook.edu
The DA: 01/01/2013
The IP:
The URL:
The DA:
The IP: 192.168.1.101
The URL: www.cs.stonybrook.edu
The DA: 01/01/2013
The IP:
The URL:
The DA:
The IP: 123.112.15.151
The URL: www.cs.stonybrook.edu
The DA: 01/01/2013
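Edit: OK, so I've figured out how to go through each line, split it up, and store the strings in the right class variables of the objects in the array. The problem now is that I want to check each string on each line for errors (for example, the date is impossible, or one of the numbers in the IP is 256, etc.). After finding such an error, I want to skip to the next line and do the same checks, and if everything is fine, store the values in the class variables at the correct position in the array. Here is my code, to give an idea of what I'm trying to do...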

#include <iostream>
#include <fstream>
#include <string>
#include <sstream>

#define N 50

using namespace std;

class Visitors{

public:
string IP;
string URL;
string dateAccessed;

};

int main(){

Visitors hits[N];

string infilename, filename, ip, url, date;
ifstream infile;

int i = 0;

cout << "Enter file name (with extension):" << flush;

while(true){

    infilename = "";
    getline(cin, infilename);
    infile.open(infilename.c_str());
    if(infile) break;

    cout << "Invalid file. Please enter valid file name: " << flush;

}

cout << "Loading " << infilename << "..." << endl;
cout << "\n";

while(infile.good()){

    string line;
    getline(infile, line);
    stringstream ss(line);

    if(ss >> ip >> url >> date){

        cout << "The IP: " << ip << endl;
        hits[i].IP = ip;

        cout << "The URL: " << url << endl;
        hits[i].URL = url;

        cout << "The DA: " << date << endl;
        hits[i].dateAccessed = date;

        i++;

    }

    else{

        cerr << "error" << std::endl;

    }

    /*

    if(ip.length() > 15 || ip.length() < 7){

        cout << "Found a record with an invalid IP format (not XXX.XXX.XXX.XXX)...ignoring entry";

    }

    //if(any of the numbers in the IP address are greater than 255)
        //INVALID IP...IGNORE ENTRY

    else{

        cout << "The IP: " << ip << endl;
        hits[i].IP = ip;

    }

    //if(url doesnt start with www. or doesnt end with .xxx)
        //INVALID URL...IGNORE ENTRY

    else{

        cout << "The URL: " << url << endl;
        hits[i].URL = url;

    }

    //if(date.length != 10)
        //INVALID DATE FORMAT...IGNORE ENTRY

    //if(first 2 numbers in date arent between 01 and 12
         //OR if second 2 numbers arent between 01 and 31 depending on month OR etc.)
         //INVALID....IGNORE ENTRY

    else{

        cout << "The DA: " << date << endl;
        hits[i].dateAccessed = date;

    }

    i++;*/

}

return 0;

}

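Obviously it isn't organized or put together the way it would actually be in the program, but it's the general idea of what I'm trying to achieve. My biggest problem is how to skip a line in the file without disturbing my position in the array, and how to catch every line if all of the lines have errors.

3 Answers:

Answer 0 (score: 3)

You don't need the inner do/while loop around the stringstream; one extraction per line is enough: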

#include <string>
#include <iostream>
#include <fstream>
#include <sstream>
int main(){
    std::string filename, theLine;
    std::ifstream infile;
    std::cout << "Enter file name (with extension):" << std::flush;
    while(true){
        std::string infilename;
        getline(std::cin, infilename);
        infile.open(infilename.c_str());
        if(infile) break;
        std::cout << "Invalid file. Please enter valid file name: " 
            << std::flush;
    }

    std::cout << "\n";
    std::string ip, url, date;
    std::string line;
    // Loop on getline itself so a failed read at end of file is never parsed.
    while (std::getline(infile, line)) {
        std::stringstream ss(line);
        if (ss >> ip >> url >> date) {
            std::cout << "The IP: " << ip << std::endl;
            std::cout << "The URL: " << url << std::endl;
            std::cout << "The DA: " << date << std::endl;
        } else {
            std::cerr << "error" << std::endl;
        }
    }

    return 0;
}

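Your inner loop tries to parse the same line again without reading a new line from the file. The second pass finds nothing left in the stream, so all three extractions fail and the freshly declared strings are printed empty, which is where the blank "The IP:" / "The URL:" / "The DA:" entries come from.

Answer 1 (score: 1)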
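For the follow-up question about skipping a bad line without losing your place in the array, one option is to advance the storage position only when a line passes. The following is a minimal sketch, not a drop-in replacement: the names `Visitor`, `loadVisitors`, and `badLines` are made up for illustration, it swaps the fixed-size array for a `std::vector`, and "valid" here only means that the line has exactly three whitespace-separated fields.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical record type mirroring the Visitors class from the question.
struct Visitor {
    std::string ip;
    std::string url;
    std::string dateAccessed;
};

// Reads every line of `in`. Well-formed lines are appended to `hits`;
// the 1-based numbers of rejected lines are appended to `badLines`.
// "Well-formed" here only means exactly three whitespace-separated fields.
void loadVisitors(std::istream& in,
                  std::vector<Visitor>& hits,
                  std::vector<int>& badLines) {
    std::string line;
    int lineNo = 0;
    while (std::getline(in, line)) {    // stops cleanly at end of file
        ++lineNo;
        std::istringstream fields(line);
        Visitor v;
        std::string extra;
        if (fields >> v.ip >> v.url >> v.dateAccessed && !(fields >> extra)) {
            hits.push_back(v);          // storage only grows on success
        } else {
            badLines.push_back(lineNo); // skip the line, but remember it
        }
    }
}
```

Because nothing is stored for a rejected line, the valid records stay contiguous, and `badLines` can be printed afterwards to report every skipped line, even if all of them were bad.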

cout << "Enter file name (with extension):" << flush;

string infilename;
getline(cin, infilename);
infile.open(infilename.c_str());

// You don't need to loop here. You can just exit the program. 
// But this is optional.
if(!infile) {
   cout << "Invalid file. Please enter valid file name: " << endl;
   exit(1);
}

cout << endl;

int line_nr = 1;
while(getline(infile, theLine)){
    istringstream iss(theLine);
    string ip;
    string url;
    string date;

    // A line is expected to have ip url date format. Otherwise it is error.
    if(iss >> ip >> url >> date) {
        cout << "The IP: " << ip << endl;
        cout << "The URL: " << url << endl;
        cout << "The DA: " << date << endl;

     }

     else {
        cout << "Error reading data on line: " << line_nr << flush;
        break;
     }

     ++line_nr;

}
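The loop above only checks that three fields are present. For the per-field checks mentioned in the question's edit (for example an IP number of 256), a helper along these lines could be called before the record is stored. This is a sketch under assumptions: `isValidIPv4` is a hypothetical name, and it accepts only plain dotted-decimal notation with four parts of one to three digits each.

```cpp
#include <cctype>
#include <sstream>
#include <string>

// Returns true when `ip` has exactly four dot-separated decimal parts,
// each 1 to 3 digits long and no greater than 255.
bool isValidIPv4(const std::string& ip) {
    // std::getline would silently drop an empty part after a trailing dot,
    // so reject that case up front.
    if (ip.empty() || ip[ip.size() - 1] == '.') return false;

    std::istringstream parts(ip);
    std::string part;
    int count = 0;
    while (std::getline(parts, part, '.')) {
        if (part.empty() || part.size() > 3) return false;
        int value = 0;
        for (std::string::size_type i = 0; i < part.size(); ++i) {
            if (!std::isdigit(static_cast<unsigned char>(part[i]))) return false;
            value = value * 10 + (part[i] - '0');
        }
        if (value > 255) return false;
        ++count;
    }
    return count == 4;
}
```

In the loop, a condition such as `if (iss >> ip >> url >> date && isValidIPv4(ip))` would then send malformed addresses to the error branch.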

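Answer 2 (score: 1)

The simplest fix (in terms of how much code you need to change) is to compare against "" instead of "\n". operator>> skips whitespace and never stores a newline in the string, so after a failed extraction your freshly declared strings are simply empty: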

if (ip   != "") { cout << "The IP: "  << ip   << endl; }
if (url  != "") { cout << "The URL: " << url  << endl; }
if (date != "") { cout << "The DA: "  << date << endl; }

(tested)
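The same idea of testing each string before printing or storing it extends to the other checks listed in the question. Below is a rough sketch with hypothetical helper names (`isValidDate`, `looksLikeUrl`). It assumes dates are written as MM/DD/YYYY, as the comments in the question suggest, and the URL test is deliberately loose: it only requires a leading "www." and a non-empty suffix after a later dot, which is weaker than the three-letter ending the question mentions.

```cpp
#include <cctype>
#include <string>

// Returns true for dates written exactly as MM/DD/YYYY that exist on the calendar.
bool isValidDate(const std::string& date) {
    if (date.size() != 10 || date[2] != '/' || date[5] != '/') return false;
    for (std::string::size_type i = 0; i < date.size(); ++i) {
        if (i == 2 || i == 5) continue;  // the two separators
        if (!std::isdigit(static_cast<unsigned char>(date[i]))) return false;
    }
    const int month = (date[0] - '0') * 10 + (date[1] - '0');
    const int day   = (date[3] - '0') * 10 + (date[4] - '0');
    const int year  = (date[6] - '0') * 1000 + (date[7] - '0') * 100
                    + (date[8] - '0') * 10 + (date[9] - '0');
    if (month < 1 || month > 12 || day < 1) return false;

    static const int daysIn[12] = {31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31};
    int limit = daysIn[month - 1];
    const bool leap = (year % 4 == 0 && year % 100 != 0) || year % 400 == 0;
    if (month == 2 && leap) limit = 29;
    return day <= limit;
}

// Loose URL check: must start with "www." and contain a later dot
// followed by at least one character.
bool looksLikeUrl(const std::string& url) {
    if (url.compare(0, 4, "www.") != 0) return false;
    const std::string::size_type lastDot = url.rfind('.');
    return lastDot > 3 && lastDot + 1 < url.size();
}
```

A record would then be printed or stored only when every check passes, for example `if (isValidDate(date) && looksLikeUrl(url)) { ... }`.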