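C++ file reading and splitting columns

Time: 2016-03-04 05:28:47

Tags: c++ tabs newline strip file-read

I'm not familiar with file reading in C++, but I have done a lot of this kind of work in pyspark. I now have a txt file with the following contents: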

1   52  Hayden Smith        18:16   15  M   Berlin

2   54  Mark Puleo      18:25   15  M   Berlin

3   97  Peter Warrington    18:26   29  M   New haven

4   305 Matt Kasprzak       18:53   33  M   Falls Church

5   272 Kevin Solar     19:17   16  M   Sterling

6   394 Daniel Sullivan     19:35   26  M   Sterling

7   42  Kevan DuPont        19:58   18  M   Boylston

8   306 Chris Goethert      20:00   43  M   Falls Church

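As you can see, there are 8 columns and 351 rows (I'm only showing 8 rows). For each row, [0] is the rank, [1] is the BIB, [2] is the first name, [3] is the last name, [4] is the time, [5] is the age, [6] is the sex, and [7] is the town. For example, in the first row, 1 is the rank, 52 is the BIB, Hayden Smith is the name, 18:16 is the time, 15 is the age, M is male, and Berlin is the town.

I have a sorted linked structure named class SortedLinked, and an item-type class named class Runner.

You don't need to worry about the SortedLinked class.

Class Runner has four private attributes:

string name, int age, int min, int sec

In my driver file, I can do this: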

SortedLinked mylist;                  // initialize a sorted list

Runner M("Jordan", 22, 20, 20);       // initialize a Runner called Jordan, who is 22 years old and finished the race in 20 min 20 sec

mylist.add(M); // add Runner M to my sorted list

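So I need to read the text file and, for each row, create a Runner object holding the runner's name, age, minutes, and seconds, then insert that Runner into the sorted linked list.

If this were pyspark, I could do this: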

file = sc.textFile("hdfs")             # we usually use hdfs in pyspark

newfile = file.map(lambda line: line.split('\t'))    # the columns are separated by tabs, except that columns [2] and [3] are separated by a space

ColumnIneed = newfile.map(lambda r: [r[2], r[3], r[4], r[5]]) # I only need columns [2][3][4][5]

mylist = ColumnIneed.collect()    # transform the RDD into a list

Then I can just transform every row into a Runner object.

However, in C++ this is all I know:

ifstream infile;

string s, sAll;

if(infile.is_open())
{

   while(getline(infile, s))

   {

      s = s.rstrip('\n')     //does NOT work in C++
      name, age, time = s.split('\t')  // Does NOT work in C++ and I don't need all the columns

   }
}

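So, the questions:

1. I need to access each line and strip the newline

2. I only need columns [2] [3] [4] [5] // the columns are separated by tabs

3. Column [4] is the time, which is a string in the text file; I need to split it on ":" and convert the parts into int minutes and seconds

4. Columns [2] [3] are the first and last name; I need to combine them into the string name

5. Columns [2] [3] are separated by a space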
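The five points above could be sketched in C++ roughly as follows. This is a minimal sketch under stated assumptions, not a definitive implementation: `splitTabs` and `parseTime` are hypothetical helper names, and fields are assumed to be separated by one or more tab characters (empty fields between consecutive tabs are skipped). `std::getline` already discards the trailing `'\n'`, so only a stray `'\r'` from Windows line endings needs stripping. Because the first and last name are separated by a space rather than a tab, they arrive together as one field, so the indices shift: the full name is field 2, the time is field 3, and the age is field 4.

```cpp
#include <sstream>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: split one line on tabs, skipping empty fields
// (so a run of consecutive tabs counts as a single separator).
std::vector<std::string> splitTabs(std::string line) {
    // std::getline strips '\n'; drop a trailing '\r' left by CRLF files.
    if (!line.empty() && line.back() == '\r') {
        line.pop_back();
    }

    std::vector<std::string> fields;
    std::istringstream iss(line);
    std::string field;
    while (std::getline(iss, field, '\t')) {  // '\t' is the delimiter here
        if (!field.empty()) {
            fields.push_back(field);
        }
    }
    return fields;
}

// Hypothetical helper: turn "18:16" into {18, 16}.
// Assumes the form "<minutes>:<seconds>"; throws std::invalid_argument
// if the colon is missing or either part is not a number.
std::pair<int, int> parseTime(const std::string& text) {
    const std::size_t colon = text.find(':');
    if (colon == std::string::npos) {
        throw std::invalid_argument("time has no ':' separator: " + text);
    }
    const int minutes = std::stoi(text.substr(0, colon));
    const int seconds = std::stoi(text.substr(colon + 1));
    return {minutes, seconds};
}
```

For a row read with `std::getline(infile, s)`, `splitTabs(s)` returns the fields, `parseTime(fields[3])` gives the minutes and seconds, and `std::stoi(fields[4])` gives the age.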

Ideally, I would like to do this:

while(I need a loop)
{

   eachline = access each line;

   eachline.strip('\n')  //strip newline

   eachline.split('\t')  //split Tabs

   string name = eachline[2][3];

   string time = eachline[4];

   int min;

   int sec;

   min, sec = time.split(':')

   int age = eachline[5];

   Runner M(name, age, min, sec)    //I don't know if this works, because it looks like you are overwriting the Runner M each time you access a new line. 

   mylist.add(M)      //add M into my linkedlist, this step you don't need to worry, I already finished. 

}

If you have a better approach, I would be very grateful.

1 Answer:

Answer 0: (score: 1)

Some code snippets

    // needs <fstream>, <iostream>, <sstream> and <string>
    std::ifstream in;
    in.open(/*path to file*/);
    std::string line;
    if(in.is_open())
    {
        while(std::getline(in, line)) // get one row as a string (the '\n' is discarded)
        {
            std::istringstream iss(line); // put the line into a string stream
            std::string word;
            while(iss >> word) // read word by word (splits on any whitespace)
            {
                std::cout << word << std::endl;
            }
            /*
            // Alternative to the word loop above, which consumes iss:
            int row;
            int age;
            std::string name;
            iss >> row >> age >> name; // adapt to your input line
            Runner M(name, age, min, sec);
            // By convention, variable names shouldn't start with a capital letter.
            // You don't overwrite M: each iteration creates a new local Runner,
            // a copy of M is put into the container, and M is destroyed at the
            // end of the block. You could probably use move semantics, but you
            // need the C++ basics first.
            mylist.add(M);
            */
        }
    }
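To make the commented-out part of the snippet concrete, here is one way the `iss >> ...` chain might be adapted to the sample rows. It is a sketch under assumptions rather than the answerer's code: `RunnerFields`, `parseLine` and `readRunners` are hypothetical names, `std::vector` stands in for `SortedLinked`, and every name is assumed to consist of exactly two words. Since `operator>>` skips any mix of spaces and tabs, no explicit split is needed, and reading `int`, `char`, `int` in sequence takes `18:16` apart at the colon.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for the four values the Runner constructor takes.
struct RunnerFields {
    std::string name;
    int age = 0;
    int min = 0;
    int sec = 0;
};

// Parse one line of the form
//   rank  bib  First Last  mm:ss  age  sex  town
// Returns false (leaving `out` untouched) if the line does not match.
bool parseLine(const std::string& line, RunnerFields& out) {
    std::istringstream iss(line);
    int rank = 0, bib = 0;
    std::string first, last;
    char colon = '\0';
    RunnerFields parsed;

    if (!(iss >> rank >> bib >> first >> last
              >> parsed.min >> colon >> parsed.sec >> parsed.age)) {
        return false;  // missing or non-numeric field
    }
    if (colon != ':') {
        return false;  // time was not written as mm:ss
    }
    parsed.name = first + " " + last;  // combine first and last name
    out = parsed;
    return true;
}

// Read every line of the stream. Each parsed record is copied into the
// vector, so declaring a fresh local on every iteration loses nothing.
// Blank or malformed lines are skipped.
std::vector<RunnerFields> readRunners(std::istream& in) {
    std::vector<RunnerFields> runners;
    std::string line;
    while (std::getline(in, line)) {
        RunnerFields r;
        if (parseLine(line, r)) {
            runners.push_back(r);
        }
    }
    return runners;
}
```

With the asker's classes, each record could then be handed on as `Runner m(r.name, r.age, r.min, r.sec); mylist.add(m);`, assuming `add` stores a copy as the answer describes.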