Reading a text file with differing columns into an array in C++

Time: 2019-02-16 22:11:29

Tags: c++

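I am trying to read the following text file into an array exactly as it is. The problem is in the readData function; the rest of the code is provided for context.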

The movies.txt file

The next 2 lines are to show the whitespace count, they are not part of the data.
000000000111111111122222222223333333333444444444455555555556666666666777777777788888888889999
123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123
Jan 25, 1970       MASH                                         $3,025,000        $81,600,000
Aug 5, 1983        The Star Chamber                             $8,000,000         $5,555,305
Oct 2, 1977        Julia                                        $7,840,000        $20,714,400
May 25, 1979       Alien                                       $11,000,000       $104,931,801
June 3, 1988       Big                                         $18,000,000       $151,668,774
Dec 25, 1992       Hoffa                                       $35,000,000        $29,302,121
Nov 1, 1996        Romeo + Juliet                              $14,500,000       $147,554,999
April 9, 1999      Never Been Kissed                           $25,000,000        $84,565,230
Dec 15, 1974       Young Frankenstein                           $2,780,000        $86,273,333
Dec 27, 1991       Naked Lunch                                 $18,000,000         $2,641,357
May 17, 1974       Dirty Mary Crazy Larry                       $1,140,000        $28,401,735
March 2, 1979      Norma Rae                                    $4,500,000        $22,228,000
Nov 26, 1997       Alien Resurrection                          $75,000,000       $161,295,658
Sept 23, 1970      Tora! Tora! Tora!                           $25,485,000        $29,548,291
June 21, 1991      Dying Young                                 $26,000,000        $82,264,675
June 15, 1979      Butch and Sundance: The Early Days           $9,000,000         $2,260,000

Read each row of every column into an array

=> My code

#include <iostream>
#include <string>
#include <fstream>
#include <iomanip>
using namespace std;

struct Movie {
    string releaseDate;
    string movieName;
    double prodCost;
    double grossProfit;
};


void readData(ifstream& in, Movie movie[], int count)
{
    string releaseDate;
    string movieName;
    double prodCost;
    double grossProfit;

    in.open("movies.txt");
// this needs to read each column in a row to an array of structs
    for (int i = 0; i < count; i++) {
        in >> releaseDate >> movieName >> prodCost >> grossProfit;
        movie[i].releaseDate = releaseDate;
        movie[i].movieName = movieName;
        movie[i].prodCost = prodCost;
        movie[i].grossProfit = grossProfit;
    }
}

int main()
{
    int size = 0;
    string dateOfRelease;
    string movieName;
    double productionCost;
    double grossProfit;

    ifstream input;

    input.open("movies.txt");

    while (input >> dateOfRelease >> movieName >> productionCost >> grossProfit) {
        size++;
    }

    input.close();
    Movie* movie = new Movie[size];
    readData(input, movie, size);
}

1 Answer:

Answer 0 (score: 1)

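My observations about this particular file format are:

  • Each line is exactly 93 characters long
  • The title starts at column 20
  • The production cost ends at column 74; it consists of digits, commas and a single $ at the far left
  • The gross profit is formatted like the production cost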

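To split a line into fields, I would use the following strategy:

  • Make sure the line has exactly 93 characters; otherwise it is an error
  • Characters 1 to 19 make up the date (the title starts at column 20); the date needs to be trimmed of whitespace; luckily you don't need to parse it further into year, month and day
  • To find the end of the title, start at column 74 and go left as long as there is a digit or a comma; after that, the current character must be a dollar sign, otherwise it is an error; go left once more; you are now at the right edge of the title
  • The title starts at column 20; take substring(20, title_end) and trim it
  • Parse substring(title_end + 1, 74) as a money amount
  • Trim substring(75, 93) and parse it as a money amount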
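The splitting strategy above could be sketched roughly as follows. This is only an illustration under the stated assumptions (lines of exactly 93 characters, production cost ending at column 74); the names RawFields, trim and splitLine are hypothetical and not part of the question's code. Note that the columns in the list are 1-based, while the string indices in the code are 0-based.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical holder for the four raw text fields of one line.
struct RawFields {
    std::string date;   // e.g. "Jan 25, 1970"
    std::string title;  // e.g. "MASH"
    std::string cost;   // e.g. "$3,025,000" (still text)
    std::string gross;  // e.g. "$81,600,000" (still text)
};

// Remove leading and trailing spaces.
std::string trim(const std::string& s) {
    const std::size_t first = s.find_first_not_of(' ');
    if (first == std::string::npos) return "";
    const std::size_t last = s.find_last_not_of(' ');
    return s.substr(first, last - first + 1);
}

// Split one fixed-width line into its fields.
// Returns false if the line does not match the assumed layout.
bool splitLine(const std::string& line, RawFields& out) {
    if (line.size() != 93) return false;

    // Column 74 is index 73. Walk left over digits and commas,
    // but never past the start of the title (index 19).
    std::size_t i = 73;
    while (i > 19 &&
           (std::isdigit(static_cast<unsigned char>(line[i])) || line[i] == ',')) {
        --i;
    }
    // The walk must have consumed at least one character and stopped on '$'.
    if (i == 73 || line[i] != '$') return false;
    const std::size_t dollar = i;

    out.date  = trim(line.substr(0, 19));            // columns 1..19
    out.title = trim(line.substr(19, dollar - 19));  // column 20 up to the '$'
    out.cost  = line.substr(dollar, 74 - dollar);    // '$' through column 74
    out.gross = trim(line.substr(74));               // columns 75..93
    return true;
}
```

Each line would typically be obtained with std::getline and passed to splitLine; the cost and gross fields are left as text here so that the money parsing can be handled separately.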

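To parse a string as a money amount:

  • Remove the leading "$"
  • Counting from the back, check whether the 4th character is a comma; if so, remove it, and repeat for each further group of three digits
  • Parse the remaining string as a double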
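A minimal sketch of these steps, assuming the amounts are whole dollars with a comma between every group of three digits. The function name parseMoney is hypothetical. Instead of deleting the commas one at a time from the back, this version scans left to right and accepts a comma only where the distance to the end of the string is a multiple of four, which amounts to the same check.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Convert text such as "$3,025,000" into a number.
// Returns false if the text is not "$" followed by digits,
// with commas allowed only as thousands separators.
bool parseMoney(const std::string& text, double& amount) {
    if (text.size() < 2 || text[0] != '$') return false;

    std::string digits;
    for (std::size_t i = 1; i < text.size(); ++i) {
        const char c = text[i];
        if (std::isdigit(static_cast<unsigned char>(c))) {
            digits += c;
        } else if (c == ',') {
            // A separator must sit right before a group of three digits,
            // i.e. 4, 8, 12, ... characters from the end, and not right after '$'.
            if (i == 1 || (text.size() - i) % 4 != 0) return false;
        } else {
            return false;  // unexpected character
        }
    }
    if (digits.empty()) return false;

    amount = std::stod(digits);  // only digits remain, so this cannot throw on bad input
    return true;
}
```

With this sketch, "$3,025,000" yields 3025000 and "$81,600,000" yields 81600000, while text without the leading "$" or with a misplaced comma is rejected.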

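The whole task sounds complicated. That is because the file format is not a standard format with built-in delimiters (such as XML, JSON or CSV), so parsing it takes a fair amount of custom code.

When parsing this file format, watch out for ambiguous titles such as "The $1,000,000 man" and for very long titles that may reach column 67. Your sample does not contain such extreme cases, but that alone does not mean they don't exist.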