Question

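I have code written in C++ that reads a very large data file (10-20 GB). I read it line by line, and it takes a long time. Is there a way to make this more efficient?

I know there are already some posts about this, but my problem is not quite the same...

The file contains the coordinates of N atoms and their velocities at a given time.

My code: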

void Funct(std::string path, float TimeToRead, int nbLines, float x[], float y[], float z[], float vx[], float vy[], float vz[], std::string names[], int index[])
{
    ifstream file(path.c_str());
    if (file)
    {
        /* x,y,z are arrays like float x[nbAtoms] */

        while (time != TimetoRead) {
            /*I Put the cursor at the given time to read before*/
            /*And then read atoms coordinates*/
        }

        for (int i = 0; i < nbAtoms; i++) {
            file >> x[i]; file >> y[i]; /* etc, load all*/
        }
    }
}

int main()
{
    /*Declarations : hidden*/

    for (int TimeToRead = 0; TimeToRead<finalTime; TimeToRead++) {
        Funct(...);
        /*Do some Calculus on the atoms coordinates at one given time */
    }
}

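Currently I have about 2 million lines, each with 8 or 9 columns. The file is a succession of atom coordinates at given times.

I have to do a calculation for every time step, so at the moment I call this function once per time step (about 4000 time steps, with a large number of atoms). In the end this costs a lot of time.

I have read somewhere that I could keep the lines in memory instead of reading the file every time, but when the file is 20 GB I cannot keep it all in RAM!

What can I do to improve this reading?

Thank you very much

Edit 1: I am on Linux

Edit 2: The file to read contains a header line for each time step, like: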

time= 1
coordinates atom 1
coordinate atom 2
...
...
...
time=2
coordinates atom 1
coordinate atom 2
...
...
...
etc

The while loop just reads every line from the beginning of the file until it finds t = TimeToRead

Answer 1

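I think it is possible to optimize (remove) the line-skipping code (while (time != TimetoRead)).

You open the file in every iteration and then skip lines until you reach the requested time. If your file contains finalTime records, you skip 0 records in the first iteration, 1 record in the second iteration, and so on. In total you skip 0 + 1 + 2 + ... + (finalTime-1) records, i.e. (finalTime-1) * finalTime / 2 :-) Multiply that by the number of lines per record and you will see where most of your time is probably lost.

A solution could be: move the file-opening operation out of the read function into the surrounding code. That way you read a record, do the calculation, and when you read the next record you do not have to open the file again and skip all those lines, because the stream automatically continues at the right position.

In "pseudo code" it should look something like this: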
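To put a number on this, here is a small sketch that evaluates the sum for the roughly 4000 time steps mentioned in the question. `skippedRecords` is a hypothetical helper name used only for illustration.

```cpp
#include <cstdint>

// Total number of records skipped when every call reopens the file and
// scans from the start: 0 + 1 + ... + (steps - 1) = (steps - 1) * steps / 2.
std::uint64_t skippedRecords(std::uint64_t steps) {
    return steps == 0 ? 0 : (steps - 1) * steps / 2;
}
```

For 4000 time steps, `skippedRecords(4000)` gives 7998000 skipped records, compared with only 4000 records that actually need to be read.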

void Funct(ifstream& file, ...) /* pass the stream by reference: streams cannot be copied */
{
    if (file)
    {
        /* x,y,z are arrays like float x[nbAtoms] */

        for (int i = 0; i < nbAtoms; i++) {
            file >> x[i]; file >> y[i]; /* etc, load all*/
        }
    }
}

int main()
{
    ifstream file(path.c_str());

    for (int TimeToRead = 0; TimeToRead<finalTime; TimeToRead++) {
        Funct(file, ...);
        /*Do some Calculus on the atoms coordinates at one given time */
    }
}
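A slightly more complete sketch of the same idea is given here, assuming the file layout from Edit 2 (one `time=` header line followed by one line per atom). The `Atom` struct, the `readFrame` name, and the column order `name index x y z vx vy vz` are assumptions made for illustration; the question does not state the actual column order.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical record layout: one atom per line as
// "name index x y z vx vy vz" (the real column order is an assumption).
struct Atom {
    std::string name;
    int index = 0;
    float x = 0, y = 0, z = 0;
    float vx = 0, vy = 0, vz = 0;
};

// Reads the next time step from an already-open stream: one "time=..."
// header line followed by nbAtoms atom lines. Returns false at end of
// file or if the frame is malformed. On success the stream is left
// positioned at the next header, so no skipping is needed on the next call.
bool readFrame(std::istream& in, std::size_t nbAtoms, std::vector<Atom>& atoms) {
    std::string line;
    // Skip any blank lines before the header.
    while (std::getline(in, line)) {
        if (line.find_first_not_of(" \t\r") != std::string::npos) break;
    }
    // Stop at end of file, or if the line is not a "time=..." header.
    if (!in || line.compare(0, 4, "time") != 0) return false;

    atoms.resize(nbAtoms);
    for (std::size_t i = 0; i < nbAtoms; ++i) {
        if (!std::getline(in, line)) return false;
        std::istringstream fields(line);
        Atom& a = atoms[i];
        if (!(fields >> a.name >> a.index >> a.x >> a.y >> a.z
                     >> a.vx >> a.vy >> a.vz)) {
            return false;
        }
    }
    return true;
}
```

In `main` you would open a `std::ifstream` once and then loop with `while (readFrame(file, nbAtoms, atoms)) { /* calculation for this time step */ }`. Only one time step is held in memory at a time, so the 20 GB file never has to fit in RAM.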
