Strange memory consumption of C fread / C++ read functions according to Linux sysinfo data

Date: 2018-05-19 06:41:56

Tags: c++ linux file memory

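Well, my program behaves in a way that is strange (in my opinion). By now it has been reduced to reading 3 arrays from very large (about 24 GB and 48 GB) binary files. The structure of these files is very simple: they contain a small header and 3 arrays of type int, int and float, all 3 of size N, where N is very large: 2147483648 for the 24 GB file and 4294967296 for the 48 GB one.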

To track memory consumption, I use a simple function based on Linux sysinfo to check how much free memory there is at each stage of my program (for example, after allocating the arrays that store the data and while reading the file). Here is the code of the function:

#include <sys/sysinfo.h>
#include <cstddef>

size_t get_free_memory_in_MB()
{
    struct sysinfo info;
    if (sysinfo(&info) != 0)
        return 0; // sysinfo() failed
    // freeram is expressed in units of info.mem_unit bytes
    return (size_t)info.freeram * info.mem_unit / (1024 * 1024);
}

Now the problem is straightforward: strangely, after reading the 3 arrays from the file (with the standard C functions or the C++ read functions, it does not matter at all) and checking how much free memory is left after the read, I see that the amount of free memory drops significantly (for the following snippet, by roughly edges_count * sizeof(int)).

fread(src_ids, sizeof(int), edges_count, graph_file);
cout << "1 test: " << get_free_memory_in_MB() << " MB" << endl;

So basically, after reading the whole file, my memory consumption according to sysinfo is almost 2 times the expected value. To illustrate the problem better, here is the code of the whole function and its output; please read it, it is very short and shows the problem more clearly.

#include <cstdio>
#include <cstring>
#include <iostream>
#include <string>
using namespace std;

typedef float _TEdgeWeight; // assumed: the weights are float, as described above

bool load_from_edges_list_bin_file(string _file_name)
{
    bool directed = true;
    int vertices_count = 1;
    long long int edges_count = 0;

    // open the file in binary mode
    FILE *graph_file = fopen(_file_name.c_str(), "rb");
    if(graph_file == NULL)
        return false;

    // just reading a simple header here
    fread(reinterpret_cast<char*>(&directed), sizeof(bool), 1, graph_file);
    fread(reinterpret_cast<char*>(&vertices_count), sizeof(int), 1, graph_file);
    fread(reinterpret_cast<char*>(&edges_count), sizeof(long long), 1, graph_file);

    cout << "edges count: " << edges_count << endl;
    cout << "Before graph alloc free memory: " << get_free_memory_in_MB() << " MB" << endl;

    // allocate the arrays to store the result
    int *src_ids = new int[edges_count];
    int *dst_ids = new int[edges_count];
    _TEdgeWeight *weights = new _TEdgeWeight[edges_count];

    cout << "After graph alloc free memory: " << get_free_memory_in_MB() << " MB" << endl;

    memset(src_ids, 0, edges_count * sizeof(int));
    memset(dst_ids, 0, edges_count * sizeof(int));
    memset(weights, 0, edges_count * sizeof(_TEdgeWeight));

    cout << "After memset: " << get_free_memory_in_MB() << " MB" << endl;

    // add edges from file
    fread(src_ids, sizeof(int), edges_count, graph_file);
    cout << "1 test: " << get_free_memory_in_MB() << " MB" << endl;

    fread(dst_ids, sizeof(int), edges_count, graph_file);
    cout << "2 test: " << get_free_memory_in_MB() << " MB" << endl;

    fread(weights, sizeof(_TEdgeWeight), edges_count, graph_file);
    cout << "3 test: " << get_free_memory_in_MB() << " MB" << endl;

    cout << "After actual load: " << get_free_memory_in_MB() << " MB" << endl;

    delete []src_ids;
    delete []dst_ids;
    delete []weights;

    cout << "After we removed the graph load: " << get_free_memory_in_MB() << " MB" << endl;

    fclose(graph_file);

    cout << "After we closed the file: " << get_free_memory_in_MB() << " MB" << endl;

    return true;
}

So, nothing complicated. Here is the raw output (with some of my comments after //), first for the 24 GB file; the 48 GB file behaves similarly:

Loading graph...
edges count: 2147483648
Before graph alloc free memory: 91480 MB 
After graph alloc free memory: 91480 MB // allocated memory here, but nothing changed, why?
After memset: 66857 MB // ok, we put some data into the memory (memset) and consumed exactly 24 GB, seems correct
1 test: 57658 MB // first read and we have lost 9 GB...
2 test: 48409 MB // -9 GB again...
3 test: 39161 MB // and once more...
After actual load: 39161 MB // we lost in total 27 GB during the reads. How???
After we removed the graph load: 63783 MB // removed the arrays from memory and freed the memory we have allocated
// 24 GB freed, but 27 are still consumed somewhere
After we closed the file: 63788 MB // closing the file doesn't help
Complete!
After we quit the function: 63788 MB // quitting the function doesn't help either.

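So, what is happening in my program?

1) Why is so much memory lost during the reads (with fread from C as well as with C++ file streams)?

2) Why doesn't closing the file release the consumed memory?

3) Maybe sysinfo is showing me incorrect information?

4) Could this problem be related to memory fragmentation?

By the way, I run my program on a supercomputer node to which I have exclusive access (so nobody else can affect it), and there are no side applications on it that could affect my program.

Thanks for reading!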

1 answer:

Answer 0 (score: 3)

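This is almost certainly the disk (page) cache. When you read a file, the operating system keeps some or all of its contents in memory, which reduces the amount of free memory. This is done to speed up future reads of the same data.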

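However, this does not mean that the memory is used by your process or is otherwise unavailable. If and when memory is needed, the operating system will release the cache and make that memory available.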

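You should be able to confirm this by tracking the value of the bufferram field in the sysinfo structure (https://www.systutorials.com/docs/linux/man/2-sysinfo/), or by looking at the output of the free -m command after running your program. Note, however, that on Linux bufferram only counts block-device buffers (the Buffers line of /proc/meminfo), not the page cache of regular files (Cached). The buffer/cache figures shown by free -m include both, so they are the more reliable check.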

For more details on this, see this answer: https://superuser.com/questions/980820/what-is-the-difference-between-memfree-and-memavailable-in-proc-meminfo