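To track memory consumption, I use a simple function based on the Linux sysinfo call to check how much free memory is available at each stage of my program (for example, after allocating the arrays that store the data, and while reading the file). Here is the code of the function:
#include <sys/sysinfo.h>
size_t get_free_memory_in_MB()
{
    struct sysinfo info;
    sysinfo(&info);
    return info.freeram / (1024 * 1024);
}
Now the problem itself. Strangely, when I read 3 arrays from a file, using either the standard C functions or C++ file streams (it makes no difference), and then check how much free memory is left, I see that the amount of free memory has dropped substantially (in the next example, by roughly edges_count * sizeof(int)).
fread(src_ids, sizeof(int), edges_count, graph_file);
cout << "1 test: " << get_free_memory_in_MB() << " MB" << endl;
So, after reading the whole file, my memory consumption according to sysinfo is almost twice what I expected. To illustrate the problem, here is the code of the whole function; please read it, it is short. Its output follows the code, with my comments after //.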
bool load_from_edges_list_bin_file(string _file_name)
{
    bool directed = true;
    int vertices_count = 1;
    long long int edges_count = 0;
    // open the file
    FILE *graph_file = fopen(_file_name.c_str(), "r");
    if(graph_file == NULL)
        return false;
    // just reading a simple header here
    fread(reinterpret_cast<char*>(&directed), sizeof(bool), 1, graph_file);
    fread(reinterpret_cast<char*>(&vertices_count), sizeof(int), 1, graph_file);
    fread(reinterpret_cast<char*>(&edges_count), sizeof(long long), 1, graph_file);
    cout << "edges count: " << edges_count << endl;
    cout << "Before graph alloc free memory: " << get_free_memory_in_MB() << " MB" << endl;
    // allocate the arrays to store the result
    int *src_ids = new int[edges_count];
    int *dst_ids = new int[edges_count];
    _TEdgeWeight *weights = new _TEdgeWeight[edges_count];
    cout << "After graph alloc free memory: " << get_free_memory_in_MB() << " MB" << endl;
    memset(src_ids, 0, edges_count * sizeof(int));
    memset(dst_ids, 0, edges_count * sizeof(int));
    memset(weights, 0, edges_count * sizeof(_TEdgeWeight));
    cout << "After memset: " << get_free_memory_in_MB() << " MB" << endl;
    // add edges from file
    fread(src_ids, sizeof(int), edges_count, graph_file);
    cout << "1 test: " << get_free_memory_in_MB() << " MB" << endl;
    fread(dst_ids, sizeof(int), edges_count, graph_file);
    cout << "2 test: " << get_free_memory_in_MB() << " MB" << endl;
    fread(weights, sizeof(_TEdgeWeight), edges_count, graph_file);
    cout << "3 test: " << get_free_memory_in_MB() << " MB" << endl;
    cout << "After actual load: " << get_free_memory_in_MB() << " MB" << endl;
    delete []src_ids;
    delete []dst_ids;
    delete []weights;
    cout << "After we removed the graph load: " << get_free_memory_in_MB() << " MB" << endl;
    fclose(graph_file);
    cout << "After we closed the file: " << get_free_memory_in_MB() << " MB" << endl;
    return true;
}
Here is the output for the 24 GB file; the 48 GB file behaves similarly:
Loading graph...
edges count: 2147483648
Before graph alloc free memory: 91480 MB
After graph alloc free memory: 91480 MB // allocated memory here, but nothing changed, why?
After memset: 66857 MB // ok, we put some data into the memory (memset) and consumed exactly 24 GB, seems correct
1 test: 57658 MB // first read and we have lost 9 GB...
2 test: 48409 MB // -9 GB again...
3 test: 39161 MB // and once more...
After actual load: 39161 MB // we lost in total 27 GB during the reads. How???
After we removed the graph load: 63783 MB // removed the arrays from memory and freed the memory we have allocated
// 24 GB freed, but 27 are still consumed somewhere
After we closed the file: 63788 MB // closing the file doesn't help
Complete!
After we quit the function: 63788 MB // quitting the function doesn't help either.
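So, what is happening in my program?
1) Why is so much memory lost during the reads (with both fread from C and file streams from C++)?
2) Why does closing the file not free the consumed memory?
3) Could sysinfo be showing me incorrect information?
4) Could this problem be related to memory fragmentation?
By the way, I run my program on a supercomputer node to which I have exclusive access (so nobody else can affect it), and where no side applications are running that could affect my program.
Thanks for reading!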
Answer 0 (score: 3)
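This is almost certainly the disk (page) cache. When you read a file, the operating system keeps some or all of its contents in memory, which reduces the amount of free memory. It does this to speed up future reads.
That does not mean, however, that the memory is used by your process or is otherwise unavailable. If and when the memory is needed, the OS will release it and make it available.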
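One way to see the difference between "free" and "available" memory is to read /proc/meminfo alongside the sysinfo figure. The following is a minimal sketch, assuming a Linux kernel that reports the MemFree, Cached and MemAvailable lines (MemAvailable appeared in kernel 3.14, as far as I know); parse_meminfo and meminfo_field_in_MB are hypothetical helper names, not part of any library.

```cpp
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <istream>
#include <map>
#include <sstream>
#include <string>

// Parse "Key:   value kB" lines (the /proc/meminfo layout) into a map.
// Values are stored as they appear in the file, which is kB for memory fields.
std::map<std::string, std::uint64_t> parse_meminfo(std::istream &in)
{
    std::map<std::string, std::uint64_t> fields;
    std::string line;
    while (std::getline(in, line)) {
        const std::size_t colon = line.find(':');
        if (colon == std::string::npos)
            continue; // not a "Key: value" line, skip it
        std::istringstream rest(line.substr(colon + 1));
        std::uint64_t value = 0;
        if (rest >> value)
            fields[line.substr(0, colon)] = value;
    }
    return fields;
}

// Hypothetical helper: the named /proc/meminfo field in MB, or 0 if absent.
std::uint64_t meminfo_field_in_MB(const std::string &key)
{
    std::ifstream meminfo("/proc/meminfo");
    const std::map<std::string, std::uint64_t> fields = parse_meminfo(meminfo);
    const auto it = fields.find(key);
    return it == fields.end() ? 0 : it->second / 1024;
}
```

Printing meminfo_field_in_MB("Cached") and meminfo_field_in_MB("MemAvailable") next to each get_free_memory_in_MB() call in the question's function should show Cached growing by about the amount read while MemAvailable stays roughly level, if the page cache is indeed the cause.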
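You should be able to confirm this by tracking the value of the bufferram field in the sysinfo structure (https://www.systutorials.com/docs/linux/man/2-sysinfo/), or by looking at the output of the free -m command after running your program.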
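A sketch of that sysinfo check might look like the following. MemorySnapshot and take_memory_snapshot are hypothetical names introduced here for illustration. The sketch also multiplies by mem_unit, which the question's helper leaves out. As far as I can tell, struct sysinfo has no field for the page cache itself (bufferram covers buffers only), so the buff/cache column of free -m may turn out to be the more telling number.

```cpp
#include <sys/sysinfo.h>
#include <cstdint>

// Hypothetical container for the sysinfo fields of interest, in MB.
struct MemorySnapshot {
    std::uint64_t total_MB;
    std::uint64_t free_MB;
    std::uint64_t buffer_MB;
};

// Fills `out` from sysinfo(); returns false if the call fails.
bool take_memory_snapshot(MemorySnapshot &out)
{
    struct sysinfo info;
    if (sysinfo(&info) != 0)
        return false;
    // Sizes in struct sysinfo are expressed in units of mem_unit bytes.
    const std::uint64_t unit = info.mem_unit;
    const std::uint64_t MB = 1024ULL * 1024ULL;
    out.total_MB  = static_cast<std::uint64_t>(info.totalram)  * unit / MB;
    out.free_MB   = static_cast<std::uint64_t>(info.freeram)   * unit / MB;
    out.buffer_MB = static_cast<std::uint64_t>(info.bufferram) * unit / MB;
    return true;
}
```

Taking one snapshot before the first fread and one after the last, then comparing free_MB and buffer_MB, gives the before/after view the answer suggests.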
For more details on this topic, see the following answer: https://superuser.com/questions/980820/what-is-the-difference-between-memfree-and-memavailable-in-proc-meminfo