我已经阅读了有关此参数的文档,但区别确实很大!启用后,一个简单程序的内存使用量(请参阅下文)约为 7 GB ;禁用该程序后,报告的使用量约为 160 KB 。
top
还显示了大约7 GB,这可以用pages-as-heap=yes
确认。
(我有一个理论,但我不认为这可以解释如此巨大的差异,所以-寻求帮助)。
令我特别困扰的是,std::string
使用了大多数报告的内存使用情况,而从未打印过what?
(意味着-实际容量很小)。
在对我的应用进行性能分析时,我确实需要使用pages-as-heap=yes
,我只是想知道如何避免出现“误报”
代码段:
#include <iostream>
#include <thread>
#include <vector>
#include <chrono>
void run()
{
while (true)
{
std::string s;
s += "aaaaa";
s += "aaaaaaaaaaaaaaa";
s += "bbbbbbbbbb";
s += "cccccccccccccccccccccccccccccccccccccccccccccccccc";
if (s.capacity() > 1024) std::cout << "what?" << std::endl;
std::this_thread::sleep_for(std::chrono::seconds(1));
}
}
int main()
{
std::vector<std::thread> workers;
for( unsigned i = 0; i < 192; ++i ) workers.push_back(std::thread(&run));
workers.back().join();
}
编译为:g++ --std=c++11 -fno-inline -g3 -pthread
使用pages-as-heap=yes
:
100.00% (7,257,714,688B) (page allocation syscalls) mmap/mremap/brk, --alloc-fns, etc.
->99.75% (7,239,757,824B) 0x54E0679: mmap (mmap.c:34)
| ->53.63% (3,892,314,112B) 0x545C3CF: new_heap (arena.c:438)
| | ->53.63% (3,892,314,112B) 0x545CC1F: arena_get2.part.3 (arena.c:646)
| | ->53.63% (3,892,314,112B) 0x5463248: malloc (malloc.c:2911)
| | ->53.63% (3,892,314,112B) 0x4CB7E76: operator new(unsigned long) (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.21)
| | ->53.63% (3,892,314,112B) 0x4CF8E37: std::string::_Rep::_S_create(unsigned long, unsigned long, std::allocator<char> const&) (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.21)
| | ->53.63% (3,892,314,112B) 0x4CF9C69: std::string::_Rep::_M_clone(std::allocator<char> const&, unsigned long) (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.21)
| | ->53.63% (3,892,314,112B) 0x4CF9D22: std::string::reserve(unsigned long) (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.21)
| | ->53.63% (3,892,314,112B) 0x4CF9FB1: std::string::append(char const*, unsigned long) (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.21)
| | ->53.63% (3,892,314,112B) 0x401252: run() (test.cpp:11)
| | ->53.63% (3,892,314,112B) 0x403929: void std::_Bind_simple<void (*())()>::_M_invoke<>(std::_Index_tuple<>) (functional:1700)
| | ->53.63% (3,892,314,112B) 0x403864: std::_Bind_simple<void (*())()>::operator()() (functional:1688)
| | ->53.63% (3,892,314,112B) 0x4037D2: std::thread::_Impl<std::_Bind_simple<void (*())()> >::_M_run() (thread:115)
| | ->53.63% (3,892,314,112B) 0x4CE2C7E: ??? (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.21)
| | ->53.63% (3,892,314,112B) 0x51C96B8: start_thread (pthread_create.c:333)
| | ->53.63% (3,892,314,112B) 0x54E63DB: clone (clone.S:109)
| |
| ->35.14% (2,550,136,832B) 0x545C35B: new_heap (arena.c:427)
| | ->35.14% (2,550,136,832B) 0x545CC1F: arena_get2.part.3 (arena.c:646)
| | ->35.14% (2,550,136,832B) 0x5463248: malloc (malloc.c:2911)
| | ->35.14% (2,550,136,832B) 0x4CB7E76: operator new(unsigned long) (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.21)
| | ->35.14% (2,550,136,832B) 0x4CF8E37: std::string::_Rep::_S_create(unsigned long, unsigned long, std::allocator<char> const&) (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.21)
| | ->35.14% (2,550,136,832B) 0x4CF9C69: std::string::_Rep::_M_clone(std::allocator<char> const&, unsigned long) (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.21)
| | ->35.14% (2,550,136,832B) 0x4CF9D22: std::string::reserve(unsigned long) (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.21)
| | ->35.14% (2,550,136,832B) 0x4CF9FB1: std::string::append(char const*, unsigned long) (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.21)
| | ->35.14% (2,550,136,832B) 0x401252: run() (test.cpp:11)
| | ->35.14% (2,550,136,832B) 0x403929: void std::_Bind_simple<void (*())()>::_M_invoke<>(std::_Index_tuple<>) (functional:1700)
| | ->35.14% (2,550,136,832B) 0x403864: std::_Bind_simple<void (*())()>::operator()() (functional:1688)
| | ->35.14% (2,550,136,832B) 0x4037D2: std::thread::_Impl<std::_Bind_simple<void (*())()> >::_M_run() (thread:115)
| | ->35.14% (2,550,136,832B) 0x4CE2C7E: ??? (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.21)
| | ->35.14% (2,550,136,832B) 0x51C96B8: start_thread (pthread_create.c:333)
| | ->35.14% (2,550,136,832B) 0x54E63DB: clone (clone.S:109)
| |
| ->10.99% (797,306,880B) 0x51CA1D4: pthread_create@@GLIBC_2.2.5 (allocatestack.c:513)
| ->10.99% (797,306,880B) 0x4CE2DC1: std::thread::_M_start_thread(std::shared_ptr<std::thread::_Impl_base>, void (*)()) (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.21)
| ->10.99% (797,306,880B) 0x4CE2ECB: std::thread::_M_start_thread(std::shared_ptr<std::thread::_Impl_base>) (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.21)
| ->10.99% (797,306,880B) 0x401BEA: std::thread::thread<void (*)()>(void (*&&)()) (thread:138)
| ->10.99% (797,306,880B) 0x401353: main (test.cpp:24)
|
->00.25% (17,956,864B) in 1+ places, all below ms_print's threshold (01.00%)
与pages-as-heap=no
一起使用:
96.38% (159,289B) (heap allocation functions) malloc/new/new[], --alloc-fns, etc.
->43.99% (72,704B) 0x4EBAEFE: ??? (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.21)
| ->43.99% (72,704B) 0x40106B8: call_init.part.0 (dl-init.c:72)
| ->43.99% (72,704B) 0x40107C9: _dl_init (dl-init.c:30)
| ->43.99% (72,704B) 0x4000C68: ??? (in /lib/x86_64-linux-gnu/ld-2.23.so)
|
->33.46% (55,296B) 0x40138A3: _dl_allocate_tls (dl-tls.c:322)
| ->33.46% (55,296B) 0x53D126D: pthread_create@@GLIBC_2.2.5 (allocatestack.c:588)
| ->33.46% (55,296B) 0x4EE9DC1: std::thread::_M_start_thread(std::shared_ptr<std::thread::_Impl_base>, void (*)()) (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.21)
| ->33.46% (55,296B) 0x4EE9ECB: std::thread::_M_start_thread(std::shared_ptr<std::thread::_Impl_base>) (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.21)
| ->33.46% (55,296B) 0x401BEA: std::thread::thread<void (*)()>(void (*&&)()) (thread:138)
| ->33.46% (55,296B) 0x401353: main (test.cpp:24)
|
->12.12% (20,025B) 0x4EFFE37: std::string::_Rep::_S_create(unsigned long, unsigned long, std::allocator<char> const&) (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.21)
| ->12.12% (20,025B) 0x4F00C69: std::string::_Rep::_M_clone(std::allocator<char> const&, unsigned long) (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.21)
| ->12.12% (20,025B) 0x4F00D22: std::string::reserve(unsigned long) (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.21)
| ->12.12% (20,025B) 0x4F00FB1: std::string::append(char const*, unsigned long) (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.21)
| ->12.07% (19,950B) 0x401285: run() (test.cpp:14)
| | ->12.07% (19,950B) 0x403929: void std::_Bind_simple<void (*())()>::_M_invoke<>(std::_Index_tuple<>) (functional:1700)
| | ->12.07% (19,950B) 0x403864: std::_Bind_simple<void (*())()>::operator()() (functional:1688)
| | ->12.07% (19,950B) 0x4037D2: std::thread::_Impl<std::_Bind_simple<void (*())()> >::_M_run() (thread:115)
| | ->12.07% (19,950B) 0x4EE9C7E: ??? (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.21)
| | ->12.07% (19,950B) 0x53D06B8: start_thread (pthread_create.c:333)
| | ->12.07% (19,950B) 0x56ED3DB: clone (clone.S:109)
| |
| ->00.05% (75B) in 1+ places, all below ms_print's threshold (01.00%)
|
->05.58% (9,216B) 0x40315B: __gnu_cxx::new_allocator<std::_Sp_counted_ptr_inplace<std::thread::_Impl<std::_Bind_simple<void (*())()> >, std::allocator<std::thread::_Impl<std::_Bind_simple<void (*())()> > >, (__gnu_cxx::_Lock_policy)2> >::allocate(unsigned long, void const*) (new_allocator.h:104)
| ->05.58% (9,216B) 0x402FC2: std::allocator_traits<std::allocator<std::_Sp_counted_ptr_inplace<std::thread::_Impl<std::_Bind_simple<void (*())()> >, std::allocator<std::thread::_Impl<std::_Bind_simple<void (*())()> > >, (__gnu_cxx::_Lock_policy)2> > >::allocate(std::allocator<std::_Sp_counted_ptr_inplace<std::thread::_Impl<std::_Bind_simple<void (*())()> >, std::allocator<std::thread::_Impl<std::_Bind_simple<void (*())()> > >, (__gnu_cxx::_Lock_policy)2> >&, unsigned long) (alloc_traits.h:488)
| ->05.58% (9,216B) 0x402D4B: std::__shared_count<(__gnu_cxx::_Lock_policy)2>::__shared_count<std::thread::_Impl<std::_Bind_simple<void (*())()> >, std::allocator<std::thread::_Impl<std::_Bind_simple<void (*())()> > >, std::_Bind_simple<void (*())()> >(std::_Sp_make_shared_tag, std::thread::_Impl<std::_Bind_simple<void (*())()> >*, std::allocator<std::thread::_Impl<std::_Bind_simple<void (*())()> > > const&, std::_Bind_simple<void (*())()>&&) (shared_ptr_base.h:616)
| ->05.58% (9,216B) 0x402BDE: std::__shared_ptr<std::thread::_Impl<std::_Bind_simple<void (*())()> >, (__gnu_cxx::_Lock_policy)2>::__shared_ptr<std::allocator<std::thread::_Impl<std::_Bind_simple<void (*())()> > >, std::_Bind_simple<void (*())()> >(std::_Sp_make_shared_tag, std::allocator<std::thread::_Impl<std::_Bind_simple<void (*())()> > > const&, std::_Bind_simple<void (*())()>&&) (shared_ptr_base.h:1090)
| ->05.58% (9,216B) 0x402A76: std::shared_ptr<std::thread::_Impl<std::_Bind_simple<void (*())()> > >::shared_ptr<std::allocator<std::thread::_Impl<std::_Bind_simple<void (*())()> > >, std::_Bind_simple<void (*())()> >(std::_Sp_make_shared_tag, std::allocator<std::thread::_Impl<std::_Bind_simple<void (*())()> > > const&, std::_Bind_simple<void (*())()>&&) (shared_ptr.h:316)
| ->05.58% (9,216B) 0x402771: std::shared_ptr<std::thread::_Impl<std::_Bind_simple<void (*())()> > > std::allocate_shared<std::thread::_Impl<std::_Bind_simple<void (*())()> >, std::allocator<std::thread::_Impl<std::_Bind_simple<void (*())()> > >, std::_Bind_simple<void (*())()> >(std::allocator<std::thread::_Impl<std::_Bind_simple<void (*())()> > > const&, std::_Bind_simple<void (*())()>&&) (shared_ptr.h:594)
| ->05.58% (9,216B) 0x402325: std::shared_ptr<std::thread::_Impl<std::_Bind_simple<void (*())()> > > std::make_shared<std::thread::_Impl<std::_Bind_simple<void (*())()> >, std::_Bind_simple<void (*())()> >(std::_Bind_simple<void (*())()>&&) (shared_ptr.h:610)
| ->05.58% (9,216B) 0x401F9C: std::shared_ptr<std::thread::_Impl<std::_Bind_simple<void (*())()> > > std::thread::_M_make_routine<std::_Bind_simple<void (*())()> >(std::_Bind_simple<void (*())()>&&) (thread:196)
| ->05.58% (9,216B) 0x401BC4: std::thread::thread<void (*)()>(void (*&&)()) (thread:138)
| ->05.58% (9,216B) 0x401353: main (test.cpp:24)
|
->01.24% (2,048B) 0x402C9A: __gnu_cxx::new_allocator<std::thread>::allocate(unsigned long, void const*) (new_allocator.h:104)
->01.24% (2,048B) 0x402AF5: std::allocator_traits<std::allocator<std::thread> >::allocate(std::allocator<std::thread>&, unsigned long) (alloc_traits.h:488)
->01.24% (2,048B) 0x402928: std::_Vector_base<std::thread, std::allocator<std::thread> >::_M_allocate(unsigned long) (stl_vector.h:170)
->01.24% (2,048B) 0x40244E: void std::vector<std::thread, std::allocator<std::thread> >::_M_emplace_back_aux<std::thread>(std::thread&&) (vector.tcc:412)
->01.24% (2,048B) 0x40206D: void std::vector<std::thread, std::allocator<std::thread> >::emplace_back<std::thread>(std::thread&&) (vector.tcc:101)
->01.24% (2,048B) 0x401C82: std::vector<std::thread, std::allocator<std::thread> >::push_back(std::thread&&) (stl_vector.h:932)
->01.24% (2,048B) 0x401366: main (test.cpp:24)
请忽略对线程的糟糕处理,这只是一个非常简短的示例。
这似乎与std::string
根本无关。正如@Lawrence所建议的,可以通过在堆上简单地分配一个int
(使用new
)来重现这一点。我相信@Lawrence引用了他的评论(非常便于其他读者阅读),非常接近此处的真实答案:
劳伦斯:
@KirilKirov字符串分配实际上并不需要那么多 空间...每个线程获取其初始堆栈,然后堆访问映射 一些不正确的空间(大约70m) 反映出来。您可以通过声明1个字符串然后测量 具有自旋循环...显示了相同的虚拟内存使用量– 劳伦斯9月28日14:51
我:
@Lawrence-你该死的对!好,所以,你说的是 这样),在每个线程上,在第一个堆分配上, 内存管理器(或操作系统,或其他)专用于 线程堆所需的内存?并且此块将被重用 稍后(或在必要时缩小)? – Kiril Kirov 9月28日15:45
劳伦斯:
@KirilKirov具有这种性质...确切的分配可能取决于malloc的实现,而莫非如此-Lawrence 2天前
答案 0 :(得分:3)
massif
的 --pages-as-heap=yes
和您正在观察的top
列均用于衡量进程使用的虚拟内存。该虚拟内存包括在malloc的实现中和线程创建期间的所有mmap
'空间。例如,线程的默认堆栈大小将为8192k
,这反映在每个线程的创建中,并有助于虚拟内存占用。
具体的分配方案将取决于实现,但是新线程上的第一个堆分配似乎mmap
大约有65兆字节的空间。可以通过查看流程的pmap输出来查看。
与示例非常相似的程序的摘录:
75170: ./a.out
0000000000400000 24K r-x-- a.out
0000000000605000 4K r---- a.out
0000000000606000 4K rw--- a.out
0000000001b6a000 200K rw--- [ anon ]
00007f669dfa4000 4K ----- [ anon ]
00007f669dfa5000 8192K rw--- [ anon ]
00007f669e7a5000 4K ----- [ anon ]
00007f669e7a6000 8192K rw--- [ anon ]
00007f669efa6000 4K ----- [ anon ]
00007f669efa7000 8192K rw--- [ anon ]
...
00007f66cb800000 8192K rw--- [ anon ]
00007f66cc000000 132K rw--- [ anon ]
00007f66cc021000 65404K ----- [ anon ]
00007f66d0000000 132K rw--- [ anon ]
00007f66d0021000 65404K ----- [ anon ]
00007f66d4000000 132K rw--- [ anon ]
00007f66d4021000 65404K ----- [ anon ]
...
00007f6880586000 8192K rw--- [ anon ]
00007f6880d86000 1056K r-x-- libm-2.23.so
00007f6880e8e000 2044K ----- libm-2.23.so
...
00007f6881c08000 4K r---- libpthread-2.23.so
00007f6881c09000 4K rw--- libpthread-2.23.so
00007f6881c0a000 16K rw--- [ anon ]
00007f6881c0e000 152K r-x-- ld-2.23.so
00007f6881e09000 24K rw--- [ anon ]
00007f6881e33000 4K r---- ld-2.23.so
00007f6881e34000 4K rw--- ld-2.23.so
00007f6881e35000 4K rw--- [ anon ]
00007ffe9d75b000 132K rw--- [ stack ]
00007ffe9d7f8000 12K r---- [ anon ]
00007ffe9d7fb000 8K r-x-- [ anon ]
ffffffffff600000 4K r-x-- [ anon ]
total 7815008K
当您接近每个进程的虚拟内存阈值时,malloc似乎变得更加保守。另外,我对分别映射库的评论是错误的(应该在每个进程中共享)
答案 1 :(得分:1)
在尝试弄清正在发生的事情的同时,我将尝试对所学内容进行简短总结。
注意:这个感谢@Lawrence才有可能-感激!
这与Linux /内核(虚拟)内存管理或std::string
绝对无关。
这与glibc
的内存分配器有关-它只是在第一个(当然,不仅是)动态分配(每个线程)上分配了巨大的内存区域。。
#include <thread>
#include <vector>
#include <chrono>
int main() {
std::vector<std::thread> workers;
for( unsigned i = 0; i < 192; ++i )
workers.emplace_back([]{
const auto x = std::make_unique<int>(rand());
while (true) std::this_thread::sleep_for(std::chrono::seconds(1));});
workers.back().join();
}
请忽略线程的糟糕处理,我希望它尽可能短。
编译:{{1}}。
个人资料:g++ --std=c++14 -fno-inline -g3 -O0 -pthread test.cpp
valgrind --tool=massif --pages-as-heap=[no|yes] ./a.out
显示top
KiB虚拟内存。
7'815'012
还显示pmap
KiB虚拟内存。
7'815'016
和massif
显示了类似的结果:pages-as-heap=yes
KiB,请参见下文。
另一方面,7'817'088
与massif
截然不同-大约133 KiB!
在终止程序之前使用内存:
pages-as-heap=no
在终止程序之前使用内存:
100.00% (8,004,698,112B) (page allocation syscalls) mmap/mremap/brk, --alloc-fns, etc.
->99.78% (7,986,741,248B) 0x54E0679: mmap (mmap.c:34)
| ->46.11% (3,690,987,520B) 0x545C3CF: new_heap (arena.c:438)
| | ->46.11% (3,690,987,520B) 0x545CC1F: arena_get2.part.3 (arena.c:646)
| | ->46.11% (3,690,987,520B) 0x5463248: malloc (malloc.c:2911)
| | ->46.11% (3,690,987,520B) 0x4CB7E76: operator new(unsigned long) (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.21)
| | ->46.11% (3,690,987,520B) 0x4026D0: std::_MakeUniq<int>::__single_object std::make_unique<int, int>(int&&) (unique_ptr.h:765)
| | ->46.11% (3,690,987,520B) 0x400EC5: main::{lambda()
| | ->46.11% (3,690,987,520B) 0x40225C: void std::_Bind_simple<main::{lambda()
| | ->46.11% (3,690,987,520B) 0x402194: std::_Bind_simple<main::{lambda()
| | ->46.11% (3,690,987,520B) 0x402102: std::thread::_Impl<std::_Bind_simple<main::{lambda()
| | ->46.11% (3,690,987,520B) 0x4CE2C7E: ??? (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.21)
| | ->46.11% (3,690,987,520B) 0x51C96B8: start_thread (pthread_create.c:333)
| | ->46.11% (3,690,987,520B) 0x54E63DB: clone (clone.S:109)
| |
| ->33.53% (2,684,354,560B) 0x545C35B: new_heap (arena.c:427)
| | ->33.53% (2,684,354,560B) 0x545CC1F: arena_get2.part.3 (arena.c:646)
| | ->33.53% (2,684,354,560B) 0x5463248: malloc (malloc.c:2911)
| | ->33.53% (2,684,354,560B) 0x4CB7E76: operator new(unsigned long) (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.21)
| | ->33.53% (2,684,354,560B) 0x4026D0: std::_MakeUniq<int>::__single_object std::make_unique<int, int>(int&&) (unique_ptr.h:765)
| | ->33.53% (2,684,354,560B) 0x400EC5: main::{lambda()
| | ->33.53% (2,684,354,560B) 0x40225C: void std::_Bind_simple<main::{lambda()
| | ->33.53% (2,684,354,560B) 0x402194: std::_Bind_simple<main::{lambda()
| | ->33.53% (2,684,354,560B) 0x402102: std::thread::_Impl<std::_Bind_simple<main::{lambda()
| | ->33.53% (2,684,354,560B) 0x4CE2C7E: ??? (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.21)
| | ->33.53% (2,684,354,560B) 0x51C96B8: start_thread (pthread_create.c:333)
| | ->33.53% (2,684,354,560B) 0x54E63DB: clone (clone.S:109)
| |
| ->20.13% (1,611,399,168B) 0x51CA1D4: pthread_create@@GLIBC_2.2.5 (allocatestack.c:513)
| ->20.13% (1,611,399,168B) 0x4CE2DC1: std::thread::_M_start_thread(std::shared_ptr<std::thread::_Impl_base>, void (*)()) (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.21)
| ->20.13% (1,611,399,168B) 0x4CE2ECB: std::thread::_M_start_thread(std::shared_ptr<std::thread::_Impl_base>) (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.21)
| ->20.13% (1,611,399,168B) 0x40139A: std::thread::thread<main::{lambda()
| ->20.13% (1,611,399,168B) 0x4012AE: _ZN9__gnu_cxx13new_allocatorISt6threadE9constructIS1_IZ4mainEUlvE_EEEvPT_DpOT0_ (new_allocator.h:120)
| ->20.13% (1,611,399,168B) 0x401075: _ZNSt16allocator_traitsISaISt6threadEE9constructIS0_IZ4mainEUlvE_EEEvRS1_PT_DpOT0_ (alloc_traits.h:527)
| ->19.19% (1,535,864,832B) 0x401009: void std::vector<std::thread, std::allocator<std::thread> >::emplace_back<main::{lambda()
| | ->19.19% (1,535,864,832B) 0x400F47: main (test.cpp:10)
| |
| ->00.94% (75,534,336B) in 1+ places, all below ms_print's threshold (01.00%)
|
->00.22% (17,956,864B) in 1+ places, all below ms_print's threshold (01.00%)
使用--------------------------------------------------------------------------------
n time(i) total(B) useful-heap(B) extra-heap(B) stacks(B)
--------------------------------------------------------------------------------
68 2,793,125 143,280 136,676 6,604 0
95.39% (136,676B) (heap allocation functions) malloc/new/new[], --alloc-fns, etc.
->50.74% (72,704B) 0x4EBAEFE: ??? (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.21)
| ->50.74% (72,704B) 0x40106B8: call_init.part.0 (dl-init.c:72)
| ->50.74% (72,704B) 0x40107C9: _dl_init (dl-init.c:30)
| ->50.74% (72,704B) 0x4000C68: ??? (in /lib/x86_64-linux-gnu/ld-2.23.so)
|
->36.58% (52,416B) 0x40138A3: _dl_allocate_tls (dl-tls.c:322)
| ->36.58% (52,416B) 0x53D126D: pthread_create@@GLIBC_2.2.5 (allocatestack.c:588)
| ->36.58% (52,416B) 0x4EE9DC1: std::thread::_M_start_thread(std::shared_ptr<std::thread::_Impl_base>, void (*)()) (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.21)
| ->36.58% (52,416B) 0x4EE9ECB: std::thread::_M_start_thread(std::shared_ptr<std::thread::_Impl_base>) (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.21)
| ->36.58% (52,416B) 0x40139A: std::thread::thread<main::{lambda()
| ->36.58% (52,416B) 0x4012AE: _ZN9__gnu_cxx13new_allocatorISt6threadE9constructIS1_IZ4mainEUlvE_EEEvPT_DpOT0_ (new_allocator.h:120)
| ->36.58% (52,416B) 0x401075: _ZNSt16allocator_traitsISaISt6threadEE9constructIS0_IZ4mainEUlvE_EEEvRS1_PT_DpOT0_ (alloc_traits.h:527)
| ->34.77% (49,824B) 0x401009: void std::vector<std::thread, std::allocator<std::thread> >::emplace_back<main::{lambda()
| | ->34.77% (49,824B) 0x400F47: main (test.cpp:10)
| |
| ->01.81% (2,592B) 0x4010FF: void std::vector<std::thread, std::allocator<std::thread> >::_M_emplace_back_aux<main::{lambda()
| ->01.81% (2,592B) 0x40103D: void std::vector<std::thread, std::allocator<std::thread> >::emplace_back<main::{lambda()
| ->01.81% (2,592B) 0x400F47: main (test.cpp:10)
|
->06.13% (8,784B) 0x401B4B: __gnu_cxx::new_allocator<std::_Sp_counted_ptr_inplace<std::thread::_Impl<std::_Bind_simple<main::{lambda()
| ->06.13% (8,784B) 0x401A60: std::allocator_traits<std::allocator<std::_Sp_counted_ptr_inplace<std::thread::_Impl<std::_Bind_simple<main::{lambda()
| ->06.13% (8,784B) 0x40194D: std::__shared_count<(__gnu_cxx::_Lock_policy)2>::__shared_count<std::thread::_Impl<std::_Bind_simple<main::{lambda()
| ->06.13% (8,784B) 0x401894: std::__shared_ptr<std::thread::_Impl<std::_Bind_simple<main::{lambda()
| ->06.13% (8,784B) 0x40183A: std::shared_ptr<std::thread::_Impl<std::_Bind_simple<main::{lambda()
| ->06.13% (8,784B) 0x4017C7: std::shared_ptr<std::thread::_Impl<std::_Bind_simple<main::{lambda()
| ->06.13% (8,784B) 0x4016AB: std::shared_ptr<std::thread::_Impl<std::_Bind_simple<main::{lambda()
| ->06.13% (8,784B) 0x40155E: std::shared_ptr<std::thread::_Impl<std::_Bind_simple<main::{lambda()
| ->06.13% (8,784B) 0x401374: std::thread::thread<main::{lambda()
| ->06.13% (8,784B) 0x4012AE: _ZN9__gnu_cxx13new_allocatorISt6threadE9constructIS1_IZ4mainEUlvE_EEEvPT_DpOT0_ (new_allocator.h:120)
| ->06.13% (8,784B) 0x401075: _ZNSt16allocator_traitsISaISt6threadEE9constructIS0_IZ4mainEUlvE_EEEvRS1_PT_DpOT0_ (alloc_traits.h:527)
| ->05.83% (8,352B) 0x401009: void std::vector<std::thread, std::allocator<std::thread> >::emplace_back<main::{lambda()
| | ->05.83% (8,352B) 0x400F47: main (test.cpp:10)
| |
| ->00.30% (432B) in 1+ places, all below ms_print's threshold (01.00%)
|
->01.43% (2,048B) 0x403432: __gnu_cxx::new_allocator<std::thread>::allocate(unsigned long, void const*) (new_allocator.h:104)
| ->01.43% (2,048B) 0x4032CF: std::allocator_traits<std::allocator<std::thread> >::allocate(std::allocator<std::thread>&, unsigned long) (alloc_traits.h:488)
| ->01.43% (2,048B) 0x4030B8: std::_Vector_base<std::thread, std::allocator<std::thread> >::_M_allocate(unsigned long) (stl_vector.h:170)
| ->01.43% (2,048B) 0x4010B6: void std::vector<std::thread, std::allocator<std::thread> >::_M_emplace_back_aux<main::{lambda()
| ->01.43% (2,048B) 0x40103D: void std::vector<std::thread, std::allocator<std::thread> >::emplace_back<main::{lambda()
| ->01.43% (2,048B) 0x400F47: main (test.cpp:10)
|
->00.51% (724B) in 1+ places, all below ms_print's threshold (01.00%)
看起来很合理-我们不要检查它。正如预期的那样,一切都以pages-as-heap=no
结尾,并且内存使用量足够小,不必担心我们-这些是高级分配。
但是看到malloc/new/new[]
吗?如此简单的代码就能实现〜8GiB虚拟内存?
让我们检查堆栈跟踪。
pages-as-heap=yes
让我们从一个简单的词开始:以pthread_create
结尾的那个词。
pthread_create
报告massif
个已分配内存的字节-恰好是192 * 8'196 KiB,这意味着-192个线程* 8MiB,which is the default max stack size of a thread in Linux。
注意,8'196 KiB并不完全是8 MiB(8'192 KiB)。我不知道这种差异来自何处,但目前并不重要。
1,611,399,168
好的,现在让我们看看其他两个堆栈...等一下,它们是完全一样的吗?是的,std::make_unique<int>
的文档对此进行了解释,虽然我并不完全理解它,但是它也不重要。它们显示完全相同的堆栈。让我们合并结果并将其一起检查。
这两个堆栈合并使用的内存为massif
个字节,它们都是由我们简单的6'375'342'080
引起的!
让我们退后一步:如果我们运行相同的实验,但是使用一个简单的线程,我们将看到,这种std::make_unique<int>
分配会导致分配int
个内存字节,恰好是64 MB 。会发生什么事?
这全都取决于67'108'864
的实现(众所周知,malloc
是内部由new/new[]
..默认实现的。)
malloc
内部使用称为malloc
的内存分配器-Linux中的默认内存分配器,它支持线程。
简单地说,此分配器处理以下术语:
ptmalloc2
:巨大的存储空间;出于性能原因,通常每个线程;并不是所有的软件线程都有自己的 per-thread-arenas ,这通常取决于硬件线程的数量(我想还有其他细节); per thread arena
:heap
被分成堆; arena
:chunks
被分为较小的内存区域,称为heap
。关于这些事情的细节很多,稍后会发布一些有趣的链接,尽管这足以让读者自己进行研究-这些实际上是底层的,与C ++内存有关管理。
那么,让我们回到一个线程的测试中-为单个chunks
分配64 MiB?让我们再次查看堆栈跟踪并集中在其末尾:
int
惊喜,令人惊讶:mmap (mmap.c:34)
new_heap (arena.c:438)
arena_get2.part.3 (arena.c:646)
malloc (malloc.c:2911)
呼叫malloc
,这呼叫arena_get2
,这导致我们进入new_heap
(mmap
和mmap
是低级系统调用,用于Linux中的内存分配)。据报道,这恰好分配了64个MiB内存。
好的,现在让我们回到带有192个线程和巨大数量brk
的原始示例-这完全是 95 * 64 MiB!
为什么恰好是95-我不能说真的,我停止挖掘了,但是事实是,这个大数可被64 MiB整除对我来说已经足够了。
如有必要,您可以进行更深入的研究。
非常酷的说明性文章:Understanding glibc malloc,作者:sploitfun
更正式的文档:The GNU allocator
一个很酷的堆栈交换问题:How does glibc malloc works
其他:
如果在阅读本文时这些链接中的某些链接断开了,那么查找相似的文章应该相当容易。如果您知道寻找的内容和方式,则该主题非常受欢迎。
我希望这些观察结果能对整个图片有一个高层次的描述,并为进一步的研究提供足够的食物。
随时发表评论/(建议)编辑/更正/扩展/等等。
答案 2 :(得分:0)
这只是“某种”答案(从Valgrind的角度来看)。内存池的问题,尤其是C ++字符串,已经有一段时间了。 Valgrind manual的一节介绍了C ++字符串中的泄漏,建议您尝试设置GLIBCXX_FORCE_NEW环境变量。
此外,对于GCC6和更高版本,Valgrind添加了挂钩以清理libstdc ++中仍可访问的内存。 Valgrind bugzilla条目为here,GCC条目为here。
我不明白为什么这么小的分配会膨胀到这么多的千兆字节(对于64位可执行文件,CentOS 6.6,GCC 6.2,超过12 GB)。
答案 3 :(得分:0)
查看文档:
-pages-as-heap = [默认:否] 告诉Massif在页面级别而不是在malloc的块级别配置内存。有关详情,请参见上文。
因此,根据文档,更改此设置将更改测量的值。不是分配的。
如果是,则您正在测量页数。 如果为否,则说明正在测量malloc块。