Does malloc_trim(0) release the fastbins of thread arenas?

Date: 2016-11-09 18:32:54

Tags: c++ multithreading malloc glibc

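For the past week or so I have been investigating a problem in an application where memory usage accumulates over time. I narrowed it down to a line of code that copies a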

std::vector< std::vector< std::vector< std::map< uint, map< uint, std::bitset< N> > > > > >

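in a worker thread (I realize this is a ridiculous way to organize memory). The worker thread is periodically destroyed and recreated, and it copies that memory structure when it starts. The original data being copied is passed to the worker thread by reference from the main thread.

Using malloc_stats and malloc_info, I can see that when the worker thread is destroyed, the arena/heap it was using retains the memory used for that structure in its list of free fastbins. This makes sense, since there are many individual allocations smaller than 64 bytes.

The problem is that when the worker thread is recreated, it creates a new arena/heap instead of reusing the previous one, so the fastbins of the previous arenas/heaps are never reused. Eventually the system runs out of memory before a previous heap/arena is reused and the fastbins it holds are put back to use.

Somewhat by accident, I found that calling malloc_trim(0) in my main thread, after joining the worker thread, causes the fastbins in the thread arenas/heaps to be released. As far as I can see, this behavior is undocumented. Does anyone have an explanation?

Here is some test code I am using to see this behavior: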

// includes
#include <stdio.h>
#include <algorithm>
#include <vector>
#include <iostream>
#include <stdexcept>
#include <string>
#include <mcheck.h>
#include <malloc.h>
#include <map>
#include <bitset>
#include <boost/thread.hpp>
#include <boost/shared_ptr.hpp>

// Number of bits per bitset.
const int sizeOfBitsets = 40;

// Executes a system command. Used to get output of "free -m".
std::string ExecuteSystemCommand(const char* cmd) {
    char buffer[128];
    std::string result = "";
    FILE* pipe = popen(cmd, "r");
    if (!pipe) throw std::runtime_error("popen() failed!");
    try {
        // Loop on the read itself; testing feof() first can run one extra iteration.
        while (fgets(buffer, sizeof(buffer), pipe) != NULL) {
            result += buffer;
        }
    } catch (...) {
        pclose(pipe);
        throw;
    }
    pclose(pipe);
    return result;
}

// Prints output of "free -m" and output of malloc_stats() and malloc_info().
void PrintMemoryStats()
{
    try
    {
        char *buf;
        size_t size;
        FILE *fp;

        std::string myCommand("free -m");
        std::string result = ExecuteSystemCommand(myCommand.c_str());
        printf("Free memory is \n%s\n", result.c_str());

        malloc_stats();

        fp = open_memstream(&buf, &size);
        malloc_info(0, fp);
        fclose(fp);
        printf("# Memory Allocation Stats\n%s\n#> ", buf);
        free(buf);

    }
    catch(...)
    {
        printf("Unable to print memory stats.\n");
        throw;
    }
}

void MakeCopies(std::vector<std::vector<std::map<uint, std::map<uint, std::bitset<sizeOfBitsets> > > > >& data)
{
    try
    {
        // Create copies.
        std::vector<std::vector<std::map<uint, std::map<uint, std::bitset<sizeOfBitsets> > > > > dataCopyA(data);
        std::vector<std::vector<std::map<uint, std::map<uint, std::bitset<sizeOfBitsets> > > > > dataCopyB(data);
        std::vector<std::vector<std::map<uint, std::map<uint, std::bitset<sizeOfBitsets> > > > > dataCopyC(data);

        // Print memory info.
        printf("Memory after creating data copies:\n");
        PrintMemoryStats();
    }
    catch(...)
    {
        printf("Unable to make copies.");
        throw;
    }
}

int main(int argc, char** argv)
{
    try
    {
          // When uncommented, disables the use of fastbins.
//        mallopt(M_MXFAST, 0);

        // Print memory info.
        printf("Memory to start is:\n");
        PrintMemoryStats();

        // Sizes of original data.
        int sizeOfDataA = 2048;
        int sizeOfDataB = 4;
        int sizeOfDataC = 128;
        int sizeOfDataD = 20;
        std::vector<std::vector<std::map<uint, std::map<uint, std::bitset<sizeOfBitsets> > > > > testData;

        // Populate data.
        testData.resize(sizeOfDataA);
        for(int a = 0; a < sizeOfDataA; ++a)
        {
            testData.at(a).resize(sizeOfDataB);
            for(int b = 0; b < sizeOfDataB; ++b)
            {
                for(int c = 0; c < sizeOfDataC; ++c)
                {
                    std::map<uint, std::bitset<sizeOfBitsets> > dataMap;
                    testData.at(a).at(b).insert(std::pair<uint, std::map<uint, std::bitset<sizeOfBitsets> > >(c, dataMap));
                    for(int d = 0; d < sizeOfDataD; ++d)
                    {
                        std::bitset<sizeOfBitsets> testBitset;
                        testData.at(a).at(b).at(c).insert(std::pair<uint, std::bitset<sizeOfBitsets> >(d, testBitset));
                    }
                }
            }
        }

        // Print memory info.
        printf("Memory to after creating original data is:\n");
        PrintMemoryStats();

        // Start thread to make copies and wait to join.
        {
            boost::shared_ptr<boost::thread> makeCopiesThread = boost::shared_ptr<boost::thread>(new boost::thread(&MakeCopies, boost::ref(testData)));
            makeCopiesThread->join();
        }

        // Print memory info.
        printf("Memory to after joining thread is:\n");
        PrintMemoryStats();

        malloc_trim(0);

        // Print memory info.
        printf("Memory to after malloc_trim(0) is:\n");
        PrintMemoryStats();

        return 0;

    }
    catch(...)
    {
        // Log warning.
        printf("Unable to run application.");

        // Return failure.
        return 1;
    }

    // Return success.
    return 0;
}

The interesting output from before and after the malloc_trim call is (look for "LOOK HERE!"):

#> Memory to after joining thread is:
Free memory is
              total        used        free      shared  buff/cache   available
Mem:         257676        7361      246396          25        3918      249757
Swap:          1023           0        1023

Arena 0:
system bytes     = 1443450880
in use bytes     = 1443316976
Arena 1:
system bytes     =   35000320
in use bytes     =       6608
Total (incl. mmap):
system bytes     = 1478451200
in use bytes     = 1443323584
max mmap regions =          0
max mmap bytes   =          0
# Memory Allocation Stats
<malloc version="1">
<heap nr="0">
<sizes>
<size from="241" to="241" total="241" count="1"/>
<size from="529" to="529" total="529" count="1"/>
</sizes>
<total type="fast" count="0" size="0"/>
<total type="rest" count="2" size="770"/>
<system type="current" size="1443450880"/>
<system type="max" size="1443459072"/>
<aspace type="total" size="1443450880"/>
<aspace type="mprotect" size="1443450880"/>
</heap>
<heap nr="1">
<sizes>
<size from="33" to="48" total="48" count="1"/>
<size from="49" to="64" total="4026531712" count="62914558"/> <-- LOOK HERE!
<size from="65" to="80" total="160" count="2"/>
<size from="81" to="96" total="301989888" count="3145728"/> <-- LOOK HERE!
<size from="33" to="33" total="231" count="7"/>
<size from="49" to="49" total="1274" count="26"/>
<unsorted from="0" to="49377" total="1431600" count="6144"/>
</sizes>
<total type="fast" count="66060289" size="4328521808"/>
<total type="rest" count="6177" size="1433105"/>
<system type="current" size="4329967616"/>
<system type="max" size="4329967616"/>
<aspace type="total" size="35000320"/>
<aspace type="mprotect" size="35000320"/>
</heap>
<total type="fast" count="66060289" size="4328521808"/>
<total type="rest" count="6179" size="1433875"/>
<total type="mmap" count="0" size="0"/>
<system type="current" size="5773418496"/>
<system type="max" size="5773426688"/>
<aspace type="total" size="1478451200"/>
<aspace type="mprotect" size="1478451200"/>
</malloc>

#> Memory to after malloc_trim(0) is:
Free memory is
              total        used        free      shared  buff/cache   available
Mem:         257676        3269      250488          25        3918      253850
Swap:          1023           0        1023

Arena 0:
system bytes     = 1443319808
in use bytes     = 1443316976
Arena 1:
system bytes     =   35000320
in use bytes     =       6608
Total (incl. mmap):
system bytes     = 1478320128
in use bytes     = 1443323584
max mmap regions =          0
max mmap bytes   =          0
# Memory Allocation Stats
<malloc version="1">
<heap nr="0">
<sizes>
<size from="209" to="209" total="209" count="1"/>
<size from="529" to="529" total="529" count="1"/>
<unsorted from="0" to="49377" total="1431600" count="6144"/>
</sizes>
<total type="fast" count="0" size="0"/>
<total type="rest" count="6146" size="1432338"/>
<system type="current" size="1443459072"/>
<system type="max" size="1443459072"/>
<aspace type="total" size="1443459072"/>
<aspace type="mprotect" size="1443459072"/>
</heap>
<heap nr="1"> <---------------------------------------- LOOK HERE!
<sizes> <-- HERE!
<unsorted from="0" to="67108801" total="4296392384" count="6208"/>
</sizes>
<total type="fast" count="0" size="0"/>
<total type="rest" count="6208" size="4296392384"/>
<system type="current" size="4329967616"/>
<system type="max" size="4329967616"/>
<aspace type="total" size="35000320"/>
<aspace type="mprotect" size="35000320"/>
</heap>
<total type="fast" count="0" size="0"/>
<total type="rest" count="12354" size="4297824722"/>
<total type="mmap" count="0" size="0"/>
<system type="current" size="5773426688"/>
<system type="max" size="5773426688"/>
<aspace type="total" size="1478459392"/>
<aspace type="mprotect" size="1478459392"/>
</malloc>

#>

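The output of malloc_info is almost undocumented, so I was not sure whether the entries I pointed out really are fastbins. To verify that they are indeed fastbins, I uncommented the line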

mallopt(M_MXFAST, 0);

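to disable the use of fastbins. With fastbins disabled, the memory usage of heap 1 after joining the thread and before calling malloc_trim(0) looks the same as it does with fastbins enabled after calling malloc_trim(0). Most importantly, disabling fastbins returns the memory to the system immediately after the thread is joined. Calling malloc_trim(0) after joining the thread, with fastbins enabled, also returns the memory to the system.

The documentation for malloc_trim(0) states that it can only free memory from the top of the main arena heap, so what is going on here? I am running on CentOS 7 with glibc version 2.17.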

1 Answer:

Answer 0 (score: 2)

  

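"The documentation for malloc_trim(0) states that it can only free memory from the top of the main arena heap, so what is going on here?"

This can be called "outdated" or "incorrect" documentation. Glibc has no documentation of the malloc_trim function; Linux uses the manual pages from the man-pages project. The malloc_trim man page http://man7.org/linux/man-pages/man3/malloc_trim.3.html is fairly new: it was written in 2012 by the maintainer of man-pages. He probably used some of the comments from the glibc malloc/malloc.c source code http://code.metager.de/source/xref/gnu/glibc/malloc/malloc.c#675: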
  malloc_trim(size_t pad);

  If possible, gives memory back to the system (via negative
  arguments to sbrk) if there is unused memory at the `high' end of
  the malloc pool. You can call this after freeing large blocks of
  memory to potentially reduce the system-level memory requirements
  of a program. However, it cannot guarantee to reduce memory. Under
  some allocation patterns, some large free blocks of memory will be
  locked between two used chunks, so they cannot be given back to
  the system.

  The `pad' argument to malloc_trim represents the amount of free
  trailing space to leave untrimmed. If this argument is zero,
  only the minimum amount of memory to maintain internal data
  structures will be left (one page or less). Non-zero arguments
  can be supplied to maintain enough trailing space to service
  future expected allocations without having to re-obtain memory
  from the system.

  Malloc_trim returns 1 if it actually released any memory, else 0.
  On systems that do not support "negative sbrks", it will always
  return 0.

The actual implementation in glibc is __malloc_trim, and it has code that iterates over the arenas:

http://code.metager.de/source/xref/gnu/glibc/malloc/malloc.c#4552

int
__malloc_trim (size_t s)
  ...
  mstate ar_ptr = &main_arena;
  do
    {
      (void) mutex_lock (&ar_ptr->mutex);
      result |= mtrim (ar_ptr, s);
      (void) mutex_unlock (&ar_ptr->mutex);

      ar_ptr = ar_ptr->next;
    }
  while (ar_ptr != &main_arena);

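Every arena is trimmed using the mtrim() (mTRIm()) function, which calls malloc_consolidate() to convert all free chunks held in fastbins (they are not coalesced when freed, which is what makes them fast) into normal free chunks (which are coalesced with adjacent chunks):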

  /* Ensure initialization/consolidation */
  malloc_consolidate (av);

  malloc_consolidate is a specialized version of free() that tears
  down chunks held in fastbins.

   Fastbins
    ...
    Chunks in fastbins keep their inuse bit set, so they cannot
    be consolidated with other free chunks. malloc_consolidate
    releases all chunks in fastbins and consolidates them with
    other free chunks.
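
The effect of consolidation can be observed with the same malloc_info() call the question uses. What follows is a minimal sketch and not part of the original answer: the helper names MainArenaFastChunkCount and MeasureTrimEffect, the 32-byte request size, and the block count are assumptions picked for illustration, and the parsing assumes that heap 0 is listed first in the XML, as in the output shown in the question.

```cpp
#include <malloc.h>
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <vector>

// Returns the count from the first <total type="fast" .../> element that
// malloc_info() prints; that element is assumed to belong to heap 0.
unsigned long MainArenaFastChunkCount()
{
    char* buf = NULL;
    size_t size = 0;
    FILE* fp = open_memstream(&buf, &size);
    assert(fp != NULL);
    malloc_info(0, fp);
    fclose(fp);  // flushes the stream and null-terminates buf

    const char* key = "<total type=\"fast\" count=\"";
    const char* hit = strstr(buf, key);
    const unsigned long count = hit ? strtoul(hit + strlen(key), NULL, 10) : 0;
    free(buf);
    return count;
}

struct FastbinCounts
{
    unsigned long beforeTrim;
    unsigned long afterTrim;
};

// Frees many small blocks, then samples the fastbin count around malloc_trim(0).
FastbinCounts MeasureTrimEffect()
{
    std::vector<void*> blocks;
    for (int i = 0; i < 10000; ++i)
        blocks.push_back(malloc(32));  // assumed size, small enough for fastbins
    for (size_t i = 0; i < blocks.size(); ++i)
        free(blocks[i]);

    FastbinCounts counts;
    counts.beforeTrim = MainArenaFastChunkCount();
    malloc_trim(0);
    counts.afterTrim = MainArenaFastChunkCount();
    return counts;
}
```

When called from the main thread, afterTrim should not exceed beforeTrim. The exact numbers depend on the glibc version, since newer releases route some frees through a per-thread cache before they reach the fastbins.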
  

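"The problem is that when the worker thread is recreated, it creates a new arena/heap instead of reusing the previous one, so the fastbins of the previous arenas/heaps are never reused."

This is strange. By design, the maximum number of arenas in glibc malloc is limited to cpu_core_count * 8 (on 64-bit platforms) or cpu_core_count * 2 (on 32-bit platforms), or by the environment variable MALLOC_ARENA_MAX / the mallopt parameter M_ARENA_MAX.

You can limit the number of arenas for your application, call malloc_trim() periodically, or call malloc() with a "large" size (it will call malloc_consolidate) and then free() the result from your threads just before they are destroyed: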
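
The arena limit described above can be written down as a short sketch. This is an illustration rather than code from the answer: the function names DefaultArenaLimit and LimitArenas are made up here, and the core count is read with sysconf(), which may differ from the way glibc counts cores internally.

```cpp
#include <malloc.h>
#include <unistd.h>
#include <cassert>

// Default limit as stated above: 8 arenas per core on 64-bit platforms,
// 2 per core on 32-bit platforms.
long DefaultArenaLimit()
{
    const long cores = sysconf(_SC_NPROCESSORS_ONLN);
    assert(cores > 0);
    const long perCore = (sizeof(long) == 4) ? 2 : 8;
    return cores * perCore;
}

// Caps the number of arenas through mallopt(); best called early,
// before worker threads start allocating.
bool LimitArenas(int maxArenas)
{
    return mallopt(M_ARENA_MAX, maxArenas) == 1;
}
```

Calling LimitArenas(1) early in main() would be roughly equivalent to starting the program with MALLOC_ARENA_MAX=1 in the environment, at the cost of more lock contention between threads.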

_int_malloc (mstate av, size_t bytes)
  ...
  if ((unsigned long) (nb) <= (unsigned long) (get_max_fast ()))
    // fastbin allocation path
  ...
  if (in_smallbin_range (nb))
    // smallbin path; malloc_consolidate may be called
  ...
  /*
     If this is a large request, consolidate fastbins before continuing.
     While it might look excessive to kill all fastbins before
     even seeing if there is space available, this avoids
     fragmentation problems normally associated with fastbins.
     Also, in practice, programs tend to have runs of either small or
     large requests, but less often mixtures, so consolidation is not
     invoked all that often in most programs. And the programs that
     it is called frequently in otherwise tend to fragment.
   */

  else
    {
      idx = largebin_index (nb);
      if (have_fastchunks (av))
        malloc_consolidate (av);
    }
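
The "large malloc, then free" workaround could look like the sketch below. It is an assumption-laden illustration, not the answerer's code: the function names and the 64 KiB request size are hypothetical, and the behavior relies on the glibc internals quoted above, which may change between versions.

```cpp
#include <malloc.h>
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Intended to run as the last step of a worker thread, after its data is freed.
// A request above the smallbin range takes the largebin path in _int_malloc,
// which consolidates the fastbins of the arena serving the calling thread.
bool ConsolidateCallingThreadArena()
{
    const size_t largeRequest = 64 * 1024;  // assumed size, below the default mmap threshold
    void* block = malloc(largeRequest);
    if (block == NULL)
        return false;
    // Touch the block through a volatile pointer so the compiler keeps the malloc/free pair.
    static_cast<volatile char*>(block)[0] = 1;
    free(block);
    return true;
}

// Intended to run in the main thread after the worker has been joined.
// Returns true if glibc reports that some memory was released.
bool TrimAfterJoin()
{
    const int released = malloc_trim(0);
    assert(released == 0 || released == 1);
    return released == 1;
}
```

In the question's test program, ConsolidateCallingThreadArena() would be called at the end of MakeCopies, after the copies have gone out of scope, and TrimAfterJoin() would take the place of the bare malloc_trim(0) call after join().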

PS: there is a comment in the man page of malloc_trim https://github.com/mkerrisk/man-pages/commit/a15b0e60b297e29c825b7417582a33e6ca26bf65:

+.SH NOTES
+This function only releases memory in the main arena.
+.\" malloc/malloc.c::mTRIm():
+.\" return result | (av == &main_arena ? sYSTRIm (pad, av) : 0);

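Yes, there is a check for main_arena, but it is at the very end of the malloc_trim implementation mTRIm(), and it only guards the call to sbrk() with a negative offset. Since 2007 (glibc 2.9 and newer) there is another method of returning memory to the OS: madvise(MADV_DONTNEED), which is used in all arenas (and was documented neither by the author of the glibc patch nor by the author of the man page). Consolidation is called for every arena. There is also code for trimming (munmapping) the top chunk of mmap-ed heaps (heap_trim / shrink_heap, called from the slow path of free()), but it is not called from malloc_trim.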
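
To illustrate the madvise(MADV_DONTNEED) mechanism on its own, outside of glibc, here is a minimal sketch. It assumes Linux semantics for private anonymous mappings, where the kernel may drop the advised pages and later reads see zero-filled memory; the function name and the four-page size are made up for the example.

```cpp
#include <sys/mman.h>
#include <unistd.h>
#include <cassert>
#include <cstddef>
#include <cstring>

// Maps a few anonymous pages, dirties them, advises the kernel that they are
// no longer needed, and checks that the contents read back as zero.
bool MadviseDropsPages()
{
    const long pageSize = sysconf(_SC_PAGESIZE);
    assert(pageSize > 0);
    const size_t length = 4 * static_cast<size_t>(pageSize);

    void* mem = mmap(NULL, length, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED)
        return false;

    unsigned char* bytes = static_cast<unsigned char*>(mem);
    memset(bytes, 0xAB, length);  // make the pages resident and non-zero

    // The mapping stays valid, but the physical pages can be reclaimed.
    const bool advised = madvise(mem, length, MADV_DONTNEED) == 0;
    const bool zeroed = advised && bytes[0] == 0 && bytes[length - 1] == 0;

    munmap(mem, length);
    return zeroed;
}
```

This is why resident memory can shrink after trimming even though the "system bytes" reported for an arena stay the same: the address range remains mapped while its pages are handed back to the kernel.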