Question

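On Linux, the kernel does not allocate any physical memory pages until we actually use the memory, but I'm having a hard time figuring out why it does in fact allocate this memory here: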

   for(int t = 0; t < T; t++){
      for(int b = 0; b < B; b++){
         Matrix[t][b].length = 0;
         Matrix[t][b].size = 60;
         Matrix[t][b].pointers = (Node**)malloc(60*sizeof(Node*)); 
         }
   }

Then I access this data structure to add an element to it:

   Node* elem = NULL;
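   Matrix[a][b].pointers[ Matrix[a][b].length ] = elem;   /* '.' not '->': Matrix[a][b] is a struct, not a pointer */
   Matrix[a][b].length++;                                 /* increment after storing so that slot 0 is used */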

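Basically, I run my program while watching htop, and Linux allocates more memory if I increase the number "60" in the code above. Why? Shouldn't it allocate a page only when the first element is added to an array?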

Answer 1

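It depends on how your Linux system is configured.

Here is a simple C program that tries to allocate 1 TB of memory and touches only a small part of it.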

#include <stdlib.h>
#include <stdio.h>
#include <unistd.h>

int main()
{
  char *array[1000];
  int i;

  for (i = 0; i < 1000; ++i)
  {
    if (NULL == (array[i] = malloc((int) 1e9)))
    {
      perror("malloc failed!");
      return -1;
    }

    array[i][0] = 'H';
  }

  for (i = 0; i < 1000; ++i)
    printf("%c", array[i][0]);

  printf("\n");

  sleep(10);

  return 0;
}

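When I run this with top alongside, it reports a VIRT memory usage of 931g (where g means GiB), while RES is only 4380 KiB.

Now, when I switch my system to a different overcommit policy with /sbin/sysctl -w vm.overcommit_memory=2 and rerun the program, I get: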

malloc failed!: Cannot allocate memory

So, your system may be using a different overcommit policy than you expect. For more details, read this.

Answer 2

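Your assumption that malloc / new doesn't cause any memory to be written, and therefore to be assigned physical memory by the OS, is incorrect (for the memory allocator implementation you have).

I have reproduced the behavior you describe in the following simple program: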

#include <stdlib.h>
#include <stdio.h>
#include <unistd.h>

int main(int argc, char **argv)
{
  char **array[128][128];
  int    size;
  int    i, j;

  if (1 == argc || 0 >= (size = atoi(argv[1])))
    fprintf(stderr, "usage: %s <num>; where num > 0\n", argv[0]), exit(-1);

  for (i = 0; i < 128; ++i)
    for (j = 0; j < 128; ++j)
      if (NULL == (array[i][j] = malloc(size * sizeof(char*))))
      {
        fprintf(stderr, "malloc failed when i = %d, j = %d\n", i, j);
        perror(NULL);
        return -1;
      }

  sleep(10);

  return 0;
}

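When I run this with various small size arguments as input, both the VIRT and RES memory footprints (as reported by top) grow together, step by step, even though I never explicitly touch the inner arrays I'm allocating.

This basically holds until size exceeds ~512. After that, RES stays constant at 64 MiB while VIRT can be extremely large (e.g. 1220 GiB when size is 10M). This is because 512 * 8 = 4096, which is a common virtual page size on Linux systems, and 128 * 128 * 4096 B = 64 MiB.

Therefore, it looks like the first page of each allocation is being mapped to physical memory, probably because malloc / new itself writes to part of the allocation for its own internal bookkeeping. Of course, many small allocations may fit and be placed on the same page, so only one page gets mapped to physical memory for many such allocations.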

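In your code example, changing the size of the array matters because it means fewer of those arrays fit on one page, so more memory pages have to be touched by malloc / new itself (and therefore mapped to physical memory by the OS) over the run of the program.

When you use 60, each array takes about 480 bytes, so ~8 of those allocations fit on one page. When you use 100, each takes about 800 bytes, so only ~5 of those allocations fit on one page. So I'd expect the "100 program" to use about 8/5ths as much memory as the "60 program", which seems like a big enough difference to make your machine start swapping to stable storage.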

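If each of your smaller "60" allocations were already over 1 page in size, then changing it to the bigger "100" wouldn't affect your program's initial physical memory usage, just as you originally expected.

PS - I think whether or not you explicitly touch the initial page of your allocations is irrelevant, because malloc / new will already have done so (for the memory allocator implementation you have).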

Answer 3

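Here is a sketch of what you could do if you typically expect your b arrays to be small, usually fewer than 2^X pointers (X = 5 in the code below), while also handling the special cases where they grow bigger.

You can adjust X down if your expected usage doesn't match. You could also adjust the minimum array size up from 0 (and not allocate the smaller 2^i levels) if you expect most of your arrays to use at least 2^Y pointers (e.g. Y = 3).

If you think that X == Y (e.g. 4) for your usage pattern, then you can just do one allocation of B * (0x1 << X) * sizeof(Node*) per T and divvy that T array up among your b's. Then, if a b array needs to exceed 2^X pointers, use malloc for it, followed by realloc if it needs to grow even further.

The main point here is that the initial allocation will map to very little physical memory, which addresses the problem that prompted your original question.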

#include <stdlib.h>
#include <stdio.h>
#include <string.h>

#define T           1278
#define B           131072
#define CAP_MAX_LG2 5        
#define CAP_MAX     (0x1 << CAP_MAX_LG2)  // pre-alloc T's to handle all B arrays of length up to 2^CAP_MAX_LG2

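typedef struct Node Node;
struct Node { int placeholder; };         // hypothetical stand-in so sizeof(Node) compiles; substitute your real Node definition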

typedef struct
{
  int    t;                               // so a matrix element can know to which T_Allocation it belongs
  int    length;
  int    cap_lg2;                         // log base 2 of capacity; -1 if capacity is zero
  Node **pointers;

} MatrixElem;

typedef struct
{
  Node **base;                            // pre-allocs B * 2^(CAP_MAX_LG2 + 1) Node pointers; every b array can be any of { 0, 1, 2, 4, 8, ..., CAP_MAX } capacity
  Node **frees_pow2[CAP_MAX_LG2 + 1];     // frees_pow2[i] will point at the next free array of 2^i pointers to Node to allocate to a growing b array

} T_Allocation;

MatrixElem   Matrix[T][B];
T_Allocation T_Allocs[T];

int  Node_init(Node *n) { return 0; } // just a dummy
void Node_fini(Node *n) { }           // just a dummy 
int  Node_eq(const Node *n1, const Node *n2)  { return 0; } // just a dummy

void Init(void)
{
  for(int t = 0; t < T; t++) 
  {
    T_Allocs[t].base = malloc(B * (0x1 << (CAP_MAX_LG2 + 1)) * sizeof(Node*));

    if (NULL == T_Allocs[t].base)
      abort();

    T_Allocs[t].frees_pow2[0] = T_Allocs[t].base;

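    // level x holds B arrays of 2^x pointers, so it starts after the B * (2^x - 1) pointers used by levels 0 .. x-1
    for (int x = 1; x <= CAP_MAX_LG2; ++x)
      T_Allocs[t].frees_pow2[x] = &T_Allocs[t].base[B * ((0x1 << x) - 1)];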

    for(int b = 0; b < B; b++)
    {
      Matrix[t][b].t        = t;
      Matrix[t][b].length   = 0;
      Matrix[t][b].cap_lg2  = -1;
      Matrix[t][b].pointers = NULL;
    }
  }
}

Node *addElement(MatrixElem *elem)
{
  if (-1 == elem->cap_lg2 || elem->length == (0x1 << elem->cap_lg2))  // elem needs a bigger pointers array to add an element
  {
    int new_cap_lg2 = elem->cap_lg2 + 1;
    int new_cap     = (0x1 << new_cap_lg2);

    if (new_cap_lg2 <= CAP_MAX_LG2)            // new b array can still fit in pre-allocated space in T
    {
      Node **new_pointers = T_Allocs[elem->t].frees_pow2[new_cap_lg2];

      if (elem->length > 0)                    // elem->pointers is NULL on the first growth; never pass NULL to memcpy
        memcpy(new_pointers, elem->pointers, elem->length * sizeof(Node*));
      elem->pointers = new_pointers;

      T_Allocs[elem->t].frees_pow2[new_cap_lg2] += new_cap;
    }
    else if (elem->cap_lg2 == CAP_MAX_LG2)     // exceeding pre-alloc'ed arrays in T; use malloc
    {
      Node **new_pointers = malloc(new_cap * sizeof(Node*));

      if (NULL == new_pointers)
        return NULL;

      memcpy(new_pointers, elem->pointers, elem->length * sizeof(Node*));
      elem->pointers = new_pointers;
    } 
    else                                       // already exceeded pre-alloc'ed arrays in T; use realloc
    {
      Node **new_pointers = realloc(elem->pointers, new_cap * sizeof(Node*));

      if (NULL == new_pointers)
        return NULL;

      elem->pointers = new_pointers;
    }

    ++elem->cap_lg2;
  }

  Node *ret = malloc(sizeof(Node));

  if (ret)
  {
    Node_init(ret);
    elem->pointers[elem->length] = ret;
    ++elem->length;
  }

  return ret;
}

int removeElement(const Node *a, MatrixElem *elem)
{
  int i;

  for (i = 0; i < elem->length && !Node_eq(a, elem->pointers[i]); ++i);

  if (i == elem->length)
    return -1;

  Node_fini(elem->pointers[i]);
  free(elem->pointers[i]);
  --elem->length;
  memmove(&elem->pointers[i], &elem->pointers[i+1], sizeof(Node*) * (elem->length - i));

  return 0;
}

int main()
{
  return 0;
}

Linux really does allocate memory in C++ code
