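Understanding /proc/sys/vm/lowmem_reserve_ratio

Date: 2011-02-13 12:45:16

Tags: linux memory-management virtual-machine procfs

I cannot work out what the variable "lowmem_reserve_ratio" means from the explanation in Documentation/sysctl/vm.txt. I also tried Googling, but every explanation I found is identical to the one in vm.txt.

It would be very helpful if somebody could explain it or point me to some links about it. Here is the original explanation: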

The lowmem_reserve_ratio is an array. You can see them by reading this file.
-
% cat /proc/sys/vm/lowmem_reserve_ratio
256     256     32
-
Note: # of this elements is one fewer than number of zones. Because the highest
      zone's value is not necessary for following calculation.

But, these values are not used directly. The kernel calculates # of protection
pages for each zones from them. These are shown as array of protection pages
in /proc/zoneinfo like followings. (This is an example of x86-64 box).
Each zone has an array of protection pages like this.

-
Node 0, zone      DMA
  pages free     1355
        min      3
        low      3
        high     4
        :
        :
    numa_other   0
        protection: (0, 2004, 2004, 2004)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  pagesets
    cpu: 0 pcp: 0
        :
-
These protections are added to score to judge whether this zone should be used
for page allocation or should be reclaimed.

In this example, if normal pages (index=2) are required to this DMA zone and
watermark[WMARK_HIGH] is used for watermark, the kernel judges this zone should
not be used because pages_free(1355) is smaller than watermark + protection[2]
(4 + 2004 = 2008). If this protection value is 0, this zone would be used for
normal page requirement. If requirement is DMA zone(index=0), protection[0]
(=0) is used.
zone[i]'s protection[j] is calculated by following expression.

(i < j):
  zone[i]->protection[j]
  = (total sums of present_pages from zone[i+1] to zone[j] on the node)
    / lowmem_reserve_ratio[i];
(i = j):
   (should not be protected. = 0;
(i > j):
   (not necessary, but looks 0)

The default values of lowmem_reserve_ratio[i] are
    256 (if zone[i] means DMA or DMA32 zone)
    32  (others).
As above expression, they are reciprocal number of ratio.
256 means 1/256. # of protection pages becomes about "0.39%" of total present
pages of higher zones on the node.

If you would like to protect more pages, smaller values are effective.
The minimum value is 1 (1/1 -> 100%).
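
For illustration only (this is not part of vm.txt): the two rules quoted above can be sketched in plain C. The function names `protection_pages` and `zone_passes_check` are hypothetical, and the check is a simplified reading of the text; the real kernel check takes further factors into account.

```c
#include <stdbool.h>

/*
 * Hypothetical helper: protection for a lower zone, following the
 * quoted expression (present pages of the higher zones divided by
 * lowmem_reserve_ratio[i]). Assumes ratio >= 1; integer division
 * truncates, so only whole pages are protected.
 */
unsigned long protection_pages(unsigned long higher_present_pages,
                               unsigned long ratio)
{
    return higher_present_pages / ratio;
}

/*
 * Hypothetical helper: simplified form of the check described in the
 * text. The zone is considered usable only if its free pages exceed
 * watermark + protection.
 */
bool zone_passes_check(unsigned long free_pages,
                       unsigned long watermark,
                       unsigned long protection)
{
    return free_pages > watermark + protection;
}
```

With the numbers from the quoted example, `zone_passes_check(1355, 4, 2004)` is false because 1355 is not greater than 2008, while `zone_passes_check(1355, 4, 0)` is true.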

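3 Answers:

Answer 0 (score: 3)

I ran into the same problem as you. I Googled a lot and found this page, which may (or may not) be easier to understand than the kernel documentation.

(I am not quoting it here because it would be unreadable.)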

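Answer 1 (score: 1)

I found the wording in that document genuinely confusing. Looking at the source in mm/page_alloc.c helped clarify it, so let me try a simpler explanation:

As the page you quoted says, these numbers "are reciprocal number of ratio". Put differently: these numbers are divisors. So, when calculating the reserved pages for a given zone in a node, you take the sum of the pages in that node that belong to zones higher than that one, divide it by the provided divisor, and the result is how many pages you are reserving for that zone.

Example: let's assume a 1 GiB node with 768 MiB in zone Normal and 256 MiB in zone HighMem (and no zone DMA). Let's assume the default highmem reserve "ratio" (divisor) of 32, and the typical 4 KiB page size. Now we can calculate the reserve for zone Normal: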

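  1. Zones "higher" than zone Normal (just HighMem): 256 MiB * (1024 KiB / 1 MiB) * (1 page / 4 KiB) = 65536 pages
  2. Reserve for zone Normal in this node: 65536 pages / 32 = 2048 pages = 8 MiB.

The concept stays the same when you add more zones and nodes. Just remember that the reserved size is in pages: you never reserve a fraction of a page.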
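
The arithmetic in this example can be checked with a small sketch. The helper names `mib_to_pages`, `reserved_pages` and `pages_to_mib` are made up for illustration and are not kernel functions; the divisor is assumed to be at least 1.

```c
#include <stdint.h>

/* Convert a size in MiB to a page count, given the page size in KiB. */
uint64_t mib_to_pages(uint64_t mib, uint64_t page_kib)
{
    return mib * 1024 / page_kib;
}

/*
 * Pages reserved in the lower zone: pages of all higher zones divided
 * by the "ratio" (really a divisor). Integer division truncates, so a
 * fraction of a page is never reserved.
 */
uint64_t reserved_pages(uint64_t higher_zone_pages, uint64_t divisor)
{
    return higher_zone_pages / divisor;
}

/* Convert a page count back to MiB (truncating). */
uint64_t pages_to_mib(uint64_t pages, uint64_t page_kib)
{
    return pages * page_kib / 1024;
}
```

For the numbers above, `mib_to_pages(256, 4)` gives 65536 pages, `reserved_pages(65536, 32)` gives 2048 pages, and `pages_to_mib(2048, 4)` gives 8 MiB.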

Answer 2 (score: 0)

I found that the kernel source code explains it very clearly.

    /*
     * setup_per_zone_lowmem_reserve - called whenever
     *  sysctl_lowmem_reserve_ratio changes.  Ensures that each zone
     *  has a correct pages reserved value, so an adequate number of
     *  pages are left in the zone after a successful __alloc_pages().
     */
    static void setup_per_zone_lowmem_reserve(void)
    {
        struct pglist_data *pgdat;
        enum zone_type j, idx;

        for_each_online_pgdat(pgdat) {
            for (j = 0; j < MAX_NR_ZONES; j++) {
                struct zone *zone = pgdat->node_zones + j;
                unsigned long managed_pages = zone->managed_pages;

                zone->lowmem_reserve[j] = 0;

                idx = j;
                while (idx) {
                    struct zone *lower_zone;

                    idx--;

                    if (sysctl_lowmem_reserve_ratio[idx] < 1)
                        sysctl_lowmem_reserve_ratio[idx] = 1;

                    lower_zone = pgdat->node_zones + idx;
                    lower_zone->lowmem_reserve[j] = managed_pages /
                        sysctl_lowmem_reserve_ratio[idx];
                    managed_pages += lower_zone->managed_pages;
                }
            }
        }

        /* update totalreserve_pages */
        calculate_totalreserve_pages();
    }

The source even includes a worked example.

    /*
     * results with 256, 32 in the lowmem_reserve sysctl:
     *  1G machine -> (16M dma, 800M-16M normal, 1G-800M high)
     *  1G machine -> (16M dma, 784M normal, 224M high)
     *  NORMAL allocation will leave 784M/256 of ram reserved in the ZONE_DMA
     *  HIGHMEM allocation will leave 224M/32 of ram reserved in ZONE_NORMAL
     *  HIGHMEM allocation will leave (224M+784M)/256 of ram reserved in ZONE_DMA
     *
     * TBD: should special case ZONE_DMA32 machines here - in those we normally
     * don't need any ZONE_NORMAL reservation
     */
    int sysctl_lowmem_reserve_ratio[MAX_NR_ZONES-1] = {
    #ifdef CONFIG_ZONE_DMA
         256,
    #endif
    #ifdef CONFIG_ZONE_DMA32
         256,
    #endif
    #ifdef CONFIG_HIGHMEM
         32,
    #endif
         32,
    };

To summarize, the expressions look like this:

    zone[1]->lowmem_reserve[2] = zone[2]->managed_pages / sysctl_lowmem_reserve_ratio[1]
    zone[0]->lowmem_reserve[2] = (zone[1]->managed_pages + zone[2]->managed_pages) / sysctl_lowmem_reserve_ratio[0]
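
To check these expressions against the demo in the source comment, here is a rough userspace model of the nested loop. It is only a sketch: `NR_ZONES`, `compute_lowmem_reserve` and the plain arrays are stand-ins I made up for the kernel's per-node zone structures, and a single node with three zones (DMA, Normal, HighMem) and 4 KiB pages is assumed.

```c
#include <stddef.h>

#define NR_ZONES 3 /* assumed layout: 0 = DMA, 1 = Normal, 2 = HighMem */

/*
 * Userspace model of the loop in setup_per_zone_lowmem_reserve().
 * managed[i]    - managed pages of zone i
 * ratio[i]      - divisor for zone i (values below 1 are treated as 1)
 * reserve[i][j] - pages kept back in zone i from allocations that
 *                 target zone j
 */
void compute_lowmem_reserve(const unsigned long managed[NR_ZONES],
                            const unsigned long ratio[NR_ZONES - 1],
                            unsigned long reserve[NR_ZONES][NR_ZONES])
{
    /* Entries the loop below never writes (i >= j) stay at 0. */
    for (size_t i = 0; i < NR_ZONES; i++)
        for (size_t j = 0; j < NR_ZONES; j++)
            reserve[i][j] = 0;

    for (size_t j = 0; j < NR_ZONES; j++) {
        /* Running sum of pages in zones idx+1 .. j. */
        unsigned long pages = managed[j];
        size_t idx = j;

        while (idx) {
            idx--;
            unsigned long divisor = ratio[idx] < 1 ? 1 : ratio[idx];

            reserve[idx][j] = pages / divisor;
            pages += managed[idx];
        }
    }
}
```

Feeding in the demo's zone sizes converted to 4 KiB pages (16M = 4096, 784M = 200704, 224M = 57344) with ratios {256, 32} gives `reserve[1][2]` = 1792 pages (224M/32 = 7 MiB), `reserve[0][1]` = 784 pages (784M/256) and `reserve[0][2]` = 1008 pages ((224M+784M)/256), which matches the three lines in the comment.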