I am trying to run a Slurm compute node on a virtual machine managed by Hyper-V. The node runs Ubuntu 16.04.
slurmd -C shows:

NodeName=calc1 CPUs=48 Boards=1 SocketsPerBoard=1 CoresPerSocket=48 ThreadsPerCore=1 RealMemory=16013
UpTime=5-20:51:31

This is not entirely correct: the maximum amount of RAM available to this machine is 96 GB, but the RAM is allocated by Hyper-V on demand. Without load, the node has only 16 GB. I tried running some Python scripts that process large data sets outside of Slurm and saw the RAM grow up to the 96 GB maximum.
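The growth under load can be reproduced with a small memory-pressure script. This is a sketch of the kind of script I ran, assuming a Linux guest where MemTotal in /proc/meminfo reflects the memory Hyper-V has currently granted; the chunk count and size are illustrative, not the actual workload:

```python
import os

# Allocate memory in chunks and watch /proc/meminfo to see whether
# Hyper-V Dynamic Memory grows the guest's visible RAM under pressure.

def mem_total_kb(meminfo_text):
    """Parse the MemTotal value (in kB) out of /proc/meminfo content."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemTotal:"):
            return int(line.split()[1])
    raise ValueError("no MemTotal line found")

def apply_pressure(chunks=4, chunk_mb=64):
    """Hold onto growing allocations, printing MemTotal after each one."""
    if not os.path.exists("/proc/meminfo"):
        return  # not a Linux guest; nothing to observe
    held = []
    for i in range(chunks):
        held.append(bytearray(chunk_mb * 1024 * 1024))  # keep a reference so it stays resident
        with open("/proc/meminfo") as f:
            print(f"after {(i + 1) * chunk_mb} MB held:",
                  mem_total_kb(f.read()), "kB total")

if __name__ == "__main__":
    apply_pressure()  # small demo values; scale chunks/chunk_mb up on the node
```

On a host with Dynamic Memory, MemTotal should climb between iterations as the hypervisor balloons RAM in.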
My slurm.conf contains (among other lines):
SchedulerType=sched/backfill
SelectType=select/cons_res
SelectTypeParameters=CR_CPU_Memory
FastSchedule=1
DefMemPerCPU=2048
NodeName=calc1 CPUs=48 Boards=1 SocketsPerBoard=1 CoresPerSocket=48 ThreadsPerCore=1 RealMemory=96000 CoreSpecCount=8 MemSpecLimit=6000
However, htop shows only 8 cores loaded while 40 stay idle, and Mem at only 16 GB. Sometimes the node goes into the Drained state because of insufficient memory. It looks like slurmd does not believe me.
How can I make slurmd request the additional gigabytes of RAM?
UPDATE
I still have not applied the slurm.conf changes proposed by @Carles Fenoy, but I have noticed a strange detail. Here is the output of scontrol show node:
NodeName=calc1 Arch=x86_64 CoresPerSocket=48
CPUAlloc=40 CPUErr=0 CPUTot=48 CPULoad=10.25
AvailableFeatures=(null)
ActiveFeatures=(null)
Gres=(null)
NodeAddr=calc1 NodeHostName=calc1 Version=17.11
OS=Linux 4.4.0-145-generic #171-Ubuntu SMP Tue Mar 26 12:43:40 UTC 2019
RealMemory=96000 AllocMem=81920 FreeMem=179 Sockets=1 Boards=1
CoreSpecCount=8 CPUSpecList=40-47 MemSpecLimit=6000
State=MIXED ThreadsPerCore=1 TmpDisk=0 Weight=1 Owner=N/A MCS_label=N/A
Partitions=main
BootTime=2019-04-12T12:50:39 SlurmdStartTime=2019-04-18T09:24:29
CfgTRES=cpu=48,mem=96000M,billing=48
AllocTRES=cpu=40,mem=80G
CapWatts=n/a
CurrentWatts=0 LowestJoules=0 ConsumedJoules=0
ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s
Then I SSHed into calc1 and issued free -h. Here is its output:
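The scontrol numbers above already show the mismatch: Slurm has allocated about 80 GB to jobs (AllocMem=81920) out of the configured 96 GB, yet the guest reports almost nothing free (FreeMem=179), because Hyper-V has not ballooned that RAM in. A small parser makes the comparison explicit. This is a sketch; it assumes the whitespace-separated key=value layout of scontrol show node seen above, and the "gap" is only a rough indicator since FreeMem also reflects memory jobs are actually using:

```python
# Compare what Slurm has promised to jobs with what the guest
# currently reports free, using "scontrol show node" output.

def parse_scontrol(text):
    """Flatten scontrol's whitespace-separated key=value pairs into a dict."""
    fields = {}
    for token in text.split():
        if "=" in token:
            key, _, value = token.partition("=")
            fields[key] = value
    return fields

def memory_gap_mb(fields):
    """Rough MB gap between memory allocated to jobs and memory free now."""
    return int(fields["AllocMem"]) - int(fields["FreeMem"])

if __name__ == "__main__":
    sample = ("NodeName=calc1 RealMemory=96000 AllocMem=81920 "
              "FreeMem=179 Sockets=1 Boards=1")
    fields = parse_scontrol(sample)
    print("allocated beyond free:", memory_gap_mb(fields), "MB")
```

With the values from the node above the gap is over 80 GB, which is exactly the memory slurmd expects Hyper-V to provide on demand.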
UPDATE 2

I have discussed this issue with our infrastructure people and found out that this mechanism is called Hyper-V Dynamic Memory. I will try to find out whether Microsoft exposes any API for it to the virtual machine. Maybe I will get lucky and someone has already developed a Slurm plugin for it.
Answer (score: 2)
Change the FastSchedule parameter to 0 or 2.
Here is an excerpt from the slurm.conf documentation:
FastSchedule
    Controls how a node's configuration specifications in slurm.conf are used. If the number of node configuration entries in the configuration file is significantly lower than the number of nodes, setting FastSchedule to 1 will permit much faster scheduling decisions to be made. (The scheduler can just check the values in a few configuration records instead of possibly thousands of node records.) Note that on systems with hyper-threading, the processor count reported by the node will be twice the actual processor count. Consider which value you want to be used for scheduling purposes.

    0
        Base scheduling decisions upon the actual configuration of each individual node except that the node's processor count in Slurm's configuration must match the actual hardware configuration if PreemptMode=suspend,gang or SelectType=select/cons_res are configured (both of those plugins maintain resource allocation information using bitmaps for the cores in the system and must remain static, while the node's memory and disk space can be established later).

    1 (default)
        Consider the configuration of each node to be that specified in the slurm.conf configuration file and any node with less than the configured resources will be set to DRAIN.

    2
        Consider the configuration of each node to be that specified in the slurm.conf configuration file and any node with less than the configured resources will not be set DRAIN. This option is generally only useful for testing purposes.
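Applied to the configuration from the question, the change would be a one-line edit; this is a sketch of the resulting slurm.conf fragment, keeping the node definition as posted and only switching FastSchedule so that a node currently reporting less memory than the configured 96000 MB is not drained:

```
SchedulerType=sched/backfill
SelectType=select/cons_res
SelectTypeParameters=CR_CPU_Memory
FastSchedule=2
DefMemPerCPU=2048
NodeName=calc1 CPUs=48 Boards=1 SocketsPerBoard=1 CoresPerSocket=48 ThreadsPerCore=1 RealMemory=96000 CoreSpecCount=8 MemSpecLimit=6000
```

Per the excerpt above, 2 keeps scheduling against the configured values without draining under-reporting nodes, while 0 would schedule against each node's actual (currently ballooned-down) memory.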