Python subprocess.Popen "OSError: [Errno 12] Cannot allocate memory"

Date: 2009-09-02 12:23:43

Tags: python linux memory

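Note: This question was originally asked here, but the bounty expired even though an acceptable answer was not found. I am re-asking this question, including all the details provided in the original question.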

A Python script runs a set of class functions every 60 seconds using the sched module:

# sc is a sched.scheduler instance
sc.enter(60, 1, self.doChecks, (sc, False))
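The question does not show how doChecks re-arms itself, so here is a minimal Python 3 sketch, assuming the usual pattern in which the callback schedules its own next run with sc.enter(). The names do_checks and run_periodic, and the max_runs cut-off (added only so the loop terminates), are hypothetical.

```python
import sched
import time


def do_checks(sc, runs, interval, max_runs):
    """Hypothetical stand-in for doChecks(): record a run, then re-arm itself."""
    runs.append(time.monotonic())
    # Schedule the next run only while more iterations are wanted.
    if len(runs) < max_runs:
        sc.enter(interval, 1, do_checks, (sc, runs, interval, max_runs))


def run_periodic(interval=60, max_runs=3):
    sc = sched.scheduler(time.monotonic, time.sleep)
    runs = []
    # First run after `interval` seconds, mirroring sc.enter(60, 1, ...).
    sc.enter(interval, 1, do_checks, (sc, runs, interval, max_runs))
    sc.run()  # blocks until the event queue is empty
    return runs
```

In the real daemon there is no cut-off, so sc.run() never returns and every check happens inside the same long-lived Python process.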

The script runs as a daemon using the code here.

Many of the class methods called as part of doChecks use the subprocess module to call system utilities in order to get system statistics:

ps = subprocess.Popen(['ps', 'aux'], stdout=subprocess.PIPE).communicate()[0]

This runs fine for a period of time before the entire script crashes with the following error:

File "/home/admin/sd-agent/checks.py", line 436, in getProcesses
File "/usr/lib/python2.4/subprocess.py", line 533, in __init__
File "/usr/lib/python2.4/subprocess.py", line 835, in _get_handles
OSError: [Errno 12] Cannot allocate memory

After the script crashed, the output of free -m on the server was:

$ free -m
                  total       used       free     shared     buffers    cached
Mem:                894        345        549          0          0          0
-/+ buffers/cache:  345        549
Swap:                 0          0          0

The server is running CentOS 5.3. I am unable to reproduce the problem on my own CentOS boxes, nor with any of the other users reporting the same problem.

I have tried a number of debugging steps, as suggested in the original question:

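  1. Logging the output of free -m before and after the Popen call. There is no significant change in memory usage, i.e. memory is not gradually being used up as the script runs.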

  2. I added close_fds=True to the Popen call, but this made no difference: the script still crashes with the same error. This was suggested here and here.

  3. I checked the rlimits, as suggested here; they showed (-1, -1) for both RLIMIT_DATA and RLIMIT_AS.

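  4. An article suggested that having no swap space might be the cause, but swap is actually provided on demand (according to the web host), and this was also dismissed as a bogus reason here.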

  5. The processes are being closed, because that is the behaviour of .communicate(), as backed up by the Python source code and the comments here.

  6. The entire set of checks can be found on GitHub here, with the getProcesses function defined starting at line 442. It is called by doChecks() starting at line 520.

    The script was run under strace, which produced the following output before the crash:

    recv(4, "Total Accesses: 516662\nTotal kBy"..., 234, 0) = 234
    gettimeofday({1250893252, 887805}, NULL) = 0
    write(3, "2009-08-21 17:20:52,887 - checks"..., 91) = 91
    gettimeofday({1250893252, 888362}, NULL) = 0
    write(3, "2009-08-21 17:20:52,888 - checks"..., 74) = 74
    gettimeofday({1250893252, 888897}, NULL) = 0
    write(3, "2009-08-21 17:20:52,888 - checks"..., 67) = 67
    gettimeofday({1250893252, 889184}, NULL) = 0
    write(3, "2009-08-21 17:20:52,889 - checks"..., 81) = 81
    close(4)                                = 0
    gettimeofday({1250893252, 889591}, NULL) = 0
    write(3, "2009-08-21 17:20:52,889 - checks"..., 63) = 63
    pipe([4, 5])                            = 0
    pipe([6, 7])                            = 0
    fcntl64(7, F_GETFD)                     = 0
    fcntl64(7, F_SETFD, FD_CLOEXEC)         = 0
    clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0xb7f12708) = -1 ENOMEM (Cannot allocate memory)
    write(2, "Traceback (most recent call last"..., 35) = 35
    open("/usr/bin/sd-agent/agent.py", O_RDONLY|O_LARGEFILE) = -1 ENOMEM (Cannot allocate memory)
    open("/usr/bin/sd-agent/agent.py", O_RDONLY|O_LARGEFILE) = -1 ENOMEM (Cannot allocate memory)
    open("/usr/lib/python24.zip/agent.py", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory)
    open("/usr/lib/python2.4/agent.py", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory)
    open("/usr/lib/python2.4/plat-linux2/agent.py", O_RDONLY|O_LARGEFILE) = -1 ENOMEM (Cannot allocate memory)
    open("/usr/lib/python2.4/lib-tk/agent.py", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory)
    open("/usr/lib/python2.4/lib-dynload/agent.py", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory)
    open("/usr/lib/python2.4/site-packages/agent.py", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory)
    write(2, "  File \"/usr/bin/sd-agent/agent."..., 52) = 52
    open("/home/admin/sd-agent/daemon.py", O_RDONLY|O_LARGEFILE) = -1 ENOMEM (Cannot allocate memory)
    open("/usr/bin/sd-agent/daemon.py", O_RDONLY|O_LARGEFILE) = -1 ENOMEM (Cannot allocate memory)
    open("/usr/lib/python24.zip/daemon.py", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory)
    open("/usr/lib/python2.4/daemon.py", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory)
    open("/usr/lib/python2.4/plat-linux2/daemon.py", O_RDONLY|O_LARGEFILE) = -1 ENOMEM (Cannot allocate memory)
    open("/usr/lib/python2.4/lib-tk/daemon.py", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory)
    open("/usr/lib/python2.4/lib-dynload/daemon.py", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory)
    open("/usr/lib/python2.4/site-packages/daemon.py", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory)
    write(2, "  File \"/home/admin/sd-agent/dae"..., 60) = 60
    open("/usr/bin/sd-agent/agent.py", O_RDONLY|O_LARGEFILE) = -1 ENOMEM (Cannot allocate memory)
    open("/usr/bin/sd-agent/agent.py", O_RDONLY|O_LARGEFILE) = -1 ENOMEM (Cannot allocate memory)
    open("/usr/lib/python24.zip/agent.py", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory)
    open("/usr/lib/python2.4/agent.py", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory)
    open("/usr/lib/python2.4/plat-linux2/agent.py", O_RDONLY|O_LARGEFILE) = -1 ENOMEM (Cannot allocate memory)
    open("/usr/lib/python2.4/lib-tk/agent.py", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory)
    open("/usr/lib/python2.4/lib-dynload/agent.py", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory)
    open("/usr/lib/python2.4/site-packages/agent.py", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory)
    write(2, "  File \"/usr/bin/sd-agent/agent."..., 54) = 54
    open("/usr/lib/python2.4/sched.py", O_RDONLY|O_LARGEFILE) = 8
    write(2, "  File \"/usr/lib/python2.4/sched"..., 55) = 55
    fstat64(8, {st_mode=S_IFREG|0644, st_size=4054, ...}) = 0
    mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7d28000
    read(8, "\"\"\"A generally useful event sche"..., 4096) = 4054
    write(2, "    ", 4)                     = 4
    write(2, "void = action(*argument)\n", 25) = 25
    close(8)                                = 0
    munmap(0xb7d28000, 4096)                = 0
    open("/usr/bin/sd-agent/checks.py", O_RDONLY|O_LARGEFILE) = -1 ENOMEM (Cannot allocate memory)
    open("/usr/bin/sd-agent/checks.py", O_RDONLY|O_LARGEFILE) = -1 ENOMEM (Cannot allocate memory)
    open("/usr/lib/python24.zip/checks.py", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory)
    open("/usr/lib/python2.4/checks.py", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory)
    open("/usr/lib/python2.4/plat-linux2/checks.py", O_RDONLY|O_LARGEFILE) = -1 ENOMEM (Cannot allocate memory)
    open("/usr/lib/python2.4/lib-tk/checks.py", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory)
    open("/usr/lib/python2.4/lib-dynload/checks.py", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory)
    open("/usr/lib/python2.4/site-packages/checks.py", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory)
    write(2, "  File \"/usr/bin/sd-agent/checks"..., 60) = 60
    open("/usr/bin/sd-agent/checks.py", O_RDONLY|O_LARGEFILE) = -1 ENOMEM (Cannot allocate memory)
    open("/usr/bin/sd-agent/checks.py", O_RDONLY|O_LARGEFILE) = -1 ENOMEM (Cannot allocate memory)
    open("/usr/lib/python24.zip/checks.py", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory)
    open("/usr/lib/python2.4/checks.py", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory)
    open("/usr/lib/python2.4/plat-linux2/checks.py", O_RDONLY|O_LARGEFILE) = -1 ENOMEM (Cannot allocate memory)
    open("/usr/lib/python2.4/lib-tk/checks.py", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory)
    open("/usr/lib/python2.4/lib-dynload/checks.py", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory)
    open("/usr/lib/python2.4/site-packages/checks.py", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory)
    write(2, "  File \"/usr/bin/sd-agent/checks"..., 64) = 64
    open("/usr/lib/python2.4/subprocess.py", O_RDONLY|O_LARGEFILE) = 8
    write(2, "  File \"/usr/lib/python2.4/subpr"..., 65) = 65
    fstat64(8, {st_mode=S_IFREG|0644, st_size=39931, ...}) = 0
    mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7d28000
    read(8, "# subprocess - Subprocesses with"..., 4096) = 4096
    read(8, "lso, the newlines attribute of t"..., 4096) = 4096
    read(8, "code < 0:\n        print >>sys.st"..., 4096) = 4096
    read(8, "alse does not exist on 2.2.0\ntry"..., 4096) = 4096
    read(8, " p2cread\n        # c2pread    <-"..., 4096) = 4096
    write(2, "    ", 4)                     = 4
    write(2, "errread, errwrite)\n", 19)    = 19
    close(8)                                = 0
    munmap(0xb7d28000, 4096)                = 0
    open("/usr/lib/python2.4/subprocess.py", O_RDONLY|O_LARGEFILE) = 8
    write(2, "  File \"/usr/lib/python2.4/subpr"..., 71) = 71
    fstat64(8, {st_mode=S_IFREG|0644, st_size=39931, ...}) = 0
    mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7d28000
    read(8, "# subprocess - Subprocesses with"..., 4096) = 4096
    read(8, "lso, the newlines attribute of t"..., 4096) = 4096
    read(8, "code < 0:\n        print >>sys.st"..., 4096) = 4096
    read(8, "alse does not exist on 2.2.0\ntry"..., 4096) = 4096
    read(8, " p2cread\n        # c2pread    <-"..., 4096) = 4096
    read(8, "table(self, handle):\n           "..., 4096) = 4096
    read(8, "rrno using _sys_errlist (or siml"..., 4096) = 4096
    read(8, " p2cwrite = None, None\n         "..., 4096) = 4096
    write(2, "    ", 4)                     = 4
    write(2, "self.pid = os.fork()\n", 21)  = 21
    close(8)                                = 0
    munmap(0xb7d28000, 4096)                = 0
    write(2, "OSError", 7)                  = 7
    write(2, ": ", 2)                       = 2
    write(2, "[Errno 12] Cannot allocate memor"..., 33) = 33
    write(2, "\n", 1)                       = 1
    unlink("/var/run/sd-agent.pid")         = 0
    close(3)                                = 0
    munmap(0xb7e0d000, 4096)                = 0
    rt_sigaction(SIGINT, {SIG_DFL, [], SA_RESTORER, 0x589978}, {0xb89a60, [], SA_RESTORER, 0x589978}, 8) = 0
    brk(0xa022000)                          = 0xa022000
    exit_group(1)                           = ?
    

7 answers:

Answer 0 (score: 76)

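As a general rule (i.e. in vanilla kernels), fork/clone failures with ENOMEM occur specifically because of either an honest-to-God out-of-memory condition (dup_mm, dup_task_struct, alloc_pid, mpol_dup, mm_init etc. croak), or because security_vm_enough_memory_mm failed you while enforcing the overcommit policy.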

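Start by checking the vmsize of the process that failed to fork, at the time of the fork attempt, and then compare it to the amount of free memory (physical and swap) as it relates to the overcommit policy (plug the numbers in).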
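A minimal Python 3 sketch of collecting those numbers from procfs on Linux: the process's VmSize from /proc/<pid>/status, the CommitLimit and Committed_AS counters from /proc/meminfo, and the mode in /proc/sys/vm/overcommit_memory (0 = heuristic, 1 = always overcommit, 2 = strict accounting). The helper names are hypothetical, and treating VmSize against the remaining commit headroom is only a rough upper-bound comparison, meaningful mainly under strict accounting (mode 2).

```python
def read_kb_fields(path):
    """Parse 'Key:   12345 kB' style lines into {key: int}."""
    fields = {}
    with open(path) as f:
        for line in f:
            key, sep, rest = line.partition(":")
            parts = rest.split()
            if sep and parts and parts[0].isdigit():
                fields[key.strip()] = int(parts[0])
    return fields


def fork_headroom(pid="self"):
    """Gather the numbers needed to reason about a fork() ENOMEM."""
    status = read_kb_fields(f"/proc/{pid}/status")
    meminfo = read_kb_fields("/proc/meminfo")
    try:
        with open("/proc/sys/vm/overcommit_memory") as f:
            mode = int(f.read().strip())
    except (OSError, ValueError):
        mode = None  # not readable here (e.g. a restricted container)
    limit = meminfo.get("CommitLimit")
    committed = meminfo.get("Committed_AS")
    headroom = None
    if limit is not None and committed is not None:
        # Only enforced by the kernel when overcommit_memory == 2.
        headroom = limit - committed
    return {
        "vm_size_kb": status.get("VmSize"),
        "overcommit_memory": mode,
        "commit_limit_kb": limit,
        "committed_as_kb": committed,
        "commit_headroom_kb": headroom,
        "mem_total_kb": meminfo.get("MemTotal"),
        "mem_free_kb": meminfo.get("MemFree"),
        "swap_free_kb": meminfo.get("SwapFree"),
    }
```

Logging fork_headroom() immediately before each Popen call would record the values at the moment a fork fails.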

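In your particular case, note that Virtuozzo has additional checks in overcommit enforcement. Moreover, I'm not sure how much control you truly have, from within your container, over swap and overcommit configuration (in order to influence the outcome of the enforcement).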

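Now, in order to actually move forward, I'd say you are left with two options:

  • switch to a larger instance, or
  • put some coding effort into controlling your script's memory footprint more effectively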

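Note that the coding effort may be all for naught if it turns out that it's not you, but some other guy collocated in a different instance on the same server as you, running amok.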

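Memory-wise, we already know that subprocess.Popen uses fork/clone under the hood, meaning that every time you call it you are once more requesting as much memory as Python is already eating up, i.e. hundreds of additional MB, all in order to then exec a puny 10kB executable such as free or ps. Under an unfavourable overcommit policy, you will soon see ENOMEM.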

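Alternatives to fork that do not have this problem of copying the parent's page tables etc. are vfork and posix_spawn. But if you do not feel like rewriting chunks of subprocess.Popen in terms of vfork/posix_spawn, consider using subprocess.Popen only once, at the beginning of your script (when Python's memory footprint is minimal), to spawn a shell script that then runs free/ps/sleep and whatever else in a loop parallel to your script. Poll the script's output or read it synchronously, possibly from a separate thread if you have other stuff to take care of asynchronously: do your data crunching in Python, but leave the forking to the subordinate process.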
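A minimal Python 3 sketch of the "fork once, early" approach, assuming a POSIX /bin/sh. The delimiter string, the start_helper name and the reader thread are illustrative choices, not part of the original script. The shell loop still forks ps/free on every pass, but it does so from a tiny sh process rather than from the large Python interpreter.

```python
import queue
import subprocess
import threading

# Hypothetical marker the shell loop prints after each snapshot.
DELIM = "---END-OF-SNAPSHOT---"


def start_helper(command="ps aux", interval=60):
    """Spawn one long-lived shell loop; return (process, queue of snapshots)."""
    script = f"while :; do {command}; echo '{DELIM}'; sleep {int(interval)}; done"
    # Call this once at startup, while the Python process is still small.
    proc = subprocess.Popen(["/bin/sh", "-c", script],
                            stdout=subprocess.PIPE, text=True)
    snapshots = queue.Queue()

    def reader():
        buf = []
        for line in proc.stdout:  # blocks until the helper writes a line
            if line.rstrip("\n") == DELIM:
                snapshots.put("".join(buf))
                buf = []
            else:
                buf.append(line)

    threading.Thread(target=reader, daemon=True).start()
    return proc, snapshots
```

doChecks() could then take the latest snapshot with snapshots.get_nowait() (handling queue.Empty) instead of calling Popen every 60 seconds.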

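HOWEVER, in your particular case you can skip invoking ps and free altogether; that information is readily available to you in Python directly from procfs, whether you choose to access it yourself or via existing libraries and/or packages. If ps and free were the only utilities you were running, then you can do away with subprocess.Popen completely.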
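A minimal Python 3 sketch of reading the same information straight from procfs instead of spawning ps and free (the original script targets Python 2.4, so this would need porting). It covers only a small subset of what ps aux prints (PID, name, resident set size), and the function names are hypothetical; the libraries mentioned above would offer far more.

```python
import os


def read_meminfo():
    """Roughly what `free` reports, read from /proc/meminfo (values in kB)."""
    info = {}
    with open("/proc/meminfo") as f:
        for line in f:
            key, _, rest = line.partition(":")
            parts = rest.split()
            if parts and parts[0].isdigit():
                info[key] = int(parts[0])
    return info


def list_processes():
    """A minimal `ps` replacement: [(pid, name, rss_kb or None), ...]."""
    procs = []
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        try:
            with open(f"/proc/{entry}/status") as f:
                fields = dict(line.split(":", 1) for line in f if ":" in line)
        except OSError:
            continue  # the process exited between listdir() and open()
        name = fields.get("Name", "").strip()
        rss = fields.get("VmRSS")  # absent for kernel threads
        rss_kb = int(rss.split()[0]) if rss else None
        procs.append((int(entry), name, rss_kb))
    return procs
```

Neither function forks, so neither can fail with the ENOMEM from clone() shown in the strace output.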

Finally, whatever you do as far as subprocess.Popen is concerned, if your script leaks memory you will still hit the wall eventually. Keep an eye on it, and check for memory leaks.
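One way to check for leaks, assuming Python 3.4 or later (tracemalloc does not exist in the Python 2.4 the question uses), is to compare allocation snapshots taken around one round of checks. A minimal sketch; top_allocation_growth is a hypothetical name.

```python
import tracemalloc


def top_allocation_growth(func, limit=5):
    """Run func() once and report the source lines whose allocations changed most."""
    tracemalloc.start()
    try:
        before = tracemalloc.take_snapshot()
        func()
        after = tracemalloc.take_snapshot()
    finally:
        tracemalloc.stop()
    # StatisticDiff objects, largest absolute change first.
    return after.compare_to(before, "lineno")[:limit]
```

Calling it repeatedly around the same periodic function and watching whether the same lines keep growing is what distinguishes a leak from one-off caching.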

Answer 1 (score: 16)

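Looking at the output of free -m, it seems to me that you actually do not have any swap memory available. I am not sure whether swap is always automatically available on demand in Linux, but I was having the same problem and none of the answers here helped me. Adding some swap memory, however, solved my problem, so since this might help other people facing the same problem, I am posting how to add 1GB of swap (on Ubuntu 12.04, but it should work similarly on other distributions).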

You can first check whether any swap memory is enabled.

$ sudo swapon -s

If the output is empty, you don't have any swap enabled. To add 1GB of swap:

$ sudo dd if=/dev/zero of=/swapfile bs=1024 count=1024k
$ sudo mkswap /swapfile
$ sudo swapon /swapfile

Add the following line to fstab to make the swap permanent.

$ sudo vim /etc/fstab

     /swapfile       none    swap    sw      0       0 

The source and more information can be found here.
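The swap check at the start of this answer can also be done from Python, since swapon -s prints the contents of /proc/swaps. A minimal Python 3 sketch; the function names are hypothetical, and an empty list means no swap area is active.

```python
def parse_swaps(text):
    """Parse /proc/swaps text; [] means no swap area is active."""
    areas = []
    for line in text.splitlines()[1:]:  # first line is the column header
        parts = line.split()
        if len(parts) >= 5:
            areas.append({
                "filename": parts[0],
                "type": parts[1],
                "size_kb": int(parts[2]),
                "used_kb": int(parts[3]),
                "priority": int(parts[4]),
            })
    return areas


def active_swap_areas(path="/proc/swaps"):
    with open(path) as f:
        return parse_swaps(f.read())
```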

Answer 2 (score: 8)

Swap may not be the red herring previously suggested. How big is the Python process in question just before the ENOMEM?

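Under kernel 2.6, /proc/sys/vm/swappiness controls how aggressively the kernel will turn to swap, and the overcommit* files control how much and how precisely the kernel can apportion memory with a wink and a nod. Like your Facebook relationship status, it's complicated.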

  

...but swap is actually provided on demand (according to the web host)...

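But not according to the output of your free(1) command, which shows no swap space recognized by your server instance. Now, your web host may certainly know much more than I do about this topic, but the virtual RHEL/CentOS systems I've used have reported swap as available to the guest OS.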

Adapting Red Hat KB Article 15252:

  

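A Red Hat Enterprise Linux 5 system with no swap space will run just fine as long as the sum of anonymous memory and System V shared memory is less than about 3/4 of the amount of RAM. .... For systems with 4GB of RAM or less, [it is recommended to have] a minimum of 2GB of swap space.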

Compare your /proc/sys/vm settings to a plain CentOS 5.3 installation. Add a swap file. Ratchet down swappiness and see if you live any longer.
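To make that comparison mechanical, one option is to snapshot every readable file under /proc/sys/vm on both machines and diff the results. A minimal Python 3 sketch, assuming the reference snapshot is produced the same way on a stock CentOS 5.3 box (for example saved as JSON); both function names are hypothetical.

```python
from pathlib import Path


def read_vm_settings(root="/proc/sys/vm"):
    """Snapshot every readable file under /proc/sys/vm as {name: value}."""
    settings = {}
    for path in sorted(Path(root).iterdir()):
        if not path.is_file():
            continue
        try:
            settings[path.name] = path.read_text().strip()
        except OSError:
            continue  # write-only entries such as drop_caches
    return settings


def diff_settings(mine, reference):
    """Return {name: (mine, reference)} for every setting that differs."""
    names = set(mine) | set(reference)
    return {n: (mine.get(n), reference.get(n))
            for n in sorted(names) if mine.get(n) != reference.get(n)}
```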

Answer 3 (score: 5)

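I still suspect that your customer/user has some kernel module or driver loaded that is interfering with the clone() system call (perhaps some obscure security enhancement, something like LIDS but more obscure?), or is somehow filling up some of the kernel data structures that fork()/clone() needs in order to operate (process table, page tables, file descriptor table, etc.).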

Here is the relevant portion of the fork(2) man page:

ERRORS
       EAGAIN fork() cannot allocate sufficient memory to copy the parent's page tables and allocate a task  structure  for  the
              child.

       EAGAIN It  was not possible to create a new process because the caller's RLIMIT_NPROC resource limit was encountered.  To
              exceed this limit, the process must have either the CAP_SYS_ADMIN or the CAP_SYS_RESOURCE capability.

       ENOMEM fork() failed to allocate the necessary kernel structures because memory is tight.

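I suggest having the user try this after booting into a stock, generic kernel with only a minimal set of modules and drivers loaded (the minimum necessary to run your application/script). From there, assuming it works in that configuration, they can perform a binary search between that configuration and the one that exhibits the issue. This is standard sysadmin troubleshooting 101.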
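The search space for that bisection is the set of modules loaded on the failing host but not on the minimal one. A minimal Python 3 sketch that reads module names from /proc/modules (the first field of each line); the helper names and the example module names in the test are hypothetical.

```python
def loaded_modules(path="/proc/modules"):
    """Names of currently loaded kernel modules (first field of each line)."""
    with open(path) as f:
        return {line.split()[0] for line in f if line.strip()}


def modules_to_bisect(current, baseline):
    """Modules loaded on the failing box but absent from the stock/minimal one."""
    return sorted(set(current) - set(baseline))
```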

The relevant line in your strace output is:

clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0xb7f12708) = -1 ENOMEM (Cannot allocate memory)

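...I know others have discussed swap and memory availability (and I would recommend that you set up at least a small swap partition, ironically even if it's on a RAM disk... the code paths through the Linux kernel when it has even a tiny bit of swap available have been exercised far more extensively than those (exception-handling paths) in which there is zero swap available).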

But I suspect that this is still a red herring.

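The fact that free reports 0 (ZERO) memory used by the cache and buffers is very disturbing. I suspect that the free output... and possibly your application issue as well, are caused by some proprietary kernel module that is interfering with memory allocation in some way.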

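According to the man pages for fork()/clone(), the fork() system call should return EAGAIN if your call would cause a resource limit violation (RLIMIT_NPROC)... however, they do not say whether EAGAIN is also returned for other RLIMIT* violations. In any case, if your target/host has some sort of weird Vormetric or other security settings (or even if your process is running under some weird SELinux policy), that could be causing this -ENOMEM failure.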

It's pretty unlikely to be a normal, run-of-the-mill Linux/UNIX issue. You've got something non-standard going on there.
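The resource limits discussed above can be read from inside the daemon itself with the standard resource module. A minimal Python 3 sketch; current_limits is a hypothetical name, and RLIM_INFINITY (which the question saw printed as -1) is shown as "unlimited".

```python
import resource

# The limits mentioned in the question and in this answer.
_NAMES = ("RLIMIT_NPROC", "RLIMIT_AS", "RLIMIT_DATA")


def current_limits():
    """{name: (soft, hard)} with RLIM_INFINITY rendered as 'unlimited'."""
    def fmt(value):
        return "unlimited" if value == resource.RLIM_INFINITY else value

    limits = {}
    for name in _NAMES:
        res = getattr(resource, name, None)  # guard platforms lacking a limit
        if res is None:
            continue
        soft, hard = resource.getrlimit(res)
        limits[name] = (fmt(soft), fmt(hard))
    return limits
```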

Answer 4 (score: 2)

Have you tried using:

(status,output) = commands.getstatusoutput("ps aux")

I think this fixed the same problem for me. But then my process ended up getting killed instead of failing to spawn, which is even worse.

After some testing I found that this only happened with older versions of Python: it happens with 2.6.5 but not with 2.7.2.
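The commands module used above only exists in Python 2; in Python 3 the same helper lives in subprocess. A minimal sketch of the equivalent call. Note that it still runs the command through the shell via subprocess, so a child process is forked just as with Popen.

```python
import subprocess

# Python 3 replacement for Python 2's commands.getstatusoutput("ps aux").
# Returns (exit_status, output) with a single trailing newline stripped.
status, output = subprocess.getstatusoutput("ps aux")
```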

My search led me to python-close_fds-issue, but unsetting close_fds did not solve the problem. It is still well worth a read.

I found that Python was leaking file descriptors simply by keeping an eye on it:

watch "ls /proc/$PYTHONPID/fd | wc -l"
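The same count can be taken from inside the Python process, for example logged once per check cycle so that a steady climb stands out. A minimal Python 3 sketch for Linux; open_fd_count is a hypothetical name.

```python
import os


def open_fd_count(pid="self"):
    """Number of open file descriptors, like `ls /proc/$PID/fd | wc -l`."""
    return len(os.listdir(f"/proc/{pid}/fd"))
```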

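Like you, I do want to capture the command's output, and I do want to avoid OOM errors... but it looks like using a less buggy version of Python is the only way. Not ideal...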

Answer 5 (score: 0)

  

munmap(0xb7d28000, 4096) = 0
write(2, "OSError", 7) = 7

I've seen sloppy code that looks like this:

serrno = errno;
some_Syscall(...);
if (serrno != errno)
    /* sound alarm: CATASTROPHIC ERROR !!! */

You should check whether this is what is happening in the Python code. errno is only valid immediately after a system call has failed.
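In Python, the errno worth checking is the one carried by the exception from the call that actually failed, i.e. e.errno on the OSError raised by Popen. A minimal Python 3 sketch, assuming (this is not part of the original answer) that a monitoring daemon would rather skip one round of checks than die; run_or_skip is a hypothetical name.

```python
import errno
import logging
import subprocess


def run_or_skip(args):
    """Run args and return its stdout, or None if the fork failed with ENOMEM."""
    try:
        proc = subprocess.Popen(args, stdout=subprocess.PIPE)
    except OSError as e:
        # e.errno belongs to the system call that failed, not a stale global.
        if e.errno == errno.ENOMEM:
            logging.warning("Popen(%r) failed with ENOMEM; skipping", args)
            return None
        raise  # any other failure (e.g. ENOENT) is still a real error
    out, _ = proc.communicate()
    return out
```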

Edit to add:

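You didn't say how long this process runs. Possible memory consumers:

  • forked processes
  • unused data structures
  • shared libraries
  • memory-mapped files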
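The last two items can be measured from /proc/<pid>/maps, which lists every mapping together with its backing file. A minimal Python 3 sketch; mapped_kb_by_path is a hypothetical name, and it sums virtual address space per file, not resident memory (/proc/<pid>/smaps has per-mapping Rss for that).

```python
from collections import Counter


def mapped_kb_by_path(pid="self"):
    """Sum virtual address space per backing file in /proc/<pid>/maps (kB)."""
    totals = Counter()
    with open(f"/proc/{pid}/maps") as f:
        for line in f:
            # Fields: address perms offset dev inode [pathname]
            parts = line.split(maxsplit=5)
            start, end = (int(x, 16) for x in parts[0].split("-"))
            path = parts[5].strip() if len(parts) == 6 else "[anonymous]"
            totals[path] += (end - start) // 1024
    return totals
```

For example, mapped_kb_by_path().most_common(10) lists the ten largest consumers of address space.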

Answer 6 (score: 0)

Perhaps you can simply run:

$ sudo bash -c "echo vm.overcommit_memory=1 >> /etc/sysctl.conf"
$ sudo sysctl -p

It worked in my case.

Reference: https://github.com/openai/gym/issues/110#issuecomment-220672405