How to fix a "Too many open files" error when aggregating billions of records

Date: 2015-05-04 06:07:25

Tags: linux mongodb ubuntu

I am getting the following error:

opening file "/workspace/mongo/data/_tmp/extsort.63355": errno:24 Too many open files

How can I fix this error?

Is it because 63355 files are already open?

2015-05-02T08:01:40.490+0000 I COMMAND  [conn1] command sandbox.$cmd command: listCollections { listCollections: 1.0 } keyUpdates:0 writeConflicts:0 numYields:0 reslen:411 locks:{} 169ms
2015-05-02T15:01:02.060+0000 I -        [conn2] Assertion: 16818:error opening file "/workspace/mongo/data/_tmp/extsort.63355": errno:24 Too many open files
2015-05-02T15:01:02.235+0000 I CONTROL  [conn2] 
 0xf4d299 0xeeda71 0xed2d3f 0xed2dec 0xb3f453 0xb3c88c 0xb3d2dd 0xb3dfe2 0xb499c5 0xb49136 0xb7e3e6 0x987165 0x9d8b04 0x9d9aed 0x9da7fb 0xb9e956 0xab4d20 0x80e75d 0xf00e6b 0x7fe38e8b4182 0x7fe38d37c47d
----- BEGIN BACKTRACE -----
{"backtrace":[{"b":"400000","o":"B4D299"},{"b":"400000","o":"AEDA71"},{"b":"400000","o":"AD2D3F"},{"b":"400000","o":"AD2DEC"},{"b":"400000","o":"73F453"},{"b":"400000","o":"73C88C"},{"b":"400000","o":"73D2DD"},{"b":"400000","o":"73DFE2"},{"b":"400000","o":"7499C5"},{"b":"400000","o":"749136"},{"b":"400000","o":"77E3E6"},{"b":"400000","o":"587165"},{"b":"400000","o":"5D8B04"},{"b":"400000","o":"5D9AED"},{"b":"400000","o":"5DA7FB"},{"b":"400000","o":"79E956"},{"b":"400000","o":"6B4D20"},{"b":"400000","o":"40E75D"},{"b":"400000","o":"B00E6B"},{"b":"7FE38E8AC000","o":"8182"},{"b":"7FE38D282000","o":"FA47D"}],"processInfo":{ "mongodbVersion" : "3.0.1", "gitVersion" : "534b5a3f9d10f00cd27737fbcd951032248b5952", "uname" : { "sysname" : "Linux", "release" : "3.13.0-44-generic", "version" : "#73-Ubuntu SMP Tue Dec 16 00:22:43 UTC 2014", "machine" : "x86_64" }, "somap" : [ { "elfType" : 2, "b" : "400000", "buildId" : "C35E766AD226FC0C16CB0C3885EC3B59E288A3F2" }, { "b" : "7FFF448FE000", "elfType" : 3, "buildId" : "9D77366C6409A9EA266179080FA7C779EEA8A958" }, { "b" : "7FE38E8AC000", "path" : "/lib/x86_64-linux-gnu/libpthread.so.0", "elfType" : 3, "buildId" : "9318E8AF0BFBE444731BB0461202EF57F7C39542" }, { "b" : "7FE38E64E000", "path" : "/lib/x86_64-linux-gnu/libssl.so.1.0.0", "elfType" : 3, "buildId" : "FF43D0947510134A8A494063A3C1CF3CEBB27791" }, { "b" : "7FE38E273000", "path" : "/lib/x86_64-linux-gnu/libcrypto.so.1.0.0", "elfType" : 3, "buildId" : "B927879B878D90DD9FF4B15B00E7799AA8E0272F" }, { "b" : "7FE38E06B000", "path" : "/lib/x86_64-linux-gnu/librt.so.1", "elfType" : 3, "buildId" : "92FCF41EFE012D6186E31A59AD05BDBB487769AB" }, { "b" : "7FE38DE67000", "path" : "/lib/x86_64-linux-gnu/libdl.so.2", "elfType" : 3, "buildId" : "C1AE4CB7195D337A77A3C689051DABAA3980CA0C" }, { "b" : "7FE38DB63000", "path" : "/usr/lib/x86_64-linux-gnu/libstdc++.so.6", "elfType" : 3, "buildId" : "19EFDDAB11B3BF5C71570078C59F91CF6592CE9E" }, { "b" : "7FE38D85D000", "path" : 
"/lib/x86_64-linux-gnu/libm.so.6", "elfType" : 3, "buildId" : "1D76B71E905CB867B27CEF230FCB20F01A3178F5" }, { "b" : "7FE38D647000", "path" : "/lib/x86_64-linux-gnu/libgcc_s.so.1", "elfType" : 3, "buildId" : "8D0AA71411580EE6C08809695C3984769F25725B" }, { "b" : "7FE38D282000", "path" : "/lib/x86_64-linux-gnu/libc.so.6", "elfType" : 3, "buildId" : "30C94DC66A1FE95180C3D68D2B89E576D5AE213C" }, { "b" : "7FE38EACA000", "path" : "/lib64/ld-linux-x86-64.so.2", "elfType" : 3, "buildId" : "9F00581AB3C73E3AEA35995A0C50D24D59A01D47" } ] }}
 mongod(_ZN5mongo15printStackTraceERSo+0x29) [0xf4d299]
 mongod(_ZN5mongo10logContextEPKc+0xE1) [0xeeda71]
 mongod(_ZN5mongo11msgassertedEiPKc+0xAF) [0xed2d3f]
 mongod(+0xAD2DEC) [0xed2dec]
 mongod(_ZN5mongo16SortedFileWriterINS_5ValueES1_EC1ERKNS_11SortOptionsERKSt4pairINS1_25SorterDeserializeSettingsES7_E+0x5D3) [0xb3f453]
 mongod(_ZN5mongo19DocumentSourceGroup5spillEv+0x1BC) [0xb3c88c]
 mongod(_ZN5mongo19DocumentSourceGroup8populateEv+0x46D) [0xb3d2dd]
 mongod(_ZN5mongo19DocumentSourceGroup7getNextEv+0x292) [0xb3dfe2]
 mongod(_ZN5mongo21DocumentSourceProject7getNextEv+0x45) [0xb499c5]
 mongod(_ZN5mongo17DocumentSourceOut7getNextEv+0xD6) [0xb49136]
 mongod(_ZN5mongo8Pipeline3runERNS_14BSONObjBuilderE+0xA6) [0xb7e3e6]
 mongod(_ZN5mongo15PipelineCommand3runEPNS_16OperationContextERKSsRNS_7BSONObjEiRSsRNS_14BSONObjBuilderEb+0x7A5) [0x987165]
 mongod(_ZN5mongo12_execCommandEPNS_16OperationContextEPNS_7CommandERKSsRNS_7BSONObjEiRSsRNS_14BSONObjBuilderEb+0x34) [0x9d8b04]
 mongod(_ZN5mongo7Command11execCommandEPNS_16OperationContextEPS0_iPKcRNS_7BSONObjERNS_14BSONObjBuilderEb+0xC7D) [0x9d9aed]
 mongod(_ZN5mongo12_runCommandsEPNS_16OperationContextEPKcRNS_7BSONObjERNS_11_BufBuilderINS_16TrivialAllocatorEEERNS_14BSONObjBuilderEbi+0x28B) [0x9da7fb]
 mongod(_ZN5mongo8runQueryEPNS_16OperationContextERNS_7MessageERNS_12QueryMessageERKNS_15NamespaceStringERNS_5CurOpES3_+0x746) [0xb9e956]
 mongod(_ZN5mongo16assembleResponseEPNS_16OperationContextERNS_7MessageERNS_10DbResponseERKNS_11HostAndPortE+0xB10) [0xab4d20]
 mongod(_ZN5mongo16MyMessageHandler7processERNS_7MessageEPNS_21AbstractMessagingPortEPNS_9LastErrorE+0xDD) [0x80e75d]
 mongod(_ZN5mongo17PortMessageServer17handleIncomingMsgEPv+0x34B) [0xf00e6b]
 libpthread.so.0(+0x8182) [0x7fe38e8b4182]
 libc.so.6(clone+0x6D) [0x7fe38d37c47d]
-----  END BACKTRACE  -----
2015-05-02T15:02:07.753+0000 I COMMAND  [conn2] CMD: drop sandbox.tmp.agg_out.1

Update

I ran ulimit -n unlimited in the console

and modified /etc/security/limits.conf with the following settings:

* soft nofile unlimited
* hard nofile unlimited
* soft nproc unlimited
* hard nproc unlimited

I then checked with ulimit -a:
health# ulimit -a
-t: cpu time (seconds)              unlimited
-f: file size (blocks)              unlimited
-d: data seg size (kbytes)          unlimited
-s: stack size (kbytes)             8192
-c: core file size (blocks)         0
-m: resident set size (kbytes)      unlimited
-u: processes                       unlimited
-n: file descriptors                4096
-l: locked-in-memory size (kbytes)  64
-v: address space (kbytes)          unlimited
-x: file locks                      unlimited
-i: pending signals                 31538
-q: bytes in POSIX msg queues       819200
-e: max nice                        0
-r: max rt priority                 0
-N 15:                              unlimited
health# ulimit -Sn
4096
health# ulimit -Hn
4096

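Can my system actually be configured to allow an unlimited number of open files?

2 answers:

Answer 0 (score: 4)

There is no clean answer for this, because you are doing something very heavy, but a workaround is available.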

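ulimit is a Unix/Linux shell command that sets resource limits for the current shell and the processes started from it.

In your case you need to increase the maximum number of open files, or, to be on the safe side, set it as high as the system allows (MongoDB's documentation also recommends raising this limit).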

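ulimit -n <large value, in your case 1000000>

and, for the system-wide limit,

sysctl -w fs.file-max=1000000

To keep the system-wide value across reboots, set this in /etc/sysctl.conf (per-user nofile limits belong in /etc/security/limits.conf instead):

fs.file-max = 1000000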
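The per-process side of this can also be inspected from inside a program. The following is a minimal sketch, assuming Python 3 and its standard `resource` module; the helper name `raise_nofile_soft_limit` is made up for illustration. It only raises the soft limit up to the existing hard limit, which an unprivileged process is allowed to do; raising the hard limit still needs the system configuration described above.

```python
import resource


def raise_nofile_soft_limit(target):
    """Raise this process's soft open-files limit toward target.

    The value is capped by the hard limit; returns the resulting (soft, hard).
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if hard != resource.RLIM_INFINITY:
        target = min(target, hard)
    if soft == resource.RLIM_INFINITY or soft >= target:
        return soft, hard  # already high enough, nothing to do
    try:
        resource.setrlimit(resource.RLIMIT_NOFILE, (target, hard))
    except (ValueError, OSError):
        pass  # the kernel refused the value; keep the current limits
    return resource.getrlimit(resource.RLIMIT_NOFILE)


if __name__ == "__main__":
    print("before:", resource.getrlimit(resource.RLIMIT_NOFILE))
    print("after: ", raise_nofile_soft_limit(1000000))
```

This only affects the Python process itself and its children, not an already running mongod.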

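Answer 1 (score: 3)

I found it necessary to change the system-wide settings (using ulimit, as suggested by Nachiket Kate; another good description for Ubuntu can be found here) as well as the mongodb settings (as documented here).

For ease of explanation, I will summarize the commands I ran to get things working (I will again cite the links where they belong in the discussion).

First, determine whether the maximum number of file descriptors enforced by the kernel is sufficient: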

$ cat /proc/sys/fs/file-max
6569231
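The value above is only the ceiling. On Linux the neighbouring file /proc/sys/fs/file-nr reports three numbers: allocated handles, allocated but unused handles, and the same maximum. A small sketch, assuming Python 3 (the function names are illustrative), that reads it to see how close the whole system is to that ceiling:

```python
import os


def parse_file_nr(text):
    """Parse the three whitespace-separated counters of /proc/sys/fs/file-nr."""
    allocated, unused, maximum = (int(field) for field in text.split())
    return {"allocated": allocated, "unused": unused, "max": maximum}


def system_file_handle_usage(path="/proc/sys/fs/file-nr"):
    with open(path) as handle:
        return parse_file_nr(handle.read())


if __name__ == "__main__" and os.path.exists("/proc/sys/fs/file-nr"):
    usage = system_file_handle_usage()
    print("%(allocated)d of %(max)d file handles allocated" % usage)
```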

In my case this was not the problem. Checking the ulimit settings of the mongodb user showed that the number of file descriptors was a paltry 1024:

$ sudo -H -u mongodb bash -c 'ulimit -a'
...
open files                      (-n) 1024
...
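The command above reports the limit of a fresh shell for the mongodb user. Limits are per process, so an already running daemon can have a different value. On Linux, the effective limits of any process are listed in /proc/<pid>/limits. The sketch below, assuming Python 3, extracts the "Max open files" row; the function names are illustrative, and the PID of mongod has to be looked up separately (for example with pidof mongod).

```python
import os


def parse_max_open_files(limits_text):
    """Return (soft, hard) from the 'Max open files' row of /proc/<pid>/limits."""
    for line in limits_text.splitlines():
        if line.startswith("Max open files"):
            fields = line.split()
            # fields: ['Max', 'open', 'files', <soft>, <hard>, 'files']
            return fields[3], fields[4]
    return None


def open_files_limit_of(pid="self"):
    with open("/proc/%s/limits" % pid) as handle:
        return parse_max_open_files(handle.read())


if __name__ == "__main__" and os.path.exists("/proc/self/limits"):
    print("this process:", open_files_limit_of())
```

Calling open_files_limit_of with the daemon's PID after each configuration change shows whether the new limit actually reached the daemon.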

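These values can be changed for all users by raising both the soft limit (which users can modify themselves) and the hard limit (I set them quite high):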

$ sudo su
$ echo -e "* hard\tnofile\t1000000\n* soft\tnofile\t990000" >> /etc/security/limits.conf

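This can also be done per user by replacing * with a username. Although this worked on a per-user basis, restarting the mongo daemon caused the number of file descriptors to drop back to 1024. It was also necessary to follow the advice here regarding PAM sessions: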

$ for file in /etc/pam.d/common-session*; do 
      echo 'session required pam_limits.so' >> $file
  done

To test whether the settings had been applied, I created a wee Python script (placed in /tmp/file_descriptor_test.py):

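#!/usr/bin/env python
# Open n - 1 files at once to check the per-process file descriptor limit.
n = 990000

fd_list = []
for i in range(1, n):
    fd_list.append(open('/tmp/__%08d' % i, 'w'))

# The parenthesized form works under both Python 2 and Python 3.
print('opened %d fds' % len(fd_list))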

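Running this as the mongodb user showed that the new limit was in effect. The script only failed just short of the 990000 soft limit, because the process already holds a few descriptors (stdin, stdout and stderr):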

sudo -H -u mongodb bash -c '/tmp/file_descriptor_test.py'
Traceback (most recent call last):
File "/tmp/fd.py", line 8, in <module>
IOError: [Errno 24] Too many open files: '/tmp/__00989998'
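A variant of the same probe that cleans up after itself is sketched below. It assumes Python 3 (tempfile.TemporaryDirectory does not exist in Python 2), and the function name is illustrative. It opens files in a scratch directory until the per-process limit is hit (errno 24, EMFILE), closes them again, and lets the directory be removed automatically.

```python
import errno
import os
import tempfile


def count_openable_files(max_attempts):
    """Open up to max_attempts files at once; return how many succeeded."""
    handles = []
    with tempfile.TemporaryDirectory() as scratch:
        try:
            for i in range(max_attempts):
                path = os.path.join(scratch, "%08d" % i)
                handles.append(open(path, "w"))
        except OSError as exc:
            if exc.errno != errno.EMFILE:
                raise  # only "too many open files" is expected here
        finally:
            opened = len(handles)
            # Close everything before the scratch directory is deleted.
            for handle in handles:
                handle.close()
    return opened


if __name__ == "__main__":
    print("opened %d files at once" % count_openable_files(2000))
```

A result below the requested count means the limit was reached. The difference from the configured soft limit is the number of descriptors the process already had open.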

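The files in /tmp/ can be deleted with

sudo find /tmp -maxdepth 1 -type f -name '__*' -delete

because there are too many of them for a shell glob to expand (so rm /tmp/__* does not work).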

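However, when running the problematic mongo process, I still hit the same Too many open files error. This led me to believe that the problem also lay with mongo (and eventually, to my embarrassment, led me to the excellent documentation). Editing /etc/systemd/system/multi-user.target.wants/mongodb-01.service and adding the following lines under the [Service] directive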

# (file size)
LimitFSIZE=infinity
# (cpu time)
LimitCPU=infinity
# (virtual memory size)
LimitAS=infinity
# (open files)
LimitNOFILE=990000
# (processes/threads)
LimitNPROC=495000

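finally solved the problem (remember to reload systemd and restart the service with sudo systemctl daemon-reload && sudo systemctl restart mongodb-01.service). You can monitor the progress of the mongo process (mine was an aggregation hungry for temporary space) with
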
$ while true; do echo $(find /var/lib/mongodb_01/_tmp/ | wc -l); sleep 1; done
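The loop above counts spill files on disk. To watch how many descriptors a process actually holds open, the entries of /proc/<pid>/fd can be counted on Linux. This is a minimal sketch, assuming Python 3; the function name is illustrative, and reading another user's process (such as mongod) normally requires root or that user's account.

```python
import os


def count_open_fds(pid="self"):
    """Count entries in /proc/<pid>/fd, one per open descriptor (Linux only)."""
    return len(os.listdir("/proc/%s/fd" % pid))


if __name__ == "__main__" and os.path.isdir("/proc/self/fd"):
    # For the current process this includes the descriptor used for the listing.
    print("open descriptors:", count_open_fds())
```

Comparing this number with the process's open-files limit shows how much headroom the aggregation has left.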