We recently found that the JBoss process on our Linux server was killed by the operating system due to high memory consumption (about 2.3 GB). Here is the dump:
RPC: fragment too large: 0x00800103
RPC: multiple fragments per record not supported
RPC: fragment too large: 0x00800103
RPC: multiple fragments per record not supported
RPC: fragment too large: 0x00800103
RPC: multiple fragments per record not supported
RPC: fragment too large: 0x00800103
RPC: multiple fragments per record not supported
RPC: fragment too large: 0x00800103
RPC: multiple fragments per record not supported
RPC: fragment too large: 0x00800103
RPC: multiple fragments per record not supported
RPC: fragment too large: 0x00800103
RPC: multiple fragments per record not supported
java invoked oom-killer: gfp_mask=0x201da, order=0, oom_adj=0, oom_score_adj=0
java cpuset=/ mems_allowed=0
Pid: 11445, comm: java Not tainted 2.6.32-431.el6.x86_64 #1
Call Trace:
[<ffffffff810d05b1>] ? cpuset_print_task_mems_allowed+0x91/0xb0
[<ffffffff81122960>] ? dump_header+0x90/0x1b0
[<ffffffff8122798c>] ? security_real_capable_noaudit+0x3c/0x70
[<ffffffff81122de2>] ? oom_kill_process+0x82/0x2a0
[<ffffffff81122d21>] ? select_bad_process+0xe1/0x120
[<ffffffff81123220>] ? out_of_memory+0x220/0x3c0
[<ffffffff8112fb3c>] ? __alloc_pages_nodemask+0x8ac/0x8d0
[<ffffffff81167a9a>] ? alloc_pages_current+0xaa/0x110
[<ffffffff8111fd57>] ? __page_cache_alloc+0x87/0x90
[<ffffffff8111f73e>] ? find_get_page+0x1e/0xa0
[<ffffffff81120cf7>] ? filemap_fault+0x1a7/0x500
[<ffffffff8114a084>] ? __do_fault+0x54/0x530
[<ffffffff810afa17>] ? futex_wait+0x227/0x380
[<ffffffff8114a657>] ? handle_pte_fault+0xf7/0xb00
[<ffffffff8114b28a>] ? handle_mm_fault+0x22a/0x300
[<ffffffff8104a8d8>] ? __do_page_fault+0x138/0x480
[<ffffffff81527910>] ? thread_return+0x4e/0x76e
[<ffffffff8152d45e>] ? do_page_fault+0x3e/0xa0
[<ffffffff8152a815>] ? page_fault+0x25/0x30
Mem-Info:
Node 0 DMA per-cpu:
CPU 0: hi: 0, btch: 1 usd: 0
CPU 1: hi: 0, btch: 1 usd: 0
Node 0 DMA32 per-cpu:
CPU 0: hi: 186, btch: 31 usd: 178
CPU 1: hi: 186, btch: 31 usd: 30
Node 0 Normal per-cpu:
CPU 0: hi: 186, btch: 31 usd: 174
CPU 1: hi: 186, btch: 31 usd: 194
active_anon:113513 inactive_anon:184789 isolated_anon:0
active_file:21 inactive_file:0 isolated_file:0
unevictable:0 dirty:10 writeback:0 unstable:0
free:17533 slab_reclaimable:4706 slab_unreclaimable:8059
mapped:64 shmem:4 pagetables:3064 bounce:0
Node 0 DMA free:15696kB min:248kB low:308kB high:372kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15300kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
lowmem_reserve[]: 0 3000 4010 4010
Node 0 DMA32 free:41740kB min:50372kB low:62964kB high:75556kB active_anon:200648kB inactive_anon:216504kB active_file:20kB inactive_file:52kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:3072160kB mlocked:0kB dirty:8kB writeback:0kB mapped:168kB shmem:0kB slab_reclaimable:3720kB slab_unreclaimable:2476kB kernel_stack:512kB pagetables:516kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:108 all_unreclaimable? yes
lowmem_reserve[]: 0 0 1010 1010
Node 0 Normal free:12696kB min:16956kB low:21192kB high:25432kB active_anon:253404kB inactive_anon:522652kB active_file:64kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:1034240kB mlocked:0kB dirty:32kB writeback:0kB mapped:88kB shmem:16kB slab_reclaimable:15104kB slab_unreclaimable:29760kB kernel_stack:3704kB pagetables:11740kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:146 all_unreclaimable? yes
lowmem_reserve[]: 0 0 0 0
Node 0 DMA: 4*4kB 2*8kB 3*16kB 4*32kB 2*64kB 0*128kB 0*256kB 0*512kB 1*1024kB 1*2048kB 3*4096kB = 15696kB
Node 0 DMA32: 341*4kB 277*8kB 209*16kB 128*32kB 104*64kB 54*128kB 33*256kB 13*512kB 0*1024kB 1*2048kB 0*4096kB = 41740kB
Node 0 Normal: 2662*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 1*2048kB 0*4096kB = 12696kB
64603 total pagecache pages
64549 pages in swap cache
Swap cache stats: add 3763837, delete 3699288, find 1606527/1870160
Free swap = 0kB
Total swap = 1048568kB
1048560 pages RAM
67449 pages reserved
1061 pages shared
958817 pages non-shared
[ pid ] uid tgid total_vm rss cpu oom_adj oom_score_adj name
[ 419] 0 419 2662 1 1 -17 -1000 udevd
[ 726] 0 726 2697 1 1 -17 -1000 udevd
[ 1021] 0 1021 4210 40 1 0 0 vmware-guestd
[ 1238] 0 1238 23294 28 1 -17 -1000 auditd
[ 1254] 65 1254 112744 203 1 0 0 nslcd
[ 1267] 0 1267 62271 123 1 0 0 rsyslogd
[ 1279] 0 1279 2705 32 1 0 0 irqbalance
[ 1293] 32 1293 4744 16 1 0 0 rpcbind
[ 1311] 29 1311 5837 2 0 0 0 rpc.statd
[ 1422] 81 1422 5874 36 0 0 0 dbus-daemon
[ 1451] 0 1451 1020 1 0 0 0 acpid
[ 1460] 68 1460 9995 129 0 0 0 hald
[ 1461] 0 1461 5082 2 1 0 0 hald-runner
[ 1490] 0 1490 5612 2 1 0 0 hald-addon-inpu
[ 1503] 68 1503 4484 2 0 0 0 hald-addon-acpi
[ 1523] 0 1523 134268 53 0 0 0 automount
[ 1540] 0 1540 1566 1 0 0 0 mcelog
[ 1552] 0 1552 16651 27 1 -17 -1000 sshd
[ 1560] 0 1560 5545 26 0 0 0 xinetd
[ 1568] 38 1568 8202 33 0 0 0 ntpd
[ 1584] 0 1584 21795 56 0 0 0 sendmail
[ 1592] 51 1592 19658 32 0 0 0 sendmail
[ 1601] 0 1601 29324 21 1 0 0 crond
[ 1612] 0 1612 5385 5 1 0 0 atd
[ 1638] 0 1638 1016 2 0 0 0 mingetty
[ 1640] 0 1640 1016 2 1 0 0 mingetty
[ 1642] 0 1642 1016 2 0 0 0 mingetty
[ 1644] 0 1644 2661 1 1 -17 -1000 udevd
[ 1645] 0 1645 1016 2 0 0 0 mingetty
[ 1647] 0 1647 1016 2 1 0 0 mingetty
[ 1649] 0 1649 1016 2 1 0 0 mingetty
[25003] 0 25003 26827 1 1 0 0 rpc.rquotad
[25007] 0 25007 5440 2 1 0 0 rpc.mountd
[25045] 0 25045 5773 2 1 0 0 rpc.idmapd
[31756] 0 31756 43994 12 0 0 0 httpd
[31758] 48 31758 45035 205 0 0 0 httpd
[31759] 48 31759 45035 210 1 0 0 httpd
[31760] 48 31760 45035 201 1 0 0 httpd
[31761] 48 31761 45068 211 1 0 0 httpd
[31762] 48 31762 45068 199 0 0 0 httpd
[31763] 48 31763 45035 196 0 0 0 httpd
[31764] 48 31764 45068 191 1 0 0 httpd
[31765] 48 31765 45035 206 1 0 0 httpd
[ 1893] 0 1893 41344 2 0 0 0 su
[ 1896] 500 1896 26525 2 0 0 0 standalone.sh
[ 1957] 500 1957 570217 81589 0 0 0 java
[10739] 0 10739 41344 2 0 0 0 su
[10742] 500 10742 26525 2 0 0 0 standalone.sh
[10805] 500 10805 576358 77163 0 0 0 java
[13378] 0 13378 41344 2 0 0 0 su
[13381] 500 13381 26525 2 1 0 0 standalone.sh
[13442] 500 13442 561881 73430 1 0 0 java
Out of memory: Kill process 10805 (java) score 141 or sacrifice child
Killed process 10805, UID 500, (java) total-vm:2305432kB, anon-rss:308648kB, file-rss:4kB
It was killed at around 4 AM, when there were no users and no activity on the server apart from Solr replication. This is the master node, it has already failed before, and our slave polls it every minute. Here is the replication config:
<requestHandler name="/replication" class="solr.ReplicationHandler" >
<lst name="master">
<str name="enable">${solr.enable.master:false}</str>
<str name="replicateAfter">commit</str>
<str name="replicateAfter">startup</str>
<str name="confFiles">schema.xml,stopwords.txt</str>
</lst>
<lst name="slave">
<str name="enable">${solr.enable.slave:false}</str>
<str name="masterUrl">${solr.master.url:http://localhost:8080/solr/cstb}</str>
<str name="pollInterval">00:00:60</str>
</lst>
</requestHandler>
Since there is no user activity, the index does not change, so Solr should not actually be doing anything (I assume).
Some other values from the config file:
<indexDefaults>
<useCompoundFile>false</useCompoundFile>
<mergeFactor>10</mergeFactor>
<ramBufferSizeMB>32</ramBufferSizeMB>
<maxFieldLength>10000</maxFieldLength>
<writeLockTimeout>1000</writeLockTimeout>
<lockType>native</lockType>
</indexDefaults>
<mainIndex>
<useCompoundFile>false</useCompoundFile>
<ramBufferSizeMB>32</ramBufferSizeMB>
<mergeFactor>10</mergeFactor>
<unlockOnStartup>false</unlockOnStartup>
<reopenReaders>true</reopenReaders>
<deletionPolicy class="solr.SolrDeletionPolicy">
<str name="maxCommitsToKeep">1</str>
<str name="maxOptimizedCommitsToKeep">0</str>
</deletionPolicy>
<infoStream file="INFOSTREAM.txt">false</infoStream>
</mainIndex>
<queryResultWindowSize>20</queryResultWindowSize>
<queryResultMaxDocsCached>200</queryResultMaxDocsCached>
So, has anyone run into a similar situation or have any ideas? We are using Solr 3.5.
Answer 0 (score: 1)
You are running into a low-memory condition that causes Linux to kill the process with the highest memory usage:
Out of memory: Kill process 10805 (java) score 141 or sacrifice child
This is known as an out-of-memory (OOM) condition. Given that you are only using a 512 MB heap for the JVM (which I think is too low for any production Solr instance of significant size), you do not have many options, since you cannot reduce the heap to free up more OS memory.
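If anything, the heap could be raised once the server has more RAM. As a minimal sketch, assuming a JBoss AS 7 style `standalone.sh` launch (the file path and sizes below are illustrative, not taken from the question):

```shell
# bin/standalone.conf -- sourced by standalone.sh before starting the JVM.
# Fixed -Xms/-Xmx avoids heap resizing churn; the values here are examples
# and must fit inside the physical RAM actually available to this host.
JAVA_OPTS="-Xms1024m -Xmx1024m -XX:MaxPermSize=256m $JAVA_OPTS"
```

The key constraint is that heap plus permgen plus thread stacks must stay well under physical RAM, otherwise the OOM killer fires again regardless of the heap setting.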
Things you can try:

Upgrade to a bigger server with more memory. This would be my first recommendation: you simply do not have enough memory.

Move any other production code to another system. You did not say whether you run anything else on this server, but if so, I would move anything I could elsewhere. I doubt there is much to gain here since your system is small, but every little bit helps.

Try tuning the OOM killer to be less strict. That is not so easy to do, and I do not know what you will gain given the overall small size of the server, but you can always experiment:
http://backdrift.org/how-to-create-oom-killer-exceptions
http://www.oracle.com/technetwork/articles/servers-storage-dev/oom-killer-1911807.html
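As a sketch of the approach those links describe, the kernel exposes a per-process knob under `/proc` that biases the OOM killer's victim selection (the `pgrep` pattern and the value are illustrative; writing the file requires root, and exempting a process just pushes the killer onto other processes):

```shell
# Find the JBoss JVM (pattern is an assumption about how it was started)
JBOSS_PID=$(pgrep -f 'standalone' | head -n 1)

# Lower its OOM score: -1000 exempts it entirely, negative values make it
# less likely to be chosen. On older kernels the legacy file is oom_adj
# (range -17..15) instead of oom_score_adj (range -1000..1000).
echo -500 > "/proc/${JBOSS_PID}/oom_score_adj"

# Confirm the new value took effect
cat "/proc/${JBOSS_PID}/oom_score_adj"
```

Note the setting is per-PID and does not survive a restart of the process, so it would need to be reapplied from the startup script.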