We recently found that the JBoss process on our Linux server was killed by the operating system due to high memory consumption (about 2.3 GB). Here is the dump:
RPC: fragment too large: 0x00800103
RPC: multiple fragments per record not supported
RPC: fragment too large: 0x00800103
RPC: multiple fragments per record not supported
RPC: fragment too large: 0x00800103
RPC: multiple fragments per record not supported
RPC: fragment too large: 0x00800103
RPC: multiple fragments per record not supported
RPC: fragment too large: 0x00800103
RPC: multiple fragments per record not supported
RPC: fragment too large: 0x00800103
RPC: multiple fragments per record not supported
RPC: fragment too large: 0x00800103
RPC: multiple fragments per record not supported
java invoked oom-killer: gfp_mask=0x201da, order=0, oom_adj=0, oom_score_adj=0
java cpuset=/ mems_allowed=0
Pid: 11445, comm: java Not tainted 2.6.32-431.el6.x86_64 #1
Call Trace:
[<ffffffff810d05b1>] ? cpuset_print_task_mems_allowed+0x91/0xb0
[<ffffffff81122960>] ? dump_header+0x90/0x1b0
[<ffffffff8122798c>] ? security_real_capable_noaudit+0x3c/0x70
[<ffffffff81122de2>] ? oom_kill_process+0x82/0x2a0
[<ffffffff81122d21>] ? select_bad_process+0xe1/0x120
[<ffffffff81123220>] ? out_of_memory+0x220/0x3c0
[<ffffffff8112fb3c>] ? __alloc_pages_nodemask+0x8ac/0x8d0
[<ffffffff81167a9a>] ? alloc_pages_current+0xaa/0x110
[<ffffffff8111fd57>] ? __page_cache_alloc+0x87/0x90
[<ffffffff8111f73e>] ? find_get_page+0x1e/0xa0
[<ffffffff81120cf7>] ? filemap_fault+0x1a7/0x500
[<ffffffff8114a084>] ? __do_fault+0x54/0x530
[<ffffffff810afa17>] ? futex_wait+0x227/0x380
[<ffffffff8114a657>] ? handle_pte_fault+0xf7/0xb00
[<ffffffff8114b28a>] ? handle_mm_fault+0x22a/0x300
[<ffffffff8104a8d8>] ? __do_page_fault+0x138/0x480
[<ffffffff81527910>] ? thread_return+0x4e/0x76e
[<ffffffff8152d45e>] ? do_page_fault+0x3e/0xa0
[<ffffffff8152a815>] ? page_fault+0x25/0x30
Mem-Info:
Node 0 DMA per-cpu:
CPU 0: hi: 0, btch: 1 usd: 0
CPU 1: hi: 0, btch: 1 usd: 0
Node 0 DMA32 per-cpu:
CPU 0: hi: 186, btch: 31 usd: 178
CPU 1: hi: 186, btch: 31 usd: 30
Node 0 Normal per-cpu:
CPU 0: hi: 186, btch: 31 usd: 174
CPU 1: hi: 186, btch: 31 usd: 194
active_anon:113513 inactive_anon:184789 isolated_anon:0
active_file:21 inactive_file:0 isolated_file:0
unevictable:0 dirty:10 writeback:0 unstable:0
free:17533 slab_reclaimable:4706 slab_unreclaimable:8059
mapped:64 shmem:4 pagetables:3064 bounce:0
Node 0 DMA free:15696kB min:248kB low:308kB high:372kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15300kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
lowmem_reserve[]: 0 3000 4010 4010
Node 0 DMA32 free:41740kB min:50372kB low:62964kB high:75556kB active_anon:200648kB inactive_anon:216504kB active_file:20kB inactive_file:52kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:3072160kB mlocked:0kB dirty:8kB writeback:0kB mapped:168kB shmem:0kB slab_reclaimable:3720kB slab_unreclaimable:2476kB kernel_stack:512kB pagetables:516kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:108 all_unreclaimable? yes
lowmem_reserve[]: 0 0 1010 1010
Node 0 Normal free:12696kB min:16956kB low:21192kB high:25432kB active_anon:253404kB inactive_anon:522652kB active_file:64kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:1034240kB mlocked:0kB dirty:32kB writeback:0kB mapped:88kB shmem:16kB slab_reclaimable:15104kB slab_unreclaimable:29760kB kernel_stack:3704kB pagetables:11740kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:146 all_unreclaimable? yes
lowmem_reserve[]: 0 0 0 0
Node 0 DMA: 4*4kB 2*8kB 3*16kB 4*32kB 2*64kB 0*128kB 0*256kB 0*512kB 1*1024kB 1*2048kB 3*4096kB = 15696kB
Node 0 DMA32: 341*4kB 277*8kB 209*16kB 128*32kB 104*64kB 54*128kB 33*256kB 13*512kB 0*1024kB 1*2048kB 0*4096kB = 41740kB
Node 0 Normal: 2662*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 1*2048kB 0*4096kB = 12696kB
64603 total pagecache pages
64549 pages in swap cache
Swap cache stats: add 3763837, delete 3699288, find 1606527/1870160
Free swap = 0kB
Total swap = 1048568kB
1048560 pages RAM
67449 pages reserved
1061 pages shared
958817 pages non-shared
[ pid ] uid tgid total_vm rss cpu oom_adj oom_score_adj name
[ 419] 0 419 2662 1 1 -17 -1000 udevd
[ 726] 0 726 2697 1 1 -17 -1000 udevd
[ 1021] 0 1021 4210 40 1 0 0 vmware-guestd
[ 1238] 0 1238 23294 28 1 -17 -1000 auditd
[ 1254] 65 1254 112744 203 1 0 0 nslcd
[ 1267] 0 1267 62271 123 1 0 0 rsyslogd
[ 1279] 0 1279 2705 32 1 0 0 irqbalance
[ 1293] 32 1293 4744 16 1 0 0 rpcbind
[ 1311] 29 1311 5837 2 0 0 0 rpc.statd
[ 1422] 81 1422 5874 36 0 0 0 dbus-daemon
[ 1451] 0 1451 1020 1 0 0 0 acpid
[ 1460] 68 1460 9995 129 0 0 0 hald
[ 1461] 0 1461 5082 2 1 0 0 hald-runner
[ 1490] 0 1490 5612 2 1 0 0 hald-addon-inpu
[ 1503] 68 1503 4484 2 0 0 0 hald-addon-acpi
[ 1523] 0 1523 134268 53 0 0 0 automount
[ 1540] 0 1540 1566 1 0 0 0 mcelog
[ 1552] 0 1552 16651 27 1 -17 -1000 sshd
[ 1560] 0 1560 5545 26 0 0 0 xinetd
[ 1568] 38 1568 8202 33 0 0 0 ntpd
[ 1584] 0 1584 21795 56 0 0 0 sendmail
[ 1592] 51 1592 19658 32 0 0 0 sendmail
[ 1601] 0 1601 29324 21 1 0 0 crond
[ 1612] 0 1612 5385 5 1 0 0 atd
[ 1638] 0 1638 1016 2 0 0 0 mingetty
[ 1640] 0 1640 1016 2 1 0 0 mingetty
[ 1642] 0 1642 1016 2 0 0 0 mingetty
[ 1644] 0 1644 2661 1 1 -17 -1000 udevd
[ 1645] 0 1645 1016 2 0 0 0 mingetty
[ 1647] 0 1647 1016 2 1 0 0 mingetty
[ 1649] 0 1649 1016 2 1 0 0 mingetty
[25003] 0 25003 26827 1 1 0 0 rpc.rquotad
[25007] 0 25007 5440 2 1 0 0 rpc.mountd
[25045] 0 25045 5773 2 1 0 0 rpc.idmapd
[31756] 0 31756 43994 12 0 0 0 httpd
[31758] 48 31758 45035 205 0 0 0 httpd
[31759] 48 31759 45035 210 1 0 0 httpd
[31760] 48 31760 45035 201 1 0 0 httpd
[31761] 48 31761 45068 211 1 0 0 httpd
[31762] 48 31762 45068 199 0 0 0 httpd
[31763] 48 31763 45035 196 0 0 0 httpd
[31764] 48 31764 45068 191 1 0 0 httpd
[31765] 48 31765 45035 206 1 0 0 httpd
[ 1893] 0 1893 41344 2 0 0 0 su
[ 1896] 500 1896 26525 2 0 0 0 standalone.sh
[ 1957] 500 1957 570217 81589 0 0 0 java
[10739] 0 10739 41344 2 0 0 0 su
[10742] 500 10742 26525 2 0 0 0 standalone.sh
[10805] 500 10805 576358 77163 0 0 0 java
[13378] 0 13378 41344 2 0 0 0 su
[13381] 500 13381 26525 2 1 0 0 standalone.sh
[13442] 500 13442 561881 73430 1 0 0 java
Out of memory: Kill process 10805 (java) score 141 or sacrifice child
Killed process 10805, UID 500, (java) total-vm:2305432kB, anon-rss:308648kB, file-rss:4kB
It was killed at around 4 AM, when there were no users and no activity on the server apart from Solr replication. This is the master node, it has already failed before, and our slave polls it every minute. Here is the replication config:
<requestHandler name="/replication" class="solr.ReplicationHandler" >
<lst name="master">
<str name="enable">${solr.enable.master:false}</str>
<str name="replicateAfter">commit</str>
<str name="replicateAfter">startup</str>
<str name="confFiles">schema.xml,stopwords.txt</str>
</lst>
<lst name="slave">
<str name="enable">${solr.enable.slave:false}</str>
<str name="masterUrl">${solr.master.url:http://localhost:8080/solr/cstb}</str>
<str name="pollInterval">00:00:60</str>
</lst>
</requestHandler>
Since there is no user activity, the index does not change, so Solr should not actually be doing anything (I assume).
Some other values from the config file:
<indexDefaults>
<useCompoundFile>false</useCompoundFile>
<mergeFactor>10</mergeFactor>
<ramBufferSizeMB>32</ramBufferSizeMB>
<maxFieldLength>10000</maxFieldLength>
<writeLockTimeout>1000</writeLockTimeout>
<lockType>native</lockType>
</indexDefaults>
<mainIndex>
<useCompoundFile>false</useCompoundFile>
<ramBufferSizeMB>32</ramBufferSizeMB>
<mergeFactor>10</mergeFactor>
<unlockOnStartup>false</unlockOnStartup>
<reopenReaders>true</reopenReaders>
<deletionPolicy class="solr.SolrDeletionPolicy">
<str name="maxCommitsToKeep">1</str>
<str name="maxOptimizedCommitsToKeep">0</str>
</deletionPolicy>
<infoStream file="INFOSTREAM.txt">false</infoStream>
</mainIndex>
<queryResultWindowSize>20</queryResultWindowSize>
<queryResultMaxDocsCached>200</queryResultMaxDocsCached>
So, has anyone run into a similar situation or have any ideas? We are using Solr 3.5.
Answer 0 (score: 1)
You are running into a low-memory condition that causes Linux to kill the process with the highest memory usage:
Out of memory: Kill process 10805 (java) score 141 or sacrifice child
This is known as an out-of-memory (OOM) condition. Given that you are only using a 512 MB heap for the JVM (which I think is too low for any production Solr instance of significant size), you do not have many options, since you cannot reduce the heap to free up more OS memory.
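If anything, the heap could be raised once the server has more RAM. As a minimal sketch, assuming a JBoss AS 7 style `standalone.sh` launch (the file path and sizes below are illustrative, not taken from the question):

```shell
# bin/standalone.conf -- sourced by standalone.sh before starting the JVM.
# Fixed -Xms/-Xmx avoids heap resizing churn; the values here are examples
# and must fit inside the physical RAM actually available to this host.
JAVA_OPTS="-Xms1024m -Xmx1024m -XX:MaxPermSize=256m $JAVA_OPTS"
```

The key constraint is that heap plus permgen plus thread stacks must stay well under physical RAM, otherwise the OOM killer fires again regardless of the heap setting.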
Things you can try:

Upgrade to a bigger server with more memory. This would be my first recommendation: you simply do not have enough memory.

Move any other production code to another system. You did not say whether you run anything else on this server, but if so, I would move anything I could elsewhere. I doubt there is much to gain here since your system is small, but every little bit helps.

Try tuning the OOM killer to be less strict. That is not so easy to do, and I do not know what you will gain given the overall small size of the server, but you can always experiment:
http://backdrift.org/how-to-create-oom-killer-exceptions
http://www.oracle.com/technetwork/articles/servers-storage-dev/oom-killer-1911807.html
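As a sketch of the approach those links describe, the kernel exposes a per-process knob under `/proc` that biases the OOM killer's victim selection (the `pgrep` pattern and the value are illustrative; writing the file requires root, and exempting a process just pushes the killer onto other processes):

```shell
# Find the JBoss JVM (pattern is an assumption about how it was started)
JBOSS_PID=$(pgrep -f 'standalone' | head -n 1)

# Lower its OOM score: -1000 exempts it entirely, negative values make it
# less likely to be chosen. On older kernels the legacy file is oom_adj
# (range -17..15) instead of oom_score_adj (range -1000..1000).
echo -500 > "/proc/${JBOSS_PID}/oom_score_adj"

# Confirm the new value took effect
cat "/proc/${JBOSS_PID}/oom_score_adj"
```

Note the setting is per-PID and does not survive a restart of the process, so it would need to be reapplied from the startup script.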