Why does Ignite allocate an extra 32GB of Internal memory for the JVM on grid activation?

Time: 2019-05-09 23:10:03

Tags: ignite

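Hi, we are using Apache Ignite 2.7 (8 nodes, 120GB each), configured with a 16GB heap and a 100GB data region (persistence enabled). Using Native Memory Tracking, we see the usual categories (Heap, Thread, etc.) at the sizes we expect, but "Internal" (i.e., off-heap) goes as high as 132GB. That is on top of everything else the JVM needs in order to run. The JVM requests so much memory that the system ends up in an out-of-memory condition (the OS itself runs out of memory).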

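As an experiment, we reduced the data region to 1GB and measured the JVM's Internal memory usage before and after grid activation (the grid is activated by a client node that we connect). On activation, Internal (read: Unsafe off-heap) memory jumped from 62,154 KB to 32,897,187 KB. So the 32GB overhead appears to be independent of the data region size.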
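As a sanity check on the "32GB" figure, the two NMT readings above can be subtracted and converted to GiB. This is a small Java sketch; the class and method names are hypothetical, and it assumes NMT's "KB" means 1024-byte units.

```java
import java.util.Locale;

/** Hypothetical helper: worked arithmetic for the NMT "Internal" jump on grid activation. */
public class ActivationDelta {

    /** Growth between two NMT readings, in KB. */
    static long deltaKb(long beforeKb, long afterKb) {
        return afterKb - beforeKb;
    }

    /** NMT's KB are 1024-byte units, so 1024 * 1024 KB make one GiB. */
    static double kbToGib(long kb) {
        return kb / (1024.0 * 1024.0);
    }

    public static void main(String[] args) {
        long beforeKb = 62_154L;      // Internal before activation, from the question
        long afterKb = 32_897_187L;   // Internal after activation, from the question
        long growthKb = deltaKb(beforeKb, afterKb);
        // Prints: Internal grew by 32,835,033 KB (about 31.3 GiB)
        System.out.printf(Locale.ROOT, "Internal grew by %,d KB (about %.1f GiB)%n",
                growthKb, kbToGib(growthKb));
    }
}
```

The growth comes to roughly 31.3 GiB, consistent with the ~32GB described above.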

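This extra 32GB of system RAM is a real problem for us. Why does Ignite do this, and how can we control it?

Thanks

Here is a typical native memory summary of what we are seeing. Note the huge Internal allocation.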

Native Memory Tracking:

Total: reserved=156688325KB, committed=156439245KB
- Java Heap (reserved=16777216KB, committed=16777216KB) (mmap: reserved=16777216KB, committed=16777216KB)
- Class (reserved=112257KB, committed=111489KB) (classes #17951) (malloc=1665KB #17624) (mmap: reserved=110592KB, committed=109824KB)
- Thread (reserved=229015KB, committed=229015KB) (thread #223) (stack: reserved=228032KB, committed=228032KB) (malloc=723KB #1128) (arena=260KB #432)
- Code (reserved=255790KB, committed=40250KB) (malloc=6190KB #11547) (mmap: reserved=249600KB, committed=34060KB)
- GC (reserved=704014KB, committed=704014KB) (malloc=48654KB #22251) (mmap: reserved=655360KB, committed=655360KB)
- Compiler (reserved=420KB, committed=420KB) (malloc=289KB #1284) (arena=131KB #15)
- Internal (reserved=138544815KB, committed=138544811KB) (malloc=138544779KB #35177) (mmap: reserved=36KB, committed=32KB)
- Symbol (reserved=26536KB, committed=26536KB) (malloc=24002KB #216741) (arena=2533KB #1)
- Native Memory Tracking (reserved=4822KB, committed=4822KB) (malloc=30KB #346) (tracking overhead=4791KB)
- Arena Chunk (reserved=673KB, committed=673KB) (malloc=673KB)
- Unknown (reserved=32768KB, committed=0KB) (mmap: reserved=32768KB, committed=0KB)

PS

We set the default data region to 128MB, systemRegionMaxSize to 8GB, and systemRegionInitialSize to 512MB.

Configuration:

<?xml version="1.0" encoding="UTF-8"?>
<beans xmlns="http://www.springframework.org/schema/beans" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="         http://www.springframework.org/schema/beans         http://www.springframework.org/schema/beans/spring-beans.xsd">
  <bean id="ignite.cfg" class="org.apache.ignite.configuration.IgniteConfiguration">
    <property name="gridLogger">
      <bean class="org.apache.ignite.logger.log4j.Log4JLogger">
        <constructor-arg type="java.lang.String" value="/opt/ignite/apache-ignite/config/log4j.xml"/>
      </bean>
    </property>
    <property name="metricsLogFrequency" value="600000"/>
    <property name="rebalanceThreadPoolSize" value="12"/>
    <property name="peerClassLoadingEnabled" value="true"/>
    <property name="publicThreadPoolSize" value="32"/>
    <property name="systemThreadPoolSize" value="32"/>
    <property name="workDirectory" value="/data/ignite/work"/>
    <property name="segmentationPolicy" value="RESTART_JVM"/>
    <property name="dataStorageConfiguration">
      <bean class="org.apache.ignite.configuration.DataStorageConfiguration">
        <property name="checkpointReadLockTimeout" value="0"/>
        <property name="systemRegionInitialSize" value="#{512L * 1024 * 1024}"/>
        <property name="systemRegionMaxSize" value="#{8L * 1024 * 1024 * 1024}"/>
        <property name="storagePath" value="/data/ignite/persistentStore"/>
        <property name="defaultDataRegionConfiguration">
          <bean class="org.apache.ignite.configuration.DataRegionConfiguration">
            <property name="name" value="Default_Region"/>
            <property name="initialSize" value="67108864"/> <!-- 64 MiB -->
            <property name="maxSize" value="134217728"/> <!-- 128 MiB -->
            <property name="persistenceEnabled" value="false"/>
            <property name="metricsEnabled" value="true"/>
          </bean>
        </property>
        <property name="dataRegionConfigurations">
          <list>
            <bean class="org.apache.ignite.configuration.DataRegionConfiguration">
              <property name="name" value="Tiered_Region"/>
              <property name="initialSize" value="53687091200"/> <!-- 50 GiB -->
              <property name="maxSize" value="53687091200"/> <!-- 50 GiB -->
              <property name="persistenceEnabled" value="true"/>
              <property name="pageEvictionMode" value="RANDOM_2_LRU"/>
              <property name="evictionThreshold" value="0.75"/>
              <property name="metricsEnabled" value="true"/>
            </bean>
          </list>
        </property>
      </bean>
    </property>
    <property name="cacheConfiguration">
      <list>
        <bean class="org.apache.ignite.configuration.CacheConfiguration">
          <property name="name" value="default"/>
          <property name="atomicityMode" value="ATOMIC"/>
          <property name="backups" value="0"/>
        </bean>
      </list>
    </property>
    <property name="communicationSpi">
      <bean class="org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi">
        <property name="messageQueueLimit" value="#{1 * 1024}"/>
        <property name="idleConnectionTimeout" value="30000"/>
      </bean>
    </property>
    <property name="discoverySpi">
      <bean class="org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi">
        <property name="ipFinder">
          <bean class="org.apache.ignite.spi.discovery.tcp.ipfinder.s3.TcpDiscoveryS3IpFinder">
            <property name="awsCredentials" ref="aws.creds"/>
            <property name="bucketName" value="project-test-xyz"/>
          </bean>
        </property>
      </bean>
    </property>
  </bean>
  <bean id="aws.creds" class="com.amazonaws.auth.BasicAWSCredentials">
    <constructor-arg value="foo"/>
    <constructor-arg value="bar"/>
  </bean>
</beans>

[Logs added below]

[2019-05-17 22:28:39,592] [WARN] [main] [IgniteKernal] Peer class loading is enabled (disable it in production for performance and deployment consistency reasons)
[2019-05-17 22:28:39,593] [WARN] [main] [IgniteKernal] Please set system property '-Djava.net.preferIPv4Stack=true' to avoid possible problems in mixed environments.
[2019-05-17 22:28:40,141] [WARN] [main] [NoopCheckpointSpi] Checkpoints are disabled (to enable configure any GridCheckpointSpi implementation)
[2019-05-17 22:28:40,214] [WARN] [main] [GridCollisionManager] Collision resolution is disabled (all jobs will be activated upon arrival).
[2019-05-17 22:28:41,690] [WARN] [main] [GridCacheDatabaseSharedManager] DataRegionConfiguration.maxWalArchiveSize instead DataRegionConfiguration.walHistorySize would be used for removing old archive wal files
[2019-05-17 22:28:41,826] [WARN] [main] [PartitionsEvictManager] Logging at INFO level without checking if INFO level is enabled: Evict partition permits=4
[2019-05-17 22:28:46,291] [WARN] [main] [IgniteKernal] Nodes started on local machine require more than 80% of physical RAM what can lead to significant slowdown due to swapping (please decrease JVM heap size, data region size or checkpoint buffer size) [required=12516MB, available=14008MB]
log4j: Finalizing appender named [null].
[2019-05-17 22:31:19,958] [WARN] [disco-event-worker-#42] [GridDiscoveryManager] Local node's value of 'java.net.preferIPv4Stack' system property differs from remote node's (all nodes in topology should have identical value) [locPreferIpV4=null, rmtPreferIpV4=true, locId8=f25228c0, rmtId8=eac4211d, rmtAddrs=[192.168.1.5/127.0.0.1, /192.168.1.5], rmtNode=ClusterNode [id=eac4211d-c272-4eb0-9bd5-f91dfa34a0e9, order=2, addr=[127.0.0.1, 192.168.1.5], daemon=false]]
[2019-05-17 22:32:24,265] [WARN] [exchange-worker-#43] [GridAffinityAssignmentCache] Logging at INFO level without checking if INFO level is enabled: Local node affinity assignment distribution is not ideal [cache=default, expectedPrimary=1024.00, actualPrimary=1024, expectedBackups=1024.00, actualBackups=0, warningThreshold=50.00%]
[2019-05-17 22:32:24,269] [WARN] [exchange-worker-#43] [GridAffinityAssignmentCache] Logging at INFO level without checking if INFO level is enabled: Local node affinity assignment distribution is not ideal [cache=default, expectedPrimary=1024.00, actualPrimary=1024, expectedBackups=1024.00, actualBackups=0, warningThreshold=50.00%]
[2019-05-17 22:32:24,850] [WARN] [exchange-worker-#43] [GridAffinityAssignmentCache] Logging at INFO level without checking if INFO level is enabled: Local node affinity assignment distribution is not ideal [cache=default, expectedPrimary=1024.00, actualPrimary=1024, expectedBackups=1024.00, actualBackups=0, warningThreshold=50.00%]
[2019-05-17 22:32:24,911] [WARN] [disco-notifier-worker-#41] [GridClusterStateProcessor] Logging at INFO level without checking if INFO level is enabled: Received state change finish message: true
22:33:49.086 [exchange-worker-#43] INFO c.b.aa.ceres.loader.S3CacheLoader - Loader eb5445c7-d7fa-4018-95b6-63c4a0911eae received injected Ignite instance IgniteKernal [longJVMPauseDetector=LongJVMPauseDetector [workerRef=Thread[jvm-pause-detector-worker,5,main], longPausesCnt=0, longPausesTotalDuration=0, longPausesTimestamps=[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], longPausesDurations=[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]], cfg=IgniteConfiguration [igniteInstanceName=null, pubPoolSize=32, svcPoolSize=32, callbackPoolSize=8, strandedPoolSize=8, sysPoolSize=16, mgmtPoolSize=4, igfsPoolSize=8, dataStreamerPoolSize=8, utilityCachePoolSize=8, utilityCacheKeepAliveTime=60000, p2pPoolSize=2, qryPoolSize=8, igniteHome=/opt/ignite/apache-ignite, igniteWorkDir=/data/ignite/work, mbeanSrv=com.sun.jmx.mbeanserver.JmxMBeanServer@6f94fa3e, nodeId=f25228c0-afbc-4626-990a-68f97fd5b258, marsh=BinaryMarshaller [], marshLocJobs=false, daemon=false, p2pEnabled=true, netTimeout=5000, sndRetryDelay=1000, sndRetryCnt=3, metricsHistSize=10000, metricsUpdateFreq=2000, metricsExpTime=9223372036854775807, discoSpi=TcpDiscoverySpi [addrRslvr=null, sockTimeout=5000, ackTimeout=5000, marsh=JdkMarshaller [clsFilter=org.apache.ignite.marshaller.MarshallerUtils$1@44a3f602], reconCnt=10, reconDelay=2000, maxAckTimeout=600000, forceSrvMode=false, clientReconnectDisabled=false, internalLsnr=null], segPlc=NOOP, segResolveAttempts=2, waitForSegOnStart=true, allResolversPassReq=true, segChkFreq=60000, commSpi=TcpCommunicationSpi [connectGate=org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi$ConnectGateway@6020964a, connPlc=org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi$FirstConnectionPolicy@3f2874d5, enableForcibleNodeKill=false, enableTroubleshootingLog=true, locAddr=null, locHost=0.0.0.0/0.0.0.0, locPort=47100, locPortRange=100, shmemPort=-1, directBuf=true, directSndBuf=false, idleConnTimeout=600000, connTimeout=5000, maxConnTimeout=600000, reconCnt=10, sockSndBuf=32768, sockRcvBuf=32768, msgQueueLimit=1024, slowClientQueueLimit=0, nioSrvr=GridNioServer [selectorSpins=0, filterChain=FilterChain [filters=[GridNioCodecFilter [parser=org.apache.ignite.internal.util.nio.GridDirectParser@7873ad1, directMode=true], GridConnectionBytesVerifyFilter], closed=false, directBuf=true, tcpNoDelay=true, sockSndBuf=32768, sockRcvBuf=32768, writeTimeout=2000, idleTimeout=600000, skipWrite=false, skipRead=false, locAddr=0.0.0.0/0.0.0.0:47100, order=LITTLE_ENDIAN, sndQueueLimit=1024, directMode=true, sslFilter=null, msgQueueLsnr=null, readerMoveCnt=0, writerMoveCnt=0, readWriteSelectorsAssign=false], shmemSrv=null, usePairedConnections=false, connectionsPerNode=1, tcpNoDelay=true, filterReachableAddresses=false, ackSndThreshold=32, unackedMsgsBufSize=0, sockWriteTimeout=2000, boundTcpPort=47100, boundTcpShmemPort=-1, selectorsCnt=4, selectorSpins=0, addrRslvr=null, ctxInitLatch=java.util.concurrent.CountDownLatch@7b757828[Count = 0], stopping=false], evtSpi=org.apache.ignite.spi.eventstorage.NoopEventStorageSpi@282cb7c7, colSpi=NoopCollisionSpi [], deploySpi=LocalDeploymentSpi [], indexingSpi=org.apache.ignite.spi.indexing.noop.NoopIndexingSpi@50de186c, addrRslvr=null, encryptionSpi=org.apache.ignite.spi.encryption.noop.NoopEncryptionSpi@5a3bc7ed, clientMode=false, rebalanceThreadPoolSize=1, txCfg=TransactionConfiguration [txSerEnabled=false, dfltIsolation=REPEATABLE_READ, dfltConcurrency=PESSIMISTIC, dfltTxTimeout=0, txTimeoutOnPartitionMapExchange=0, pessimisticTxLogSize=0, pessimisticTxLogLinger=10000, tmLookupClsName=null, txManagerFactory=null, useJtaSync=false], cacheSanityCheckEnabled=true, discoStartupDelay=60000, deployMode=SHARED, p2pMissedCacheSize=100, locHost=null, timeSrvPortBase=31100, timeSrvPortRange=100, failureDetectionTimeout=60000, sysWorkerBlockedTimeout=null, clientFailureDetectionTimeout=30000, metricsLogFreq=60000, hadoopCfg=null, connectorCfg=ConnectorConfiguration [jettyPath=null, host=null, port=11211, noDelay=true, directBuf=false, sndBufSize=32768, rcvBufSize=32768, idleQryCurTimeout=600000, idleQryCurCheckFreq=60000, sndQueueLimit=0, selectorCnt=4, idleTimeout=7000, sslEnabled=false, sslClientAuth=false, sslCtxFactory=null, sslFactory=null, portRange=100, threadPoolSize=8, msgInterceptor=null], odbcCfg=null, warmupClos=null, atomicCfg=AtomicConfiguration [seqReserveSize=1000, cacheMode=PARTITIONED, backups=1, aff=null, grpName=null], classLdr=null, sslCtxFactory=null, platformCfg=null, binaryCfg=null, memCfg=null, pstCfg=null, dsCfg=DataStorageConfiguration [sysRegionInitSize=41943040, sysRegionMaxSize=104857600, pageSize=1024, concLvl=0, dfltDataRegConf=DataRegionConfiguration [name=Default_Region, maxSize=134217728, initSize=67108864, swapPath=null, pageEvictionMode=DISABLED, evictionThreshold=0.9, emptyPagesPoolSize=100, metricsEnabled=true, metricsSubIntervalCount=5, metricsRateTimeInterval=60000, persistenceEnabled=false, checkpointPageBufSize=0], dataRegions=[DataRegionConfiguration [name=Tiered_Region, maxSize=8589934592, initSize=8589934592, swapPath=null, pageEvictionMode=DISABLED, evictionThreshold=0.9, emptyPagesPoolSize=100, metricsEnabled=true, metricsSubIntervalCount=5, metricsRateTimeInterval=60000, persistenceEnabled=true, checkpointPageBufSize=0]], storagePath=/data/ignite/persistentStore, checkpointFreq=180000, lockWaitTime=30000, checkpointThreads=8, checkpointWriteOrder=SEQUENTIAL, walHistSize=20, maxWalArchiveSize=1073741824, walSegments=10, walSegmentSize=67108864, walPath=db/wal, walArchivePath=db/wal/archive, metricsEnabled=false, walMode=LOG_ONLY, walTlbSize=131072, walBuffSize=0, walFlushFreq=2000, walFsyncDelay=1000, walRecordIterBuffSize=67108864, alwaysWriteFullPages=false, fileIOFactory=org.apache.ignite.internal.processors.cache.persistence.file.AsyncFileIOFactory@2fb68ec6, metricsSubIntervalCnt=5, metricsRateTimeInterval=60000, walAutoArchiveAfterInactivity=-1, writeThrottlingEnabled=true, walCompactionEnabled=false, walCompactionLevel=1, checkpointReadLockTimeout=null], activeOnStart=true, autoActivation=true, longQryWarnTimeout=3000, sqlConnCfg=null, cliConnCfg=ClientConnectorConfiguration [host=null, port=10800, portRange=100, sockSndBufSize=0, sockRcvBufSize=0, tcpNoDelay=true, maxOpenCursorsPerConn=128, threadPoolSize=8, idleTimeout=0, jdbcEnabled=true, odbcEnabled=true, thinCliEnabled=true, sslEnabled=false, useIgniteSslCtxFactory=true, sslClientAuth=false, sslCtxFactory=null], mvccVacuumThreadCnt=2, mvccVacuumFreq=5000, authEnabled=false, failureHnd=null, commFailureRslvr=null], igniteInstanceName=null, startTime=1558132126418, rsrcCtx=org.apache.ignite.internal.processors.resource.GridSpringResourceContextImpl@556d0e12, reconnectState=ReconnectState [firstReconnectFut=GridFutureAdapter [ignoreInterrupts=false, state=INIT, res=null, hash=1426466647], curReconnectFut=null, reconnectDone=null]]

1 answer:

Answer 0 (score: 1)

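My guess is the Checkpoint Page Buffer, which by default is 20% of the data region size.

You can specify it explicitly (the checkpointPageBufferSize property of DataRegionConfiguration) so that you don't forget about it, and reduce the region size accordingly so that you don't run out of RAM.

This should apply only to persistent regions.

Note that you should also expect the OS to take a few GB for its data structures and block cache, so I don't think you should give 116GB of the 120GB to Ignite's off-heap. And don't forget about the heap either.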
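The budgeting described in this answer can be sketched as a small Java calculation: heap, data regions, checkpoint page buffers and OS headroom must all fit in physical RAM. This is a rough sketch, not Ignite's own sizing logic. The 20% checkpoint-buffer fraction is the answer's figure (Ignite derives the real default internally, so it may differ by version), the 8 GiB OS reserve is a hypothetical value, the class and method names are made up for illustration, and other off-heap consumers are not modeled.

```java
import java.util.Locale;

/**
 * Hypothetical helper: rough RAM budget for one Ignite server node.
 * heap + data regions + checkpoint page buffers + OS headroom must fit in physical RAM.
 * ASSUMPTION: the 20% checkpoint-buffer fraction comes from the answer; Ignite computes the
 * real default itself, so setting checkpointPageBufferSize explicitly makes the number certain.
 */
public class NodeMemoryBudget {

    /** Assumed checkpoint page buffer size as a fraction of a persistent region (per the answer). */
    static final double CHECKPOINT_BUFFER_FRACTION = 0.20;

    /** Total RAM demand in GiB. Only persistent regions get a checkpoint page buffer. */
    static double requiredGib(double heapGib, double persistentRegionGib,
                              double inMemoryRegionGib, double osReserveGib) {
        double checkpointBufferGib = persistentRegionGib * CHECKPOINT_BUFFER_FRACTION;
        return heapGib + persistentRegionGib + checkpointBufferGib
                + inMemoryRegionGib + osReserveGib;
    }

    /** Largest persistent region (GiB) whose region plus checkpoint buffer still fits in RAM. */
    static double maxPersistentRegionGib(double ramGib, double heapGib,
                                         double inMemoryRegionGib, double osReserveGib) {
        double left = ramGib - heapGib - inMemoryRegionGib - osReserveGib;
        // region * (1 + fraction) <= left  =>  region <= left / (1 + fraction)
        return Math.max(0.0, left / (1.0 + CHECKPOINT_BUFFER_FRACTION));
    }

    public static void main(String[] args) {
        double ramGib = 120;          // per-node RAM from the question
        double heapGib = 16;          // heap size from the question
        double persistentGib = 100;   // persistent data region from the question
        double inMemoryGib = 0.125;   // 128 MiB Default_Region (not persistent, no checkpoint buffer)
        double osReserveGib = 8;      // HYPOTHETICAL headroom for the OS and its page cache

        double required = requiredGib(heapGib, persistentGib, inMemoryGib, osReserveGib);
        System.out.printf(Locale.ROOT, "required %.1f GiB of %.0f GiB -> %s%n",
                required, ramGib, required <= ramGib ? "fits" : "does NOT fit");
        System.out.printf(Locale.ROOT, "largest persistent region that fits: %.1f GiB%n",
                maxPersistentRegionGib(ramGib, heapGib, inMemoryGib, osReserveGib));
    }
}
```

With the question's numbers the sketch reports roughly 144 GiB required against 120 GiB available, and suggests a persistent region of about 80 GiB. That value could go into the region's maxSize together with an explicit checkpointPageBufferSize, so the budget no longer depends on a default.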