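Hello, we are running Apache Ignite 2.7 (8 nodes, 120GB of RAM each), configured with a 16GB heap and a 100GB data region (persistence enabled). Native Memory Tracking shows the categories we would normally expect (Java Heap, Thread, etc.) at about the sizes we expect. However, "Internal" (i.e., off-heap) is as high as 132GB, and that comes on top of everything else the JVM needs to run. The JVM requests so much memory that the system runs into an out-of-memory condition (the OS itself runs out of memory).
As an experiment, we reduced the data region to 1GB and measured the JVM's Internal memory usage before and after grid activation (the grid is activated by a client node that we connect). On activation, Internal (read: Unsafe off-heap) memory jumped from 62,154 KB to 32,897,187 KB. So the 32GB overhead appears to be independent of the data region size.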
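Converting the before/after figures makes the size of the jump explicit. This assumes NMT's "KB" means 1024 bytes:

$$
\Delta = 32{,}897{,}187\ \mathrm{KB} - 62{,}154\ \mathrm{KB} = 32{,}835{,}033\ \mathrm{KB} = \frac{32{,}835{,}033}{1024^2}\ \mathrm{GiB} \approx 31.3\ \mathrm{GiB}
$$

So the "32GB" overhead is roughly 31.3 GiB, or about 33.6 GB in decimal units.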
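This extra 32GB of system RAM usage is a real problem for us. Why does Ignite do this, and how can we control it?
Thanks
Here is a typical native memory summary of what we are seeing. Note the huge Internal allocation.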
Native Memory Tracking:
Total: reserved=156688325KB, committed=156439245KB
- Java Heap (reserved=16777216KB, committed=16777216KB) (mmap: reserved=16777216KB, committed=16777216KB)
- Class (reserved=112257KB, committed=111489KB) (classes #17951) (malloc=1665KB #17624) (mmap: reserved=110592KB, committed=109824KB)
- Thread (reserved=229015KB, committed=229015KB) (thread #223) (stack: reserved=228032KB, committed=228032KB) (malloc=723KB #1128) (arena=260KB #432)
- Code (reserved=255790KB, committed=40250KB) (malloc=6190KB #11547) (mmap: reserved=249600KB, committed=34060KB)
- GC (reserved=704014KB, committed=704014KB) (malloc=48654KB #22251) (mmap: reserved=655360KB, committed=655360KB)
- Compiler (reserved=420KB, committed=420KB) (malloc=289KB #1284) (arena=131KB #15)
- Internal (reserved=138544815KB, committed=138544811KB) (malloc=138544779KB #35177) (mmap: reserved=36KB, committed=32KB)
- Symbol (reserved=26536KB, committed=26536KB) (malloc=24002KB #216741) (arena=2533KB #1)
- Native Memory Tracking (reserved=4822KB, committed=4822KB) (malloc=30KB #346) (tracking overhead=4791KB)
- Arena Chunk (reserved=673KB, committed=673KB) (malloc=673KB)
- Unknown (reserved=32768KB, committed=0KB) (mmap: reserved=32768KB, committed=0KB)
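Setting the Internal line against the total shows how dominant it is. The calculation uses committed values and again assumes KB means 1024 bytes:

$$
\frac{138{,}544{,}811\ \mathrm{KB}}{156{,}439{,}245\ \mathrm{KB}} \approx 0.886, \qquad 138{,}544{,}811\ \mathrm{KB} = \frac{138{,}544{,}811}{1024^2}\ \mathrm{GiB} \approx 132.1\ \mathrm{GiB}
$$

Internal therefore accounts for roughly 88.6% of all committed native memory. Almost all of it is malloc'd (malloc=138544779KB) rather than mmap'd.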
PS
We set the default data region to 128MB, systemRegionMaxSize to 8GB, and systemRegionInitialSize to 512MB.
Configuration:
<?xml version="1.0" encoding="UTF-8"?>
<beans xmlns="http://www.springframework.org/schema/beans" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation=" http://www.springframework.org/schema/beans http://www.springframework.org/schema/beans/spring-beans.xsd">
<bean id="ignite.cfg" class="org.apache.ignite.configuration.IgniteConfiguration">
<property name="gridLogger">
<bean class="org.apache.ignite.logger.log4j.Log4JLogger">
<constructor-arg type="java.lang.String" value="/opt/ignite/apache-ignite/config/log4j.xml"/>
</bean>
</property>
<property name="metricsLogFrequency" value="600000"/>
<property name="rebalanceThreadPoolSize" value="12"/>
<property name="peerClassLoadingEnabled" value="true"/>
<property name="publicThreadPoolSize" value="32"/>
<property name="systemThreadPoolSize" value="32"/>
<property name="workDirectory" value="/data/ignite/work"/>
<property name="segmentationPolicy" value="RESTART_JVM"/>
<property name="dataStorageConfiguration">
<bean class="org.apache.ignite.configuration.DataStorageConfiguration">
<property name="checkpointReadLockTimeout" value="0"/>
<property name="systemRegionInitialSize" value="#{512L * 1024 * 1024}"/>
<property name="systemRegionMaxSize" value="#{8L * 1024 * 1024 * 1024}"/>
<property name="storagePath" value="/data/ignite/persistentStore"/>
<property name="defaultDataRegionConfiguration">
<bean class="org.apache.ignite.configuration.DataRegionConfiguration">
<property name="name" value="Default_Region"/>
<property name="initialSize" value="67108864"/>
<property name="maxSize" value="134217728"/>
<property name="persistenceEnabled" value="false"/>
<property name="metricsEnabled" value="true"/>
</bean>
</property>
<property name="dataRegionConfigurations">
<list>
<bean class="org.apache.ignite.configuration.DataRegionConfiguration">
<property name="name" value="Tiered_Region"/>
<property name="initialSize" value="53687091200"/>
<property name="maxSize" value="53687091200"/>
<property name="persistenceEnabled" value="true"/>
<property name="pageEvictionMode" value="RANDOM_2_LRU"/>
<property name="evictionThreshold" value="0.75"/>
<property name="metricsEnabled" value="true"/>
</bean>
</list>
</property>
</bean>
</property>
<property name="cacheConfiguration">
<list>
<bean class="org.apache.ignite.configuration.CacheConfiguration">
<property name="name" value="default"/>
<property name="atomicityMode" value="ATOMIC"/>
<property name="backups" value="0"/>
</bean>
</list>
</property>
<property name="communicationSpi">
<bean class="org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi">
<property name="messageQueueLimit" value="#{1 * 1024}"/>
<property name="idleConnectionTimeout" value="30000"/>
</bean>
</property>
<property name="discoverySpi">
<bean class="org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi">
<property name="ipFinder">
<bean class="org.apache.ignite.spi.discovery.tcp.ipfinder.s3.TcpDiscoveryS3IpFinder">
<property name="awsCredentials" ref="aws.creds"/>
<property name="bucketName" value="project-test-xyz"/>
</bean>
</property>
</bean>
</property>
</bean>
<bean id="aws.creds" class="com.amazonaws.auth.BasicAWSCredentials">
<constructor-arg value="foo"/>
<constructor-arg value="bar"/>
</bean>
</beans>
[Logs added below]
[2019-05-17 22:28:39,592][WARN ][main][IgniteKernal] Peer class loading is enabled (disable it in production for performance and deployment consistency reasons)
[2019-05-17 22:28:39,593][WARN ][main][IgniteKernal] Please set system property '-Djava.net.preferIPv4Stack=true' to avoid possible problems in mixed environments.
[2019-05-17 22:28:40,141][WARN ][main][NoopCheckpointSpi] Checkpoints are disabled (to enable configure any GridCheckpointSpi implementation)
[2019-05-17 22:28:40,214][WARN ][main][GridCollisionManager] Collision resolution is disabled (all jobs will be activated upon arrival).
[2019-05-17 22:28:41,690][WARN ][main][GridCacheDatabaseSharedManager] DataRegionConfiguration.maxWalArchiveSize instead DataRegionConfiguration.walHistorySize would be used for removing old archive wal files
[2019-05-17 22:28:41,826][WARN ][main][PartitionsEvictManager] Logging at INFO level without checking if INFO level is enabled: Evict partition permits=4
[2019-05-17 22:28:46,291][WARN ][main][IgniteKernal] Nodes started on local machine require more than 80% of physical RAM what can lead to significant slowdown due to swapping (please decrease JVM heap size, data region size or checkpoint buffer size) [required=12516MB, available=14008MB]
log4j: Finalizing appender named [null].
[2019-05-17 22:31:19,958][WARN ][disco-event-worker-#42][GridDiscoveryManager] Local node's value of 'java.net.preferIPv4Stack' system property differs from remote node's (all nodes in topology should have identical value) [locPreferIpV4=null, rmtPreferIpV4=true, locId8=f25228c0, rmtId8=eac4211d, rmtAddrs=[192.168.1.5/127.0.0.1, /192.168.1.5], rmtNode=ClusterNode [id=eac4211d-c272-4eb0-9bd5-f91dfa34a0e9, order=2, addr=[127.0.0.1, 192.168.1.5], daemon=false]]
[2019-05-17 22:32:24,265][WARN ][exchange-worker-#43][GridAffinityAssignmentCache] Logging at INFO level without checking if INFO level is enabled: Local node affinity assignment distribution is not ideal [cache=default, expectedPrimary=1024.00, actualPrimary=1024, expectedBackups=1024.00, actualBackups=0, warningThreshold=50.00%]
[2019-05-17 22:32:24,269][WARN ][exchange-worker-#43][GridAffinityAssignmentCache] Logging at INFO level without checking if INFO level is enabled: Local node affinity assignment distribution is not ideal [cache=default, expectedPrimary=1024.00, actualPrimary=1024, expectedBackups=1024.00, actualBackups=0, warningThreshold=50.00%]
[2019-05-17 22:32:24,850][WARN ][exchange-worker-#43][GridAffinityAssignmentCache] Logging at INFO level without checking if INFO level is enabled: Local node affinity assignment distribution is not ideal [cache=default, expectedPrimary=1024.00, actualPrimary=1024, expectedBackups=1024.00, actualBackups=0, warningThreshold=50.00%]
[2019-05-17 22:32:24,911][WARN ][disco-notifier-worker-#41][GridClusterStateProcessor] Logging at INFO level without checking if INFO level is enabled: Received state change finish message: true
22:33:49.086 [exchange-worker-#43] INFO c.b.aa.ceres.loader.S3CacheLoader - Loader eb5445c7-d7fa-4018-95b6-63c4a0911eae received injected ignite instance IgniteKernal [longJVMPauseDetector=LongJVMPauseDetector [workerRef=Thread[jvm-pause-detector-worker,5,main], longPausesCnt=0, longPausesTotalDuration=0, longPausesTimestamps=[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], longPausesDurations=[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]], cfg=IgniteConfiguration [igniteInstanceName=null, pubPoolSize=32, svcPoolSize=32, callbackPoolSize=8, strandedPoolSize=8, sysPoolSize=16, mgmtPoolSize=4, igfsPoolSize=8, dataStreamerPoolSize=8, utilityCachePoolSize=8, utilityCacheKeepAliveTime=60000, p2pPoolSize=2, qryPoolSize=8, igniteHome=/opt/ignite/apache-ignite, igniteWorkDir=/data/ignite/work, mbeanSrv=com.sun.jmx.mbeanserver.JmxMBeanServer@6f94fa3e, nodeId=f25228c0-afbc-4626-990a-68f97fd5b258, marsh=BinaryMarshaller [], marshLocJobs=false, daemon=false, p2pEnabled=true, netTimeout=5000, sndRetryDelay=1000, sndRetryCnt=3, metricsHistSize=10000, metricsUpdateFreq=2000, metricsExpTime=9223372036854775807, discoSpi=TcpDiscoverySpi [addrRslvr=null, sockTimeout=5000, ackTimeout=5000, marsh=JdkMarshaller [clsFilter=org.apache.ignite.marshaller.MarshallerUtils$1@44a3f602], reconCnt=10, reconDelay=2000, maxAckTimeout=600000, forceSrvMode=false, clientReconnectDisabled=false, internalLsnr=null], segPlc=NOOP, segResolveAttempts=2, waitForSegOnStart=true, allResolversPassReq=true, segChkFreq=60000, commSpi=TcpCommunicationSpi [connectGate=org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi$ConnectGateway@6020964a, connPlc=org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi$FirstConnectionPolicy@3f2874d5, enableForcibleNodeKill=false, enableTroubleshootingLog=true, locAddr=null, locHost=0.0.0.0/0.0.0.0, locPort=47100, locPortRange=100, shmemPort=-1, directBuf=true, directSndBuf=false, idleConnTimeout=600000, connTimeout=5000, maxConnTimeout=600000, reconCnt=10, sockSndBuf=32768, sockRcvBuf=32768, msgQueueLimit=1024, slowClientQueueLimit=0, nioSrvr=GridNioServer [selectorSpins=0, filterChain=FilterChain[filters=[GridNioCodecFilter [parser=org.apache.ignite.internal.util.nio.GridDirectParser@7873ad1, directMode=true], GridConnectionBytesVerifyFilter], closed=false, directBuf=true, tcpNoDelay=true, sockSndBuf=32768, sockRcvBuf=32768, writeTimeout=2000, idleTimeout=600000, skipWrite=false, skipRead=false, locAddr=0.0.0.0/0.0.0.0:47100, order=LITTLE_ENDIAN, sndQueueLimit=1024, directMode=true, sslFilter=null, msgQueueLsnr=null, readerMoveCnt=0, writerMoveCnt=0, readWriteSelectorsAssign=false], shmemSrv=null, usePairedConnections=false, connectionsPerNode=1, tcpNoDelay=true, filterReachableAddresses=false, ackSndThreshold=32, unackedMsgsBufSize=0, sockWriteTimeout=2000, boundTcpPort=47100, boundTcpShmemPort=-1, selectorsCnt=4, selectorSpins=0, addrRslvr=null, ctxInitLatch=java.util.concurrent.CountDownLatch@7b757828[Count = 0], stopping=false], evtSpi=org.apache.ignite.spi.eventstorage.NoopEventStorageSpi@282cb7c7, colSpi=NoopCollisionSpi [], deploySpi=LocalDeploymentSpi [], indexingSpi=org.apache.ignite.spi.indexing.noop.NoopIndexingSpi@50de186c, addrRslvr=null, encryptionSpi=org.apache.ignite.spi.encryption.noop.NoopEncryptionSpi@5a3bc7ed, clientMode=false, rebalanceThreadPoolSize=1, txCfg=TransactionConfiguration [txSerEnabled=false, dfltIsolation=REPEATABLE_READ, dfltConcurrency=PESSIMISTIC, dfltTxTimeout=0, txTimeoutOnPartitionMapExchange=0, pessimisticTxLogSize=0, pessimisticTxLogLinger=10000, tmLookupClsName=null, txManagerFactory=null, useJtaSync=false], cacheSanityCheckEnabled=true, discoStartupDelay=60000, deployMode=SHARED, p2pMissedCacheSize=100, locHost=null, timeSrvPortBase=31100, timeSrvPortRange=100, failureDetectionTimeout=60000, sysWorkerBlockedTimeout=null, clientFailureDetectionTimeout=30000, metricsLogFreq=60000, hadoopCfg=null, connectorCfg=ConnectorConfiguration [jettyPath=null, host=null, port=11211, noDelay=true, directBuf=false, sndBufSize=32768, rcvBufSize=32768, idleQryCurTimeout=600000, idleQryCurCheckFreq=60000, sndQueueLimit=0, selectorCnt=4, idleTimeout=7000, sslEnabled=false, sslClientAuth=false, sslCtxFactory=null, sslFactory=null, portRange=100, threadPoolSize=8, msgInterceptor=null], odbcCfg=null, warmupClos=null, atomicCfg=AtomicConfiguration [seqReserveSize=1000, cacheMode=PARTITIONED, backups=1, aff=null, grpName=null], classLdr=null, sslCtxFactory=null, platformCfg=null, binaryCfg=null, memCfg=null, pstCfg=null, dsCfg=DataStorageConfiguration [sysRegionInitSize=41943040, sysRegionMaxSize=104857600, pageSize=1024, concLvl=0, dfltDataRegConf=DataRegionConfiguration [name=Default_Region, maxSize=134217728, initSize=67108864, swapPath=null, pageEvictionMode=DISABLED, evictionThreshold=0.9, emptyPagesPoolSize=100, metricsEnabled=true, metricsSubIntervalCount=5, metricsRateTimeInterval=60000, persistenceEnabled=false, checkpointPageBufSize=0], dataRegions=[DataRegionConfiguration [name=Tiered_Region, maxSize=8589934592, initSize=8589934592, swapPath=null, pageEvictionMode=DISABLED, evictionThreshold=0.9, emptyPagesPoolSize=100, metricsEnabled=true, metricsSubIntervalCount=5, metricsRateTimeInterval=60000, persistenceEnabled=true, checkpointPageBufSize=0]], storagePath=/data/ignite/persistentStore, checkpointFreq=180000, lockWaitTime=30000, checkpointThreads=8, checkpointWriteOrder=SEQUENTIAL, walHistSize=20, maxWalArchiveSize=1073741824, walSegments=10, walSegmentSize=67108864, walPath=db/wal, walArchivePath=db/wal/archive, metricsEnabled=false, walMode=LOG_ONLY, walTlbSize=131072, walBuffSize=0, walFlushFreq=2000, walFsyncDelay=1000, walRecordIterBuffSize=67108864, alwaysWriteFullPages=false, fileIOFactory=org.apache.ignite.internal.processors.cache.persistence.file.AsyncFileIOFactory@2fb68ec6, metricsSubIntervalCnt=5, metricsRateTimeInterval=60000, walAutoArchiveAfterInactivity=-1, writeThrottlingEnabled=true, walCompactionEnabled=false, walCompactionLevel=1, checkpointReadLockTimeout=null], activeOnStart=true, autoActivation=true, longQryWarnTimeout=3000, sqlConnCfg=null, cliConnCfg=ClientConnectorConfiguration [host=null, port=10800, portRange=100, sockSndBufSize=0, sockRcvBufSize=0, tcpNoDelay=true, maxOpenCursorsPerConn=128, threadPoolSize=8, idleTimeout=0, jdbcEnabled=true, odbcEnabled=true, thinCliEnabled=true, sslEnabled=false, useIgniteSslCtxFactory=true, sslClientAuth=false, sslCtxFactory=null], mvccVacuumThreadCnt=2, mvccVacuumFreq=5000, authEnabled=false, failureHnd=null, commFailureRslvr=null], igniteInstanceName=null, startTime=1558132126418, rsrcCtx=org.apache.ignite.internal.processors.resource.GridSpringResourceContextImpl@556d0e12, reconnectState=ReconnectState [firstReconnectFut=GridFutureAdapter [ignoreInterrupts=false, state=INIT, res=null, hash=1426466647], curReconnectFut=null, reconnectDone=null]]
Answer (score: 1)
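My guess is the Checkpoint Page Buffer, which by default is 20% of the data region size.
You can specify it explicitly so that you don't forget about it, and reduce the region size accordingly so that you don't run out of RAM.
This should only apply to persistent regions.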
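Here is a sketch of how the buffer could be set explicitly for the persistent region in the configuration above. The `checkpointPageBufferSize` property corresponds to `DataRegionConfiguration.setCheckpointPageBufferSize`. The 2GiB buffer and the 48GiB region size are illustrative assumptions, chosen so that region plus buffer stays at the original 50GiB. They are not tuned recommendations.

```xml
<bean class="org.apache.ignite.configuration.DataRegionConfiguration">
    <property name="name" value="Tiered_Region"/>
    <!-- Assumed sizing: region reduced from 50GiB to 48GiB to leave room for the buffer -->
    <property name="initialSize" value="#{48L * 1024 * 1024 * 1024}"/>
    <property name="maxSize" value="#{48L * 1024 * 1024 * 1024}"/>
    <property name="persistenceEnabled" value="true"/>
    <!-- Checkpoint page buffer set explicitly (illustrative 2GiB); it comes on top of maxSize -->
    <property name="checkpointPageBufferSize" value="#{2L * 1024 * 1024 * 1024}"/>
    <property name="pageEvictionMode" value="RANDOM_2_LRU"/>
    <property name="evictionThreshold" value="0.75"/>
    <property name="metricsEnabled" value="true"/>
</bean>
```

The startup warning in the logs above lists heap, data region size and checkpoint buffer size separately, which is consistent with treating the buffer as an extra allocation rather than part of `maxSize`.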
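Note that you should also expect the OS to take a few GB for its data structures and block cache, so I don't think you should give 116G of your 120G to Ignite's off-heap. Don't forget about the heap either.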
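The same advice can be written as a rough budget. Let M be physical RAM, H the heap, O what the OS and the JVM's other native areas need, R the data region and B the checkpoint page buffer. Take the answer's 20% figure (B = 0.2R) and assume O = 8GB, which is an illustrative guess and not a measured value. For a 120GB node with a 16GB heap:

$$
\begin{aligned}
R + B + H + O &\le M \\
1.2\,R &\le M - H - O = 120 - 16 - 8 = 96\ \text{GB} \\
R &\le 80\ \text{GB}
\end{aligned}
$$

The same inequality shows why 116G of off-heap cannot fit: 116 + 16 = 132GB already exceeds 120GB before the OS takes anything.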