How do I debug a failing cloudera-scm-server process?

Time: 2015-06-15 14:17:32

Tags: cloudera-manager

I am trying to install Cloudera Manager 5 on CentOS 6, but the cloudera-scm-server process keeps failing without a clear error in the logs.

service --status-all

cloudera-scm-agent (pid  7058) is running...
cloudera-scm-server dead but pid file exists
pg_ctl: server is running (PID: 13650)
/usr/bin/postgres "-D" "/var/lib/cloudera-scm-server-db/data"

cat /var/log/cloudera-scm-server/cloudera-scm-server.out

JAVA_HOME=/usr/java/jdk1.7.0_67-cloudera
Killed (core dumped)

cat /var/log/cloudera-scm-server/cloudera-scm-server.log

...
2015-06-15 13:54:23,642 INFO main:org.springframework.context.annotation.AnnotationConfigApplicationContext: Refreshing org.springframework.context.annotation.AnnotationConfigApplicationContext@6424e9d8: startup date [Mon Jun 15 13:54:23 UTC 2015]; root of context hierarchy
2015-06-15 13:54:23,682 INFO main:org.springframework.beans.factory.support.DefaultListableBeanFactory: Pre-instantiating singletons in org.springframework.beans.factory.support.DefaultListableBeanFactory@3738baec: defining beans [org.springframework.context.annotation.internalConfigurationAnnotationProcessor,org.springframework.context.annotation.internalAutowiredAnnotationProcessor,org.springframework.context.annotation.internalRequiredAnnotationProcessor,org.springframework.context.annotation.internalCommonAnnotationProcessor,defaultValidatorConfiguration,messageInterpolator,validServiceDependencyValidator,uniqueServiceTypeValidator,uniqueRoleTypeValidator,existingServiceTypeValidator,existingRoleTypeValidator,expressionValidator,autoConfigSharesValidValidator,sdlParser,mdlParser,parcelParser,alternativesParser,permissionsParser,manifestParser,stringInterpolator,serviceDescriptorValidatorWithoutDependencyCheck,serviceDescriptorValidatorWithDependencyCheck,referenceValidator,serviceMonitoringDefinitionsDescriptorValidator,descriptorVisitor,parcelDescriptorValidator,alternativesDescriptorValidator,permissionsDescriptorValidator,manifestDescriptorValidator,springConstraintValidatorFactory,validatorFactoryBean,metricNameFormatValidator,nameForCrossEntityAggregateFormatValidator,builtInServiceTypes,builtInRoleTypes,builtInNamesForCrossEntityAggregateMetrics,uniqueFieldValidator]; root of factory hierarchy
2015-06-15 13:54:48,589 INFO main:com.cloudera.csd.components.MdlRegistry: Loaded /mdls/cdh5/oozie.mdl
2015-06-15 13:54:48,627 INFO main:com.cloudera.cmf.rules.RulesEngine: Loading rules knowledge base

The end of the log is not 100% consistent, but in general this is the spot after which it regularly fails. An OutOfMemoryError would kill the application the way it gets killed here, but in that case I would expect to find an indication of the error in the logs. The heap should also get dumped, but I cannot find a heap dump: there is no *.hprof file anywhere on the machine. The cloudera-scm-server.out log says something about a core dump, but I cannot find that either. Where would I look for it?

The server DB is the embedded one, and is running properly. The only error message that looks suspicious to me in the logs is that the relation 'cm_version' does not exist.

1 answer:

Answer 0: (score: 0)

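The problem was memory related: not heap space running out, but actual physical memory. My VM defaulted to 512 MB of memory, and the JVM was configured with 2 GB of heap space. Filling up the physical memory caused the operating system to silently kill the process, which is why there were no useful log entries. The solution was to increase the VM's memory.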
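
For anyone who wants to check for the same mismatch, here is a minimal sketch that compares a JVM heap size against the machine's physical memory. The function names (xmx_to_mb, check_heap) are made up for illustration, and the 2G value is simply the heap size from my setup. I am assuming the heap is set through an -Xmx option; for a package install that option is often found in /etc/default/cloudera-scm-server, but treat that path as an assumption and check your own installation.

```shell
#!/bin/sh
# Hypothetical helpers, not part of Cloudera Manager.

# Convert a JVM -Xmx value such as "2G", "512m" or "1048576k" to whole megabytes.
xmx_to_mb() {
    value=$1
    number=${value%[kKmMgG]}   # strip a trailing unit letter, if any
    case $value in
        *[gG]) echo $((number * 1024)) ;;
        *[mM]) echo "$number" ;;
        *[kK]) echo $((number / 1024)) ;;
        *)     echo $((number / 1024 / 1024)) ;;   # no suffix: plain bytes
    esac
}

# Report whether a heap of the given size (MB) fits into the given physical memory (MB).
# Returns non-zero when the heap is at least as large as physical memory.
check_heap() {
    heap_mb=$1
    mem_mb=$2
    if [ "$heap_mb" -ge "$mem_mb" ]; then
        echo "WARNING: heap ${heap_mb} MB >= physical memory ${mem_mb} MB"
        return 1
    fi
    echo "OK: heap ${heap_mb} MB < physical memory ${mem_mb} MB"
}

# Read total physical memory from /proc/meminfo (Linux only) and compare it
# with the assumed 2G heap. "|| true" keeps the script going after a warning.
if [ -r /proc/meminfo ]; then
    total_mb=$(awk '/^MemTotal:/ {print int($2 / 1024)}' /proc/meminfo)
    check_heap "$(xmx_to_mb 2G)" "$total_mb" || true
fi
```

On a 512 MB VM like mine this prints the warning line. If the kernel's OOM killer was responsible, it usually leaves a trace in the kernel log, so something like `dmesg | grep -i 'killed process'` may confirm it even when the application logs show nothing.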