Hadoop 2.4无法在aws s3n上启动作业

时间:2014-08-12 22:34:22

标签: java hadoop amazon-s3

我使用S3本机文件系统在AWS EC2上部署了Hadoop 2.4,以替代HDFS。我尝试了几个示例应用程序,所有这些都给了我以下堆栈跟踪消息(7月24日的一个较旧的线程挂在那里没有被解析...所以我在这里附上了DEBUG信息......):

hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.4.0.jar wordcount s3n://mybkt/wc/ s3n://mybkt/out

14/08/12 21:57:35 DEBUG util.Shell: setsid exited with exit code 0
14/08/12 21:57:36 DEBUG lib.MutableMetricsFactory: field org.apache.hadoop.metrics2.lib.MutableRate org.apache.hadoop.security.UserGroupInformation$UgiMetrics.loginSuccess with annotation @org.apache.hadoop.metrics2.annotation.Metric(valueName=Time, value=[Rate of successful kerberos logins and latency (milliseconds)], about=, type=DEFAULT, always=false, sampleName=Ops)
14/08/12 21:57:36 DEBUG lib.MutableMetricsFactory: field org.apache.hadoop.metrics2.lib.MutableRate org.apache.hadoop.security.UserGroupInformation$UgiMetrics.loginFailure with annotation @org.apache.hadoop.metrics2.annotation.Metric(valueName=Time, value=[Rate of failed kerberos logins and latency (milliseconds)], about=, type=DEFAULT, always=false, sampleName=Ops)
14/08/12 21:57:36 DEBUG lib.MutableMetricsFactory: field org.apache.hadoop.metrics2.lib.MutableRate org.apache.hadoop.security.UserGroupInformation$UgiMetrics.getGroups with annotation @org.apache.hadoop.metrics2.annotation.Metric(valueName=Time, value=[GetGroups], about=, type=DEFAULT, always=false, sampleName=Ops)
14/08/12 21:57:36 DEBUG impl.MetricsSystemImpl: UgiMetrics, User and group related metrics
14/08/12 21:57:36 DEBUG util.KerberosName: Kerberos krb5 configuration not found, setting default realm to empty
14/08/12 21:57:36 DEBUG security.Groups:  Creating new Groups object
14/08/12 21:57:36 DEBUG util.NativeCodeLoader: Trying to load the custom-built native-hadoop library...
14/08/12 21:57:36 DEBUG util.NativeCodeLoader: Failed to load native-hadoop with error: java.lang.UnsatisfiedLinkError: no hadoop in java.library.path
14/08/12 21:57:36 DEBUG util.NativeCodeLoader: java.library.path=/home/ubuntu/hadoop-2.4.0/lib
14/08/12 21:57:36 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
14/08/12 21:57:36 DEBUG security.JniBasedUnixGroupsMappingWithFallback: Falling back to shell based
14/08/12 21:57:36 DEBUG security.JniBasedUnixGroupsMappingWithFallback: Group mapping impl=org.apache.hadoop.security.ShellBasedUnixGroupsMapping
14/08/12 21:57:36 DEBUG security.Groups: Group mapping impl=org.apache.hadoop.security.JniBasedUnixGroupsMappingWithFallback; cacheTimeout=300000; warningDeltaMs=5000
14/08/12 21:57:36 DEBUG security.UserGroupInformation: hadoop login
14/08/12 21:57:36 DEBUG security.UserGroupInformation: hadoop login commit
14/08/12 21:57:36 DEBUG security.UserGroupInformation: using local user:UnixPrincipal: ubuntu
14/08/12 21:57:36 DEBUG security.UserGroupInformation: UGI loginUser:ubuntu (auth:SIMPLE)
14/08/12 21:57:36 DEBUG service.Jets3tProperties: s3service.https-only=true
14/08/12 21:57:36 DEBUG service.Jets3tProperties: storage-service.internal-error-retry-max=5
14/08/12 21:57:36 DEBUG service.Jets3tProperties: http.connection-manager.factory-class-name=org.jets3t.service.utils.RestUtils$ConnManagerFactory
14/08/12 21:57:36 DEBUG service.Jets3tProperties: httpclient.connection-timeout-ms=60000
14/08/12 21:57:36 DEBUG service.Jets3tProperties: httpclient.socket-timeout-ms=60000
14/08/12 21:57:36 DEBUG service.Jets3tProperties: httpclient.stale-checking-enabled=true
14/08/12 21:57:36 DEBUG service.Jets3tProperties: httpclient.useragent=null
14/08/12 21:57:36 DEBUG utils.RestUtils: Setting user agent string: JetS3t/0.9.0 (Linux/3.13.0-29-generic; amd64; en; JVM 1.7.0_55)
14/08/12 21:57:36 DEBUG service.Jets3tProperties: http.protocol.expect-continue=true
14/08/12 21:57:36 DEBUG service.Jets3tProperties: httpclient.connection-manager-timeout=0
14/08/12 21:57:36 DEBUG service.Jets3tProperties: httpclient.retry-max=5
14/08/12 21:57:36 DEBUG service.Jets3tProperties: httpclient.proxy-autodetect=true
14/08/12 21:57:36 DEBUG service.Jets3tProperties: s3service.s3-endpoint=s3.amazonaws.com
14/08/12 21:57:36 DEBUG proxy.PluginProxyUtil: About to attempt auto proxy detection under Java version:1.7.0_55-b14
14/08/12 21:57:36 DEBUG proxy.PluginProxyUtil: Sun Plugin reported java version not 1.3.X, 1.4.X, 1.5.X or 1.6.X - trying failover detection...
14/08/12 21:57:36 DEBUG proxy.PluginProxyUtil: Using failover proxy detection...
14/08/12 21:57:36 DEBUG proxy.PluginProxyUtil: Plugin Proxy Config List Property:null
14/08/12 21:57:36 DEBUG proxy.PluginProxyUtil: No configured plugin proxy list
14/08/12 21:57:36 DEBUG service.Jets3tProperties: s3service.default-storage-class=null
14/08/12 21:57:36 DEBUG service.Jets3tProperties: s3service.server-side-encryption=null
14/08/12 21:57:36 DEBUG service.Jets3tProperties: http.connection-manager.factory-class-name=org.jets3t.service.utils.RestUtils$ConnManagerFactory
14/08/12 21:57:36 DEBUG service.Jets3tProperties: httpclient.connection-timeout-ms=60000
14/08/12 21:57:36 DEBUG service.Jets3tProperties: httpclient.socket-timeout-ms=60000
14/08/12 21:57:36 DEBUG service.Jets3tProperties: httpclient.stale-checking-enabled=true
14/08/12 21:57:36 DEBUG service.Jets3tProperties: httpclient.useragent=null
14/08/12 21:57:36 DEBUG utils.RestUtils: Setting user agent string: JetS3t/0.9.0 (Linux/3.13.0-29-generic; amd64; en; JVM 1.7.0_55)
14/08/12 21:57:36 DEBUG service.Jets3tProperties: http.protocol.expect-continue=true
14/08/12 21:57:36 DEBUG service.Jets3tProperties: httpclient.connection-manager-timeout=0
14/08/12 21:57:36 DEBUG service.Jets3tProperties: httpclient.retry-max=5
14/08/12 21:57:36 DEBUG service.Jets3tProperties: httpclient.proxy-autodetect=true
14/08/12 21:57:36 DEBUG service.Jets3tProperties: s3service.s3-endpoint=s3.amazonaws.com
14/08/12 21:57:36 DEBUG proxy.PluginProxyUtil: About to attempt auto proxy detection under Java version:1.7.0_55-b14
14/08/12 21:57:36 DEBUG proxy.PluginProxyUtil: Sun Plugin reported java version not 1.3.X, 1.4.X, 1.5.X or 1.6.X - trying failover detection...
14/08/12 21:57:36 DEBUG proxy.PluginProxyUtil: Using failover proxy detection...
14/08/12 21:57:36 DEBUG proxy.PluginProxyUtil: Plugin Proxy Config List Property:null
14/08/12 21:57:36 DEBUG proxy.PluginProxyUtil: No configured plugin proxy list
14/08/12 21:57:36 DEBUG service.Jets3tProperties: devpay.user-token=null
14/08/12 21:57:36 DEBUG service.Jets3tProperties: devpay.product-token=null
14/08/12 21:57:36 DEBUG service.Jets3tProperties: httpclient.requester-pays-buckets-enabled=false
14/08/12 21:57:36 DEBUG security.UserGroupInformation: PrivilegedAction as:ubuntu (auth:SIMPLE) from:org.apache.hadoop.mapreduce.Job.connect(Job.java:1250)
14/08/12 21:57:36 DEBUG mapreduce.Cluster: Trying ClientProtocolProvider : org.apache.hadoop.mapred.YarnClientProtocolProvider
14/08/12 21:57:36 DEBUG service.AbstractService: Service: org.apache.hadoop.mapred.ResourceMgrDelegate entered state INITED
14/08/12 21:57:36 DEBUG service.AbstractService: Service: org.apache.hadoop.yarn.client.api.impl.YarnClientImpl entered state INITED
14/08/12 21:57:37 INFO client.RMProxy: Connecting to ResourceManager at /172.31.20.187:8032
14/08/12 21:57:37 DEBUG security.UserGroupInformation: PrivilegedAction as:ubuntu (auth:SIMPLE) from:org.apache.hadoop.yarn.client.RMProxy.getProxy(RMProxy.java:130)
14/08/12 21:57:37 DEBUG ipc.YarnRPC: Creating YarnRPC for org.apache.hadoop.yarn.ipc.HadoopYarnProtoRPC
14/08/12 21:57:37 DEBUG ipc.HadoopYarnProtoRPC: Creating a HadoopYarnProtoRpc proxy for protocol interface org.apache.hadoop.yarn.api.ApplicationClientProtocol
14/08/12 21:57:37 DEBUG ipc.Server: rpcKind=RPC_PROTOCOL_BUFFER, rpcRequestWrapperClass=class org.apache.hadoop.ipc.ProtobufRpcEngine$RpcRequestWrapper, rpcInvoker=org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker@7d66036e
14/08/12 21:57:37 DEBUG ipc.Client: getting client out of cache: org.apache.hadoop.ipc.Client@71cebfd2
14/08/12 21:57:37 DEBUG service.AbstractService: Service org.apache.hadoop.yarn.client.api.impl.YarnClientImpl is started
14/08/12 21:57:37 DEBUG service.AbstractService: Service org.apache.hadoop.mapred.ResourceMgrDelegate is started
14/08/12 21:57:37 DEBUG security.UserGroupInformation: PrivilegedAction as:ubuntu (auth:SIMPLE) from:org.apache.hadoop.fs.FileContext.getAbstractFileSystem(FileContext.java:330)
14/08/12 21:57:37 DEBUG security.UserGroupInformation: PrivilegedActionException as:ubuntu (auth:SIMPLE) cause:org.apache.hadoop.fs.UnsupportedFileSystemException: No AbstractFileSystem for scheme: s3n
14/08/12 21:57:37 INFO mapreduce.Cluster: Failed to use org.apache.hadoop.mapred.YarnClientProtocolProvider due to error: Error in instantiating YarnClient
14/08/12 21:57:37 DEBUG mapreduce.Cluster: Trying ClientProtocolProvider : org.apache.hadoop.mapred.LocalClientProtocolProvider
14/08/12 21:57:37 DEBUG mapreduce.Cluster: Cannot pick org.apache.hadoop.mapred.LocalClientProtocolProvider as the ClientProtocolProvider - returned null protocol
14/08/12 21:57:37 DEBUG security.UserGroupInformation: PrivilegedActionException as:ubuntu (auth:SIMPLE) cause:java.io.IOException: Cannot initialize Cluster. Please check your configuration for mapreduce.framework.name and the correspond server addresses.
java.io.IOException: Cannot initialize Cluster. Please check your configuration for mapreduce.framework.name and the correspond server addresses.
    at org.apache.hadoop.mapreduce.Cluster.initialize(Cluster.java:120)
    at org.apache.hadoop.mapreduce.Cluster.<init>(Cluster.java:82)
    at org.apache.hadoop.mapreduce.Cluster.<init>(Cluster.java:75)
    at org.apache.hadoop.mapreduce.Job$9.run(Job.java:1255)
    at org.apache.hadoop.mapreduce.Job$9.run(Job.java:1251)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
    at org.apache.hadoop.mapreduce.Job.connect(Job.java:1250)
    at org.apache.hadoop.mapreduce.Job.submit(Job.java:1279)
    at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1303)
    at org.apache.hadoop.examples.WordCount.main(WordCount.java:84)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:72)
    at org.apache.hadoop.util.ProgramDriver.run(ProgramDriver.java:145)
    at org.apache.hadoop.examples.ExampleDriver.main(ExampleDriver.java:74)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:212)

以下是我的配置文件:

纱-site.xml中:

    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.resourcemanager.address</name>
        <value>172.31.20.187:8032</value>
    </property>
    <property>
        <name>yarn.resourcemanager.resource-tracker.address</name>
        <value>172.31.20.187:8031</value>
    </property>
    <property>
        <name>yarn.resourcemanager.scheduler.address</name>
        <value>172.31.20.187:8030</value>
    </property>
    <property>
        <name>yarn.nodemanager.local-dirs</name>
        <value>/home/ubuntu/hdfs/tmp</value>
    </property>

mapred-site.xml中:

    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>

    <property>
        <name>mapreduce.map.memory.mb</name>
        <value>640</value>
        <description>Larger resource limit for maps.</description>
    </property>

    <property>
        <name>mapreduce.map.java.opts</name>
        <value>-Xmx768m</value>
        <description>Heap-size for child jvms of maps.</description>
    </property>

    <property>
        <name>mapreduce.reduce.memory.mb</name>
        <value>640</value>
        <description>Larger resource limit for reduces.</description>
    </property>

    <property>
        <name>mapreduce.reduce.java.opts</name>
        <value>-Xmx768m</value>
        <description>Heap-size for child jvms of reduces.</description>
    </property>

    <property>
        <name>mapreduce.jobtracker.address</name>
        <value>172.31.20.187:8021</value>
    </property>

我也按照此链接配置了AWS S3的访问控制(core-site.xml): https://wiki.apache.org/hadoop/AmazonS3

核心-site.xml中:

    <property>
        <name>fs.defaultFS</name>
        <value>s3n://mybkt</value>
    </property>

    <property>
        <name>io.file.buffer.size</name>
        <value>131072</value>
    </property>

    <property>
        <name>fs.s3n.awsAccessKeyId</name>
        <value>123</value>
    </property>

    <property>
        <name>fs.s3n.awsSecretAccessKey</name>
        <value>456</value>
    </property>

我也尝试了Hadoop v1,并在s3n文件系统上运行了Hadoop1。但似乎它对Hadoop v2不起作用。

请帮忙。提前谢谢。

1 个答案:

答案 0 :(得分:0)

认为使用s3或任何其他文件系统实现来替代HDFS / Namenode并不是那么简单。 与Tachyon文件系统一样尝试但失败了,请参阅https://groups.google.com/forum/#!topic/tachyon-users/u4OoBekGigA

认为其他人可以通过以下方式开展工作:

  • 添加AbstractFileSystem实现
  • 修补Hadoop以不检查分段工件的权限

所以底线,您可以使用s3来读取输入并写入作业的输出,但是使用它作为执行元数据层的HDFS替代品并不是开箱即用的支持!