Hive won't write to AWS S3

Date: 2015-07-06 16:45:55

Tags: hadoop amazon-web-services amazon-s3 hive

I have an external table in Hive, stored on my Hadoop cluster, and I want to move its contents into an external table stored on Amazon S3.

So I created an S3-backed table as follows:

CREATE EXTERNAL TABLE IF NOT EXISTS export.export_table
LIKE table_to_be_exported
ROW FORMAT SERDE ...
WITH SERDEPROPERTIES ('fieldDelimiter'='|')
STORED AS TEXTFILE
LOCATION 's3a://bucket/folder';

I then run: INSERT INTO export.export_table SELECT * FROM table_to_be_exported

which outputs the following:

INFO  : Number of reduce tasks is set to 0 since there's no reduce operator
WARN  : Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
INFO  : Starting Job = job_1435176004514_0028, Tracking URL = http://quickstart.cloudera:8088/proxy/application_1435176004514_0028/
INFO  : Kill Command = /usr/lib/hadoop/bin/hadoop job  -kill job_1435176004514_0028
INFO  : Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
INFO  : 2015-07-06 09:22:18,379 Stage-1 map = 0%,  reduce = 0%
INFO  : 2015-07-06 09:22:27,795 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 2.9 sec
INFO  : MapReduce Total cumulative CPU time: 2 seconds 900 msec
INFO  : Ended Job = job_1435176004514_0028
INFO  : Stage-4 is selected by condition resolver.
INFO  : Stage-3 is filtered out by condition resolver.
INFO  : Stage-5 is filtered out by condition resolver.
INFO  : Moving data to: s3a://bucket/folder/.hive-staging_hive_2015-07-06_09-22-10_351_9216807769834089982-3/-ext-10000 from s3a://bucket/folder/.hive-staging_hive_2015-07-06_09-22-10_351_9216807769834089982-3/-ext-10002
ERROR : Failed with exception Wrong FS: s3a://bucket/folder/.hive-staging_hive_2015-07-06_09-22-10_351_9216807769834089982-3/-ext-10002, expected: hdfs://quickstart.cloudera:8020
java.lang.IllegalArgumentException: Wrong FS: s3a://bucket/folder/.hive-staging_hive_2015-07-06_09-22-10_351_9216807769834089982-3/-ext-10002, expected: hdfs://quickstart.cloudera:8020
  at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:645)
  at org.apache.hadoop.hdfs.DistributedFileSystem.getPathName(DistributedFileSystem.java:193)
  at org.apache.hadoop.hdfs.DistributedFileSystem.getEZForPath(DistributedFileSystem.java:1916)
  at org.apache.hadoop.hdfs.client.HdfsAdmin.getEncryptionZoneForPath(HdfsAdmin.java:262)
  at org.apache.hadoop.hive.shims.Hadoop23Shims$HdfsEncryptionShim.isPathEncrypted(Hadoop23Shims.java:1187)
  at org.apache.hadoop.hive.ql.metadata.Hive.moveFile(Hive.java:2449)
  at org.apache.hadoop.hive.ql.exec.MoveTask.moveFile(MoveTask.java:105)
  at org.apache.hadoop.hive.ql.exec.MoveTask.execute(MoveTask.java:222)
  at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:160)
  at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:88)
  at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1638)
  at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1397)
  at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1181)
  at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1047)
  at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1042)
  at org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:145)
  at org.apache.hive.service.cli.operation.SQLOperation.access$100(SQLOperation.java:70)
  at org.apache.hive.service.cli.operation.SQLOperation$1$1.run(SQLOperation.java:197)
  at java.security.AccessController.doPrivileged(Native Method)
  at javax.security.auth.Subject.doAs(Subject.java:415)
  at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
  at org.apache.hive.service.cli.operation.SQLOperation$1.run(SQLOperation.java:209)
  at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
  at java.util.concurrent.FutureTask.run(FutureTask.java:262)
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
  at java.lang.Thread.run(Thread.java:745)

Error: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MoveTask (state=08S01,code=1)

I have set the s3a key and secret in my Hadoop core-site.xml, and I am able to read from and write to S3 directly using hadoop fs.
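For reference, the core-site.xml credentials mentioned above would typically look something like this (property names are from the Hadoop S3A connector; the values shown are placeholders, not the actual keys used here):

```xml
<!-- Hadoop S3A connector credentials in core-site.xml (placeholder values) -->
<property>
  <name>fs.s3a.access.key</name>
  <value>YOUR_AWS_ACCESS_KEY_ID</value>
</property>
<property>
  <name>fs.s3a.secret.key</name>
  <value>YOUR_AWS_SECRET_ACCESS_KEY</value>
</property>
```

Note that these settings make `hadoop fs -ls s3a://bucket/` work from the command line, but as the error above shows, Hive's MoveTask still validates the staging path against the default filesystem (hdfs://quickstart.cloudera:8020).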

Any guesses as to what I can do to get this working?

1 Answer:

Answer 0: (score: 0)

Try using s3 instead of s3a; my guess is that s3a is not yet supported in EMR's Hive distribution.
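One way to try that suggestion without recreating the table is to repoint its location (a sketch only; ALTER TABLE ... SET LOCATION is standard HiveQL, and s3n:// would be the native-filesystem scheme on a non-EMR Hadoop 2.x cluster such as the Cloudera quickstart VM used in the question, while s3:// applies on EMR):

```sql
-- Repoint the export table at the same bucket via a different
-- filesystem scheme (s3:// on EMR, s3n:// on plain Hadoop 2.x),
-- then retry the failing insert.
ALTER TABLE export.export_table SET LOCATION 's3n://bucket/folder';

INSERT INTO export.export_table SELECT * FROM table_to_be_exported;
```

Keep in mind that the s3n connector reads its own credential properties (fs.s3n.awsAccessKeyId and fs.s3n.awsSecretAccessKey), so the fs.s3a.* settings already in core-site.xml would need to be duplicated under those names.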