Problem creating a Hive table backed by Avro files in Amazon S3

Date: 2014-07-16 00:37:13

Tags: hadoop amazon-s3 hive

I have been trying to create a Hive table backed by Avro files in S3. I initially thought this would be relatively straightforward, but I ran into the error below.

Here is the create table command:

set fs.s3.awsAccessKeyId=ACCESS_KEY_ID;
set fs.s3.awsSecretAccessKey=SECRET_ACCESS_KEY;
use some_database;
CREATE EXTERNAL TABLE experiment_with_s3_backed_data
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
WITH SERDEPROPERTIES (
        'avro.schema.literal'='{
        "namespace": "",
        "type": "record",
        "name": "SomeAvroSchema",
        "fields": [
            {"name": "someVariable","type":"string"}
        ]
}')
STORED AS INPUTFORMAT
    'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
OUTPUTFORMAT
    'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
LOCATION 's3://MY_BUCKET/some/data/'
;

Here is the error I get:

AWS Access Key ID and Secret Access Key must be specified as the username or password (respectively) of a s3 URL, or by setting the fs.s3.awsAccessKeyId or fs.s3.awsSecretAccessKey properties (respectively).

I tried using both s3 and s3n URLs and parameters, with the same result. I noticed a related question that suggests adding the keys to core-site.xml (see the sketch after this list), but I cannot do that, for two reasons:

  1. Due to access restrictions, I am not able to change the Hadoop configuration.
  2. I may have different tables with different access rights to S3, so in general I want to give users the ability to load their own S3 data into Hive tables.
  3. See: Problem with copying local data onto HDFS on a Hadoop cluster using Amazon EC2/S3
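
For reference, a sketch of what such core-site.xml entries would presumably look like (property names are taken from the error message above; ACCESS_KEY_ID and SECRET_ACCESS_KEY are placeholders, and the analogous fs.s3n.* properties would apply to s3n URLs):

<!-- sketch only: these would go in core-site.xml, which I cannot edit -->
<property>
    <name>fs.s3.awsAccessKeyId</name>
    <value>ACCESS_KEY_ID</value>
</property>
<property>
    <name>fs.s3.awsSecretAccessKey</name>
    <value>SECRET_ACCESS_KEY</value>
</property>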

1 Answer:

Answer 0 (score: 0)

I worked around setting the S3 keys by adding them directly to the S3 URL, like this:

s3n://ACCESS_KEY:SECRET_KEY@MY_BUCKET/some/data/

The resulting create table statement looks like this:

use some_database;
CREATE EXTERNAL TABLE experiment_with_s3_backed_data
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
WITH SERDEPROPERTIES (
    'avro.schema.literal'='{
        "namespace": "",
        "type": "record",
        "name": "SomeAvroSchema",
        "fields": [
            {"name": "someVariable","type":"string"}
        ]
}')
STORED AS INPUTFORMAT
    'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
OUTPUTFORMAT
    'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
LOCATION 's3n://ACCESS_KEY:SECRET_KEY@MY_BUCKET/some/data/'
;