I have been trying to create a Hive table backed by Avro files in S3. Initially I thought this would be relatively straightforward, but I ran into the error below.
Here is the create table command:
set fs.s3.awsAccessKeyId=ACCESS_KEY_ID;
set fs.s3.awsSecretAccessKey=SECRET_ACCESS_KEY;
use some_database;
CREATE EXTERNAL TABLE experiment_with_s3_backed_data
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
WITH SERDEPROPERTIES (
'avro.schema.literal'='{
"namespace": "",
"type": "record",
"name": "SomeAvroSchema",
"fields": [
{"name": "someVariable","type":"string"}
]
}')
STORED AS INPUTFORMAT
'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
LOCATION 's3://MY_BUCKET/some/data/'
;
Here is the error I get:
AWS Access Key ID and Secret Access Key must be specified as the username or password (respectively) of a s3 URL, or by setting the fs.s3.awsAccessKeyId or fs.s3.awsSecretAccessKey properties (respectively).
I tried using both s3 and s3n URLs and parameters, with the same result. I noticed related questions suggesting that the keys be added to core-site.xml, but I was unable to do that, for two reasons:
See Problem with copying local data onto HDFS on a Hadoop cluster using Amazon EC2/S3
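For reference, the core-site.xml approach mentioned above would look roughly like the sketch below. This is only an illustration: the property names are taken from the error message, and whether you can edit this file at all depends on how the cluster is managed.

<!-- core-site.xml (sketch): S3 credentials for the fs.s3 filesystem -->
<property>
  <name>fs.s3.awsAccessKeyId</name>
  <value>ACCESS_KEY_ID</value>
</property>
<property>
  <name>fs.s3.awsSecretAccessKey</name>
  <value>SECRET_ACCESS_KEY</value>
</property>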
Answer 0 (score: 0)
I found a workaround for setting the S3 keys by adding them directly to the S3 URL, like this:
s3n://ACCESS_KEY:SECRET_KEY@MY_BUCKET/some/data/
The resulting create table statement looks like this:
use some_database;
CREATE EXTERNAL TABLE experiment_with_s3_backed_data
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
WITH SERDEPROPERTIES (
'avro.schema.literal'='{
"namespace": "",
"type": "record",
"name": "SomeAvroSchema",
"fields": [
{"name": "someVariable","type":"string"}
]
}')
STORED AS INPUTFORMAT
'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
LOCATION 's3n://ACCESS_KEY:SECRET_KEY@MY_BUCKET/some/data/'
;
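A quick way to confirm the table can actually read the Avro data is a simple query against it. This is a hedged example; it assumes the Avro files under that S3 prefix really contain the someVariable field declared in the schema literal:

use some_database;
-- verify the external table reads the Avro records from S3
SELECT someVariable FROM experiment_with_s3_backed_data LIMIT 10;

One caveat with this workaround: the secret key embedded in the LOCATION URL is stored in the Hive metastore and can show up in logs and query output, so it is best treated as a stopgap rather than a long-term configuration.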