I am running DSE 3.2.4 with analytics enabled. I am trying to offload one of my tables to S3 for long-term storage. I created the following table in Hive:
CREATE EXTERNAL TABLE events_archive (
event_id string,
time string,
type string,
source string,
value string
)
PARTITIONED BY (year string, month string, day string, hour string)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LOCATION 's3n://com.mydomain.events/';
I then tried to load some sample data into it with this query:
CREATE TEMPORARY FUNCTION c_to_string AS 'org.apache.hadoop.hive.cassandra.ql.udf.UDFCassandraBinaryToString';
SET hive.exec.dynamic.partition.mode=nonstrict;
SET hive.exec.dynamic.partition=true;
INSERT OVERWRITE TABLE events_archive
PARTITION (year, month, day, hour)
SELECT c_to_string(column4, 'uuid') AS event_id,
from_unixtime(CAST(column3/1000 AS int)) AS time,
CASE column1
WHEN 'pageviews-push' THEN 'page_view'
WHEN 'score_realtime-internal' THEN 'realtime_score'
ELSE 'social_data'
END AS type,
CASE column1
WHEN 'pageviews-push' THEN 'internal'
WHEN 'score_realtime-internal' THEN 'internal'
ELSE split(column1, '-')[0]
END AS source,
value,
year(from_unixtime(CAST(column3/1000 AS int))) AS year,
month(from_unixtime(CAST(column3/1000 AS int))) AS month,
day(from_unixtime(CAST(column3/1000 AS int))) AS day,
hour(from_unixtime(CAST(column3/1000 AS int))) AS hour,
c_to_string(key2, 'blob') AS content_id
FROM events
WHERE column2 = 'data'
AND value IS NOT NULL
AND value != ''
LIMIT 10;
I ended up with this exception:
2014-02-11 20:23:55,810 ERROR ql.Driver (SessionState.java:printError(400)) - FAILED: Hive Internal Error: org.apache.hadoop.fs.s3.S3Exception(org.jets3t.service.S3ServiceException: S3 GET failed for '/10.226.118.113/%2F' XML Error Message: <?xml version="1.0" encoding="UTF-8"?><Error><Code>NoSuchBucket</Code><Message>The specified bucket does not exist</Message><BucketName>10.226.118.113</BucketName><RequestId>FFFFBCE9711A91AE</RequestId><HostId>kXu2oMblsYKD+Jx9O5fTbjosOtTNNtyM+lbE2pmCC63Wm3abJxMvanHdSCYnUyaC</HostId></Error>)
org.apache.hadoop.fs.s3.S3Exception: org.jets3t.service.S3ServiceException: S3 GET failed for '/10.226.118.113/%2F' XML Error Message: <?xml version="1.0" encoding="UTF-8"?><Error><Code>NoSuchBucket</Code><Message>The specified bucket does not exist</Message><BucketName>10.226.118.113</BucketName><RequestId>FFFFBCE9711A91AE</RequestId><HostId>kXu2oMblsYKD+Jx9O5fTbjosOtTNNtyM+lbE2pmCC63Wm3abJxMvanHdSCYnUyaC</HostId></Error>
at org.apache.hadoop.fs.s3.Jets3tFileSystemStore.get(Jets3tFileSystemStore.java:156)
at org.apache.hadoop.fs.s3.Jets3tFileSystemStore.retrieveINode(Jets3tFileSystemStore.java:195)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
at $Proxy14.retrieveINode(Unknown Source)
at org.apache.hadoop.fs.s3.S3FileSystem.mkdir(S3FileSystem.java:148)
at org.apache.hadoop.fs.s3.S3FileSystem.mkdirs(S3FileSystem.java:141)
at org.apache.hadoop.fs.FileSystem.mkdirs(FileSystem.java:1126)
at org.apache.hadoop.hive.ql.Context.getScratchDir(Context.java:165)
at org.apache.hadoop.hive.ql.Context.getExternalScratchDir(Context.java:222)
at org.apache.hadoop.hive.ql.Context.getExternalTmpFileURI(Context.java:315)
at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genFileSinkPlan(SemanticAnalyzer.java:4049)
at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPostGroupByBodyPlan(SemanticAnalyzer.java:6205)
at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genBodyPlan(SemanticAnalyzer.java:6136)
at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:6762)
at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:7531)
at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:243)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:431)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:336)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:909)
at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:258)
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:215)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:406)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:689)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:557)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
Caused by: org.jets3t.service.S3ServiceException: S3 GET failed for '/10.226.118.113/%2F' XML Error Message: <?xml version="1.0" encoding="UTF-8"?><Error><Code>NoSuchBucket</Code><Message>The specified bucket does not exist</Message><BucketName>10.226.118.113</BucketName><RequestId>FFFFBCE9711A91AE</RequestId><HostId>kXu2oMblsYKD+Jx9O5fTbjosOtTNNtyM+lbE2pmCC63Wm3abJxMvanHdSCYnUyaC</HostId></Error>
at org.jets3t.service.impl.rest.httpclient.RestS3Service.performRequest(RestS3Service.java:416)
at org.jets3t.service.impl.rest.httpclient.RestS3Service.performRestGet(RestS3Service.java:752)
at org.jets3t.service.impl.rest.httpclient.RestS3Service.getObjectImpl(RestS3Service.java:1601)
at org.jets3t.service.impl.rest.httpclient.RestS3Service.getObjectImpl(RestS3Service.java:1544)
at org.jets3t.service.S3Service.getObject(S3Service.java:2072)
at org.jets3t.service.S3Service.getObject(S3Service.java:1310)
at org.apache.hadoop.fs.s3.Jets3tFileSystemStore.get(Jets3tFileSystemStore.java:144)
... 33 more
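The error body embedded in the trace names the bucket that the request actually targeted, and it is easy to miss in the wall of text. The Python sketch below is only a reading aid and is not part of DSE, Hive, or JetS3t: it parses an abbreviated copy of the error XML (RequestId and HostId left out) and compares the reported bucket with the bucket in the table's LOCATION. The names `ERROR_XML`, `TABLE_LOCATION`, and `summarize_s3_error` are made up for illustration.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

# Error body copied from the trace; RequestId and HostId are omitted for brevity.
ERROR_XML = b"""<?xml version="1.0" encoding="UTF-8"?><Error><Code>NoSuchBucket</Code><Message>The specified bucket does not exist</Message><BucketName>10.226.118.113</BucketName></Error>"""

# LOCATION from the CREATE EXTERNAL TABLE statement.
TABLE_LOCATION = "s3n://com.mydomain.events/"


def summarize_s3_error(xml_bytes):
    """Return the child elements of the S3 <Error> document as a dict."""
    root = ET.fromstring(xml_bytes)
    return {child.tag: (child.text or "") for child in root}


error = summarize_s3_error(ERROR_XML)
expected_bucket = urlparse(TABLE_LOCATION).netloc

print("error code:      ", error["Code"])
print("bucket requested:", error["BucketName"])
print("bucket in table: ", expected_bucket)
print("buckets match:   ", error["BucketName"] == expected_bucket)
```

The requested bucket is an IP address rather than `com.mydomain.events`, so the failing request was not addressed to the bucket named in the table definition.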
Does the latest DSE support the Hive S3 connector? Or am I doing something wrong?
Answer 0 (score: 3)
Try the following in your Hive installation:
hive-site.xml
<property>
<name>fs.default.name</name>
<value>s3n://your-bucket</value>
</property>
core-site.xml
<property>
<name>fs.s3n.awsAccessKeyId</name>
<value>Your AWS Key</value>
</property>
<property>
<name>fs.s3n.awsSecretAccessKey</name>
<value>Your AWS Secret Key</value>
</property>
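This comes from the 3.1 documentation: http://www.datastax.com/docs/datastax_enterprise3.1/solutions/about_hive, under the section "Using an external file system in Hive".
I did not see it in the 3.2 documentation. I am not sure why it would have been omitted there, but it looks essential for running Hive on S3.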
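Before rerunning the query, it can help to confirm that both credential properties actually made it into core-site.xml. The following Python sketch is a hypothetical helper, not a DSE or Hadoop tool: it reads Hadoop-style site XML from a string and lists which of the two `fs.s3n.*` properties shown above are missing or empty. The function names and the sample text are assumptions for illustration; in practice you would pass in the contents of your own file.

```python
import xml.etree.ElementTree as ET

# The two credential properties from the core-site.xml snippet above.
REQUIRED = ("fs.s3n.awsAccessKeyId", "fs.s3n.awsSecretAccessKey")


def read_site_properties(xml_text):
    """Return {name: value} for every <property> in a Hadoop-style site file."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        if name is not None:
            props[name.strip()] = (prop.findtext("value") or "").strip()
    return props


def missing_s3n_settings(xml_text, required=REQUIRED):
    """List the required property names that are absent or have an empty value."""
    props = read_site_properties(xml_text)
    return [name for name in required if not props.get(name)]


# Sample file that deliberately leaves out the secret key.
sample = """<configuration>
  <property>
    <name>fs.s3n.awsAccessKeyId</name>
    <value>Your AWS Key</value>
  </property>
</configuration>"""

print(missing_s3n_settings(sample))
```

An empty list means both properties are set. The check says nothing about whether the credentials themselves are valid.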
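Answer 1 (score: 0)
The Hadoop implementation of the S3 file system is outdated, so writing data from Hive to S3 does not work well. We have fixed the problem with reading: DSE can now read S3 files, but writing is still problematic. We will check whether that can be fixed soon.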