Error when querying Azure Storage Analytics logs from multiple storage accounts

Date: 2015-01-14 10:26:36

Tags: azure hadoop hive hdinsight

I have multiple Azure storage accounts and I am trying to query their Storage Analytics logs with HDInsight. I want a single query to cover all of the storage accounts, so I created an external Hive table and added one partition per storage account:

ADD JAR wasb:///HdiSamples/StorageAnalytics/hive-serde-microsoft-wa-0.13.0.jar;
CREATE EXTERNAL TABLE IF NOT EXISTS AggregateStorageLogs3 ( 
    VersionNumber string, 
    RequestStartTime string, 
    OperationType string, 
    RequestStatus string, 
    HttpStatusCode string, 
    EndToEndLatencyInMs bigint, 
    ServerLatencyInMs bigint, 
    AuthenticationType string, 
    RequesterAccountName string, 
    OwnerAccountName string, 
    ServiceType string, 
    RequestUrl string, 
    RequestedObjectKey string, 
    RequestIdHeader string, 
    OperationCount bigint, 
    RequesterIpAddress string, 
    RequestVersionHeader string, 
    RequestHeaderSize bigint, 
    RequestPacketSize bigint, 
    ResponseHeaderSize bigint,  
    ResponsePacketSize bigint, 
    RequestContentLength bigint,  
    RequestMD5 string, 
    ServerMD5 string, 
    ETagIdentifier string, 
    LastModifiedTime string, 
    ConditionsUsed string, 
    UserAgentHeader string, 
    ReferrerHeader string, 
    ClientRequestId string) 
COMMENT 'aggregated storage analytics log data' 
PARTITIONED BY (StorageAccount string) 
ROW FORMAT SERDE 'com.microsoft.hadoop.hive.serde2.windowsazure.StorageAnalyticsLogSerDe';

ALTER TABLE AggregateStorageLogs3 ADD IF NOT EXISTS PARTITION(StorageAccount = 'mystorageacc1')
LOCATION 'wasb://$logs@mystorageacc1.blob.core.windows.net/blob/';

ALTER TABLE AggregateStorageLogs3 ADD IF NOT EXISTS PARTITION(StorageAccount = 'mystorageacc2')
LOCATION 'wasb://$logs@mystorageacc2.blob.core.windows.net/blob/';

ALTER TABLE AggregateStorageLogs3 ADD IF NOT EXISTS PARTITION(StorageAccount = 'mystorageacc3')
LOCATION 'wasb://$logs@mystorageacc3.blob.core.windows.net/blob/';
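
(For context: the secondary accounts are readable from the cluster because their storage keys are registered. A minimal sketch of registering them per Hive session — the values below are placeholders, not my real keys:)

SET fs.azure.account.key.mystorageacc2.blob.core.windows.net=STORAGE_KEY_PLACEHOLDER;  -- placeholder, not a real key
SET fs.azure.account.key.mystorageacc3.blob.core.windows.net=STORAGE_KEY_PLACEHOLDER;  -- placeholder, not a real key

After the ALTER TABLE statements, SHOW PARTITIONS AggregateStorageLogs3; confirms that all three partitions exist.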

Then I tried to count the records in the external table, to find the total number of log entries across all storage accounts, like this:

SET hive.mapred.supports.subdirectories=true;
SET mapred.input.dir.recursive=true; 
SET hive.exec.dynamic.partition = true;
SET hive.exec.dynamic.partition.mode=nonstrict;
ADD JAR wasb:///HdiSamples/StorageAnalytics/hive-serde-microsoft-wa-0.13.0.jar;

SELECT COUNT(*)
FROM AggregateStorageLogs3;

However, the query fails after a few hours with a large stack trace. I am new to Hadoop, so the stack trace means very little to me. It contains a Log4j error, but I am not sure whether that is what caused the whole job to fail, and in any case I do not know what triggered it. My guess is that I have done something wrong in loading the data, because the same query works when I run it against a single storage account without partitions.
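
I assume the failure could be narrowed down by restricting the count to one partition at a time, so that only that account's $logs container is enumerated. A sketch, relying on partition pruning:

SELECT COUNT(*)
FROM AggregateStorageLogs3
WHERE StorageAccount = 'mystorageacc1';  -- repeat per account to find the one that fails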

Can someone please tell me what I am doing wrong?

Here is the stack trace:

Logging initialized using configuration in file:/C:/apps/dist/hive-0.13.0.2.1.9.0-2196/conf/hive-log4j.properties
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/C:/apps/dist/hadoop-2.4.0.2.1.9.0-2196/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/C:/apps/dist/hbase-0.98.0.2.1.9.0-2196-hadoop2/lib/slf4j-log4j12-1.6.4.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
converting to local wasb:///HdiSamples/StorageAnalytics/hive-serde-microsoft-wa-0.13.0.jar
Added D:\Users\hdp\AppData\Local\Temp\00f60c60-6a8e-4de8-87c1-92ba2a402fa6_resources\hive-serde-microsoft-wa-0.13.0.jar to class path
Added resource: D:\Users\hdp\AppData\Local\Temp\00f60c60-6a8e-4de8-87c1-92ba2a402fa6_resources\hive-serde-microsoft-wa-0.13.0.jar
Query ID = hdp_20150113225858_c1a01e81-7e6b-4153-a7b5-5c2f6266aca7
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=
In order to set a constant number of reducers:
  set mapreduce.job.reduces=
log4j:ERROR Failed to rename [C:\apps\dist\hive-0.13.0.2.1.9.0-2196\logs/hive.log] to [C:\apps\dist\hive-0.13.0.2.1.9.0-2196\logs/hive.log.2015-01-13].
org.apache.hadoop.fs.azure.AzureException: java.util.NoSuchElementException: An error occurred while enumerating the result, check the original exception for details.
    at org.apache.hadoop.fs.azurenative.AzureNativeFileSystemStore.retrieveMetadata(AzureNativeFileSystemStore.java:1899)
    at org.apache.hadoop.fs.azurenative.NativeAzureFileSystem.getFileStatus(NativeAzureFileSystem.java:1568)
    at org.apache.hadoop.fs.Globber.getFileStatus(Globber.java:57)
    at org.apache.hadoop.fs.Globber.glob(Globber.java:248)
    at org.apache.hadoop.fs.FileSystem.globStatus(FileSystem.java:1642)
    at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:291)
    at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:263)
    at org.apache.hadoop.mapreduce.lib.input.CombineFileInputFormat.getSplits(CombineFileInputFormat.java:217)
    at org.apache.hadoop.mapred.lib.CombineFileInputFormat.getSplits(CombineFileInputFormat.java:75)
    at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileInputFormatShim.getSplits(HadoopShimsSecure.java:344)
    at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileInputFormatShim.getSplits(HadoopShimsSecure.java:310)
    at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getSplits(CombineHiveInputFormat.java:435)
    at org.apache.hadoop.mapreduce.JobSubmitter.writeOldSplits(JobSubmitter.java:520)
    at org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:512)
    at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:394)
    at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1285)
    at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1282)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1594)
    at org.apache.hadoop.mapreduce.Job.submit(Job.java:1282)
    at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:562)
    at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:557)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1594)
    at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:557)
    at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:548)
    at org.apache.hadoop.hive.ql.exec.mr.ExecDriver.execute(ExecDriver.java:420)
    at org.apache.hadoop.hive.ql.exec.mr.MapRedTask.execute(MapRedTask.java:136)
    at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:153)
    at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:85)
    at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1508)
    at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1275)
    at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1093)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:916)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:906)
    at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:268)
    at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:220)
    at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:423)
    at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:359)
    at org.apache.hadoop.hive.cli.CliDriver.processReader(CliDriver.java:456)
    at org.apache.hadoop.hive.cli.CliDriver.processFile(CliDriver.java:466)
    at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:749)
    at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:686)
    at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:625)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
Caused by: java.util.NoSuchElementException: An error occurred while enumerating the result, check the original exception for details.
    at com.microsoft.windowsazure.storage.core.LazySegmentedIterator.hasNext(LazySegmentedIterator.java:113)
    at org.apache.hadoop.fs.azurenative.StorageInterfaceImpl$WrappingIterator.hasNext(StorageInterfaceImpl.java:86)
    at org.apache.hadoop.fs.azurenative.AzureNativeFileSystemStore.retrieveMetadata(AzureNativeFileSystemStore.java:1874)
    ... 50 more
Caused by: com.microsoft.windowsazure.storage.StorageException: The server encountered an unknown failure: OK
    at com.microsoft.windowsazure.storage.StorageException.translateException(StorageException.java:179)
    at com.microsoft.windowsazure.storage.core.ExecutionEngine.executeWithRetry(ExecutionEngine.java:277)
    at com.microsoft.windowsazure.storage.core.LazySegmentedIterator.hasNext(LazySegmentedIterator.java:109)
    ... 52 more
Caused by: javax.xml.stream.XMLStreamException: ParseError at [row,col]:[1,1146584]
Message: Connection reset
    at com.sun.org.apache.xerces.internal.impl.XMLStreamReaderImpl.next(XMLStreamReaderImpl.java:596)
    at com.microsoft.windowsazure.storage.core.DeserializationHelper.readElementFromXMLReader(DeserializationHelper.java:152)
    at com.microsoft.windowsazure.storage.core.DeserializationHelper.readElementFromXMLReader(DeserializationHelper.java:129)
    at com.microsoft.windowsazure.storage.blob.BlobDeserializer.readBlobProperties(BlobDeserializer.java:375)
    at com.microsoft.windowsazure.storage.blob.BlobDeserializer.readBlob(BlobDeserializer.java:200)
    at com.microsoft.windowsazure.storage.blob.BlobDeserializer.readBlobItems(BlobDeserializer.java:140)
    at com.microsoft.windowsazure.storage.blob.BlobDeserializer.getBlobList(BlobDeserializer.java:87)
    at com.microsoft.windowsazure.storage.blob.CloudBlobContainer$6.postProcessResponse(CloudBlobContainer.java:1236)
    at com.microsoft.windowsazure.storage.blob.CloudBlobContainer$6.postProcessResponse(CloudBlobContainer.java:1200)
    at com.microsoft.windowsazure.storage.core.ExecutionEngine.executeWithRetry(ExecutionEngine.java:200)
    ... 53 more
Job Submission failed with exception 'org.apache.hadoop.fs.azure.AzureException(java.util.NoSuchElementException: An error occurred while enumerating the result, check the original exception for details.)'
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask

0 Answers