Cannot create an external table with Hive on an AWS EMR cluster when the LOCATION points to an S3 location

Date: 2019-04-11 15:35:15

Tags: amazon-s3 hive amazon-emr external-tables

I am trying to create an external table with the Hive service on an AWS EMR cluster. The external table points to an S3 location. Below is my CREATE TABLE statement:

CREATE EXTERNAL TABLE IF NOT EXISTS Myschema.MyTable (
  columnA INT,
  columnB INT,
  columnC STRING
)
PARTITIONED BY (columnD INT)
STORED AS PARQUET
LOCATION 's3://{bucket-location}/{key-path}/';

Below is the exception I get:

2019-04-11T14:44:59,449 INFO [6a95bad7-18e7-49de-856d-43219b7c5069 main([])]: util.PlatformInfo (PlatformInfo.java:getJobFlowId(54)) - Unable to read clusterId from http://localhost:8321/configuration, trying extra instance data file: /var/lib/instance-controller/extraInstanceData.json
2019-04-11T14:44:59,450 INFO [6a95bad7-18e7-49de-856d-43219b7c5069 main([])]: util.PlatformInfo (PlatformInfo.java:getJobFlowId(61)) - Unable to read clusterId from /var/lib/instance-controller/extraInstanceData.json, trying EMR job-flow data file: /var/lib/info/job-flow.json
2019-04-11T14:44:59,450 INFO [6a95bad7-18e7-49de-856d-43219b7c5069 main([])]: util.PlatformInfo (PlatformInfo.java:getJobFlowId(69)) - Unable to read clusterId from /var/lib/info/job-flow.json, out of places to look
2019-04-11T14:45:01,073 INFO [6a95bad7-18e7-49de-856d-43219b7c5069 main([])]: conf.HiveConf (HiveConf.java:getLogIdVar(3956)) - Using the default value passed in for log id: 6a95bad7-18e7-49de-856d-43219b7c5069
2019-04-11T14:45:01,073 INFO [6a95bad7-18e7-49de-856d-43219b7c5069 main([])]: session.SessionState (SessionState.java:resetThreadName(432)) - Resetting thread name to main
2019-04-11T14:45:01,072 ERROR [6a95bad7-18e7-49de-856d-43219b7c5069 main([])]: ql.Driver (SessionState.java:printError(1126)) - FAILED: $ComputationException java.lang.ArrayIndexOutOfBoundsException: 16227
com.amazon.ws.emr.hadoop.fs.shaded.com.google.inject.internal.util.$ComputationException: java.lang.ArrayIndexOutOfBoundsException: 16227
    at com.amazon.ws.emr.hadoop.fs.shaded.com.google.inject.internal.util.$MapMaker$StrategyImpl.compute(MapMaker.java:553)
    at com.amazon.ws.emr.hadoop.fs.shaded.com.google.inject.internal.util.$MapMaker$StrategyImpl.compute(MapMaker.java:419)
    at com.amazon.ws.emr.hadoop.fs.shaded.com.google.inject.internal.util.$CustomConcurrentHashMap$ComputingImpl.get(CustomConcurrentHashMap.java:2041)
    at com.amazon.ws.emr.hadoop.fs.shaded.com.google.inject.internal.util.$StackTraceElements.forMember(StackTraceElements.java:53)
    at com.amazon.ws.emr.hadoop.fs.shaded.com.google.inject.internal.Errors.formatSource(Errors.java:690)
    at com.amazon.ws.emr.hadoop.fs.shaded.com.google.inject.internal.Errors.format(Errors.java:555)
    at com.amazon.ws.emr.hadoop.fs.shaded.com.google.inject.ProvisionException.getMessage(ProvisionException.java:59)
    at java.lang.Throwable.getLocalizedMessage(Throwable.java:391)
    at java.lang.Throwable.toString(Throwable.java:480)
    at java.lang.Throwable.<init>(Throwable.java:311)
    at java.lang.Exception.<init>(Exception.java:102)
    at org.apache.hadoop.hive.ql.metadata.HiveException.<init>(HiveException.java:41)
    at org.apache.hadoop.hive.ql.parse.SemanticException.<init>(SemanticException.java:41)
    at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.toReadEntity(BaseSemanticAnalyzer.java:1659)
    at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.toReadEntity(BaseSemanticAnalyzer.java:1651)
    at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.toReadEntity(BaseSemanticAnalyzer.java:1647)
    at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeCreateTable(SemanticAnalyzer.java:11968)
    at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genResolvedParseTree(SemanticAnalyzer.java:11020)
    at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:11133)
    at org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:286)
    at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:258)
    at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:512)
    at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1317)
    at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1457)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1237)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1227)
    at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:233)
    at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:184)
    at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:403)
    at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:821)
    at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:759)
    at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:686)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.hadoop.util.RunJar.run(RunJar.java:239)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:153)
Caused by: java.lang.ArrayIndexOutOfBoundsException: 16227
    at com.amazon.ws.emr.hadoop.fs.shaded.com.google.inject.internal.asm.$ClassReader.readClass(Unknown Source)
    at com.amazon.ws.emr.hadoop.fs.shaded.com.google.inject.internal.asm.$ClassReader.accept(Unknown Source)
    at com.amazon.ws.emr.hadoop.fs.shaded.com.google.inject.internal.asm.$ClassReader.accept(Unknown Source)
    at com.amazon.ws.emr.hadoop.fs.shaded.com.google.inject.internal.util.$LineNumbers.<init>(LineNumbers.java:62)
    at com.amazon.ws.emr.hadoop.fs.shaded.com.google.inject.internal.util.$StackTraceElements$1.apply(StackTraceElements.java:36)
    at com.amazon.ws.emr.hadoop.fs.shaded.com.google.inject.internal.util.$StackTraceElements$1.apply(StackTraceElements.java:33)
    at com.amazon.ws.emr.hadoop.fs.shaded.com.google.inject.internal.util.$MapMaker$StrategyImpl.compute(MapMaker.java:549)
    ... 37 more

Note: when I create the same table with an HDFS location, it is created successfully.

3 answers:

Answer 0 (score: 0)

I am not sure of the exact cause, but when I ran into this problem I was able to make it work by using a newly created S3 bucket. Hive simply did not like my old bucket.

Edit: I was actually able to get it working with the existing bucket. My EMR configuration specified an invalid value for fs.s3.maxConnections. Once I set it to a valid value and launched a new cluster, the problem went away.
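For reference, fs.s3.maxConnections belongs to the emrfs-site classification of the EMR cluster configuration. A minimal fragment of the kind the answer is describing might look like the following (the value 100 is purely illustrative, not a recommendation; an invalid value here, e.g. a non-numeric string, is what bit the author):

```json
[
  {
    "Classification": "emrfs-site",
    "Properties": {
      "fs.s3.maxConnections": "100"
    }
  }
]
```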

Answer 1 (score: 0)

Run hadoop fs -ls s3:// from the master node as the same user (your entry point) to see whether you hit the same error.

Check that the user has an IAM role with sufficient S3/DynamoDB permissions.
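A minimal sketch of that check, assuming a hypothetical bucket name; the metadata query shows which IAM role the node's instance profile carries, and both commands should be run on the EMR master node as the user that runs Hive:

```shell
# Hypothetical bucket; substitute your own.
BUCKET="s3://your-bucket/"

# Ask the EC2 instance metadata service which IAM role is attached
# (169.254.169.254 is only reachable from inside EC2).
curl -s --max-time 2 http://169.254.169.254/latest/meta-data/iam/security-credentials/ \
  || echo "instance metadata not reachable"

# Try the listing as the same OS user; guarded so this sketch is
# harmless on a machine without Hadoop installed.
if command -v hadoop >/dev/null 2>&1; then
  hadoop fs -ls "$BUCKET"
fi
```

If the listing fails here too, the problem is the role's S3 (and, for EMRFS consistent view, DynamoDB) permissions rather than Hive itself.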

Answer 2 (score: 0)

After debugging the Hadoop and AWS code, I found that the java.lang.ArrayIndexOutOfBoundsException has nothing to do with the real underlying error.

实际上,EMR / Hadoop生成了另一个错误(取决于您的情况),但是在格式化此错误消息时,它触发了另一个异常:java.lang.ArrayIndexOutOfBoundsException。有一个与此相关的问题: https://github.com/google/guice/issues/757

To find the real cause behind it, you have a few options:

  1. Reproduce your operation with a command and enable debug mode. For example, my error occurred when reading/writing S3 data through EMRFS, so I used the command "hdfs dfs -ls s3://xxxxx/xxx" instead. Before running it, I enabled debug mode with: export HADOOP_ROOT_LOGGER=DEBUG,console — this surfaced some interesting errors.

  2. If the first option still shows nothing, you can do what I did:
     2.1 export HADOOP_OPTS="-agentlib:jdwp=transport=dt_socket,server=y,suspend=y,address=5005"
     2.2 Run the command "hdfs dfs -ls s3://xxxx/xxx". Because I set suspend=y, it waits for a remote debug client to connect to the JVM.
     2.3 Connect to the JVM with your IDE. Of course, you first need to import or download the relevant jars into the IDE.
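The two options above can be sketched as shell commands. Here s3://xxxx/xxx is the answer's placeholder path (substitute your own bucket/key), the two exports are independent of each other, and the final command is guarded so the sketch is harmless on a machine without Hadoop installed:

```shell
# Option 1: verbose client-side logging for the Hadoop CLI
export HADOOP_ROOT_LOGGER=DEBUG,console

# Option 2: make the JVM wait for a remote debugger on port 5005
# (suspend=y means the command below blocks until a debugger attaches)
export HADOOP_OPTS="-agentlib:jdwp=transport=dt_socket,server=y,suspend=y,address=5005"

# Reproduce the failure under whichever option you chose
if command -v hdfs >/dev/null 2>&1; then
  hdfs dfs -ls s3://xxxx/xxx
fi
```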

Amazon really needs to fix this by upgrading the shaded Google Guice library.