Hive query throws exception - Error while compiling statement: FAILED: ArrayIndexOutOfBoundsException null

Date: 2016-09-16 14:16:08

Tags: amazon-web-services hadoop mapreduce hive elastic-map-reduce

I have just upgraded hive-exec and hive-jdbc to Hive version 2.1.0.

Because of this, some queries that previously worked fine have started failing.

Exception -

Exception in thread "main" org.apache.hive.service.cli.HiveSQLException: Error while compiling statement: FAILED: ArrayIndexOutOfBoundsException null
    at org.apache.hive.jdbc.Utils.verifySuccess(Utils.java:264)
    at org.apache.hive.jdbc.Utils.verifySuccessWithInfo(Utils.java:250)
    at org.apache.hive.jdbc.HiveStatement.runAsyncOnServer(HiveStatement.java:309)
    at org.apache.hive.jdbc.HiveStatement.execute(HiveStatement.java:250)
    at com.XXX.YYY.executors.HiveQueryExecutor.executeQueriesInternal(HiveQueryExecutor.java:234)
    at com.XXX.YYY.executors.HiveQueryExecutor.executeQueriesMetricsEnabled(HiveQueryExecutor.java:184)
    at com.XXX.YYY.executors.HiveQueryExecutor.main(HiveQueryExecutor.java:500)
Caused by: org.apache.hive.service.cli.HiveSQLException: Error while compiling statement: FAILED: ArrayIndexOutOfBoundsException null
    at org.apache.hive.service.cli.operation.Operation.toSQLException(Operation.java:387)
    at org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:186)
    at org.apache.hive.service.cli.operation.SQLOperation.runInternal(SQLOperation.java:269)
    at org.apache.hive.service.cli.operation.Operation.run(Operation.java:324)
    at org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementInternal(HiveSessionImpl.java:460)
    at org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementAsync(HiveSessionImpl.java:447)
    at sun.reflect.GeneratedMethodAccessor7.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:78)
    at org.apache.hive.service.cli.session.HiveSessionProxy.access$000(HiveSessionProxy.java:36)
    at org.apache.hive.service.cli.session.HiveSessionProxy$1.run(HiveSessionProxy.java:63)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
    at org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:59)
    at com.sun.proxy.$Proxy33.executeStatementAsync(Unknown Source)
    at org.apache.hive.service.cli.CLIService.executeStatementAsync(CLIService.java:294)
    at org.apache.hive.service.cli.thrift.ThriftCLIService.ExecuteStatement(ThriftCLIService.java:497)
    at org.apache.hive.service.rpc.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1437)
    at org.apache.hive.service.rpc.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1422)
    at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
    at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
    at org.apache.hive.service.auth.TSetIpAddressProcessor.process(TSetIpAddressProcessor.java:56)
    at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.ArrayIndexOutOfBoundsException: null

The query I ran -

INSERT OVERWRITE TABLE base_performance_order_20160916
SELECT *
FROM (
    SELECT
        coalesce(traffic_feed.sku, commerce_feed.sku) AS sku,
        concat(coalesce(traffic_feed.feed_date, commerce_feed.feed_date), ' ', '00:00:00') AS transaction_date,
        commerce_feed.units AS gross_units,
        commerce_feed.orders AS gross_orders,
        commerce_feed.revenue AS gross_revenue,
        NULL AS gross_cost,
        NULL AS gross_subsidized_cost,
        NULL AS gross_shipping_cost,
        NULL AS gross_variable_cost,
        NULL AS gross_shipping_charges,
        traffic_feed.pageViews AS page_views,
        traffic_feed.uniqueVisitors AS unique_visits,
        0 AS channel_id,
        concat(coalesce(traffic_feed.feed_date, commerce_feed.feed_date), ' ', '00:00:00') AS feed_date,
        from_unixtime(unix_timestamp()) AS creation_date
    FROM traffic_feed
    FULL OUTER JOIN commerce_feed
        ON coalesce(traffic_feed.sku) = commerce_feed.sku
        AND coalesce(traffic_feed.feed_date) = commerce_feed.feed_date
) tb
WHERE sku IS NOT NULL
  AND transaction_date IS NOT NULL
  AND channel_id IS NOT NULL
  AND feed_date IS NOT NULL
  AND creation_date IS NOT NULL

This query runs fine when I don't set any configuration variables.

But when I set the Hive configuration properties below,

"set hivevar:hive.mapjoin.smalltable.filesize=2000000000",
                "set hivevar:mapreduce.map.speculative=false",
                "set hivevar:mapreduce.output.fileoutputformat.compress=true",
                "set hivevar:hive.exec.compress.output=true",
                "set hivevar:mapreduce.task.timeout=6000000",
                "set hivevar:hive.optimize.bucketmapjoin.sortedmerge=true",
                "set hivevar:io.compression.codecs=org.apache.hadoop.io.compress.GzipCodec",
                "set hivevar:hive.input.format=org.apache.hadoop.hive.ql.io.BucketizedHiveInputFormat",
                "set hivevar:hive.auto.convert.sortmerge.join.noconditionaltask=true",
                "set hivevar:FEED_DATE=20160916",
                "set hivevar:hive.optimize.bucketmapjoin=true",
                "set hivevar:hive.exec.compress.intermediate=true",
                "set hivevar:hive.enforce.bucketmapjoin=true",
                "set hivevar:mapred.output.compress=true",
                "set hivevar:mapreduce.map.output.compress=true",
                "set hivevar:hive.auto.convert.sortmerge.join=true",
                "set hivevar:hive.auto.convert.join=false",
                "set hivevar:mapreduce.reduce.speculative=false",
                "set hivevar:PD_KEY=vijay-test-mail@XXXcommerce.pagerduty.com",
                "set hivevar:mapred.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec",
                "set hive.mapjoin.smalltable.filesize=2000000000",
                "set mapreduce.map.speculative=false",
                "set mapreduce.output.fileoutputformat.compress=true",
                "set hive.exec.compress.output=true",
                "set mapreduce.task.timeout=6000000",
                "set hive.optimize.bucketmapjoin.sortedmerge=true",
                "set io.compression.codecs=org.apache.hadoop.io.compress.GzipCodec",
                "set hive.input.format=org.apache.hadoop.hive.ql.io.BucketizedHiveInputFormat",
                "set hive.auto.convert.sortmerge.join.noconditionaltask=true",
                "set FEED_DATE=20160916",
                "set hive.optimize.bucketmapjoin=true",
                "set hive.exec.compress.intermediate=true",
                "set hive.enforce.bucketmapjoin=true",
                "set mapred.output.compress=true",
                "set mapreduce.map.output.compress=true",
                "set hive.auto.convert.sortmerge.join=true",
                "set hive.auto.convert.join=false",
                "set mapreduce.reduce.speculative=false",
                "set PD_KEY=vijay-test-mail@XXXcommerce.pagerduty.com",
                "set mapred.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec"

the query starts failing with the exception above.
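
A side note on the two namespaces in that list: in Hive, "set hivevar:name=value" defines a substitution variable that only takes effect where the query text references it via ${hivevar:name}, whereas a plain "set name=value" changes a session configuration property. A minimal sketch of the difference, reusing FEED_DATE and the commerce_feed table from the question:

    -- Substitution variable: spliced into the query text, not a config property.
    set hivevar:FEED_DATE=20160916;
    select count(*) from commerce_feed where feed_date = '${hivevar:FEED_DATE}';

    -- Session configuration property: changes execution behavior directly.
    set hive.auto.convert.join=false;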

Question -

  1. Could the Hive configuration properties I have set be causing the problem (I have only upgraded the Hive and Hadoop versions)?

1 answer:

Answer 0 (score: 1)

Try disabling the sort merge join properties as a temporary workaround.
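
Concretely, that means flipping the sort-merge/bucket-map-join flags from the question back to false before re-running the INSERT. A sketch of the workaround, not a permanent fix:

    -- Temporary workaround: undo the sort merge / bucket map join settings
    -- that were enabled in the question, then re-run the failing statement.
    set hive.auto.convert.sortmerge.join=false;
    set hive.auto.convert.sortmerge.join.noconditionaltask=false;
    set hive.optimize.bucketmapjoin=false;
    set hive.optimize.bucketmapjoin.sortedmerge=false;
    set hive.enforce.bucketmapjoin=false;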

Since you have set the sort merge join properties to true, io.sort.mb is taken as 2047 MB by default, which can cause the ArrayIndexOutOfBoundsException. So when you enable the sort merge join properties, it is advisable to set io.sort.mb to an optimal value based on the size of the datasets used in the query.
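
If you want to keep sort merge join enabled, a sketch of tuning the buffer instead (512 MB is an arbitrary illustrative value, not a recommendation):

    -- io.sort.mb is the legacy MR1 name for the map-side sort buffer;
    -- mapreduce.task.io.sort.mb is the MR2/YARN equivalent. Setting both
    -- covers either naming scheme.
    set io.sort.mb=512;
    set mapreduce.task.io.sort.mb=512;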

To find out how much data the query will process, you can run EXPLAIN on it; the plan shows the amount of data considered in each subquery and stage.
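
For instance, a pared-down sketch against the tables from the question (in practice you would prefix the full failing INSERT):

    -- EXPLAIN prints the stage plan, including per-stage row and data-size
    -- estimates, without executing the query.
    EXPLAIN
    SELECT coalesce(traffic_feed.sku, commerce_feed.sku) AS sku
    FROM traffic_feed
    FULL OUTER JOIN commerce_feed
        ON traffic_feed.sku = commerce_feed.sku;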

Hope this helps.