I just upgraded hive-exec and hive-jdbc to Hive version 2.1.0.
Since then, some queries that previously worked fine have started failing.
Exception -
Exception in thread "main" org.apache.hive.service.cli.HiveSQLException: Error while compiling statement: FAILED: ArrayIndexOutOfBoundsException null
at org.apache.hive.jdbc.Utils.verifySuccess(Utils.java:264)
at org.apache.hive.jdbc.Utils.verifySuccessWithInfo(Utils.java:250)
at org.apache.hive.jdbc.HiveStatement.runAsyncOnServer(HiveStatement.java:309)
at org.apache.hive.jdbc.HiveStatement.execute(HiveStatement.java:250)
at com.XXX.YYY.executors.HiveQueryExecutor.executeQueriesInternal(HiveQueryExecutor.java:234)
at com.XXX.YYY.executors.HiveQueryExecutor.executeQueriesMetricsEnabled(HiveQueryExecutor.java:184)
at com.XXX.YYY.executors.HiveQueryExecutor.main(HiveQueryExecutor.java:500)
Caused by: org.apache.hive.service.cli.HiveSQLException: Error while compiling statement: FAILED: ArrayIndexOutOfBoundsException null
at org.apache.hive.service.cli.operation.Operation.toSQLException(Operation.java:387)
at org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:186)
at org.apache.hive.service.cli.operation.SQLOperation.runInternal(SQLOperation.java:269)
at org.apache.hive.service.cli.operation.Operation.run(Operation.java:324)
at org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementInternal(HiveSessionImpl.java:460)
at org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementAsync(HiveSessionImpl.java:447)
at sun.reflect.GeneratedMethodAccessor7.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:78)
at org.apache.hive.service.cli.session.HiveSessionProxy.access$000(HiveSessionProxy.java:36)
at org.apache.hive.service.cli.session.HiveSessionProxy$1.run(HiveSessionProxy.java:63)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:59)
at com.sun.proxy.$Proxy33.executeStatementAsync(Unknown Source)
at org.apache.hive.service.cli.CLIService.executeStatementAsync(CLIService.java:294)
at org.apache.hive.service.cli.thrift.ThriftCLIService.ExecuteStatement(ThriftCLIService.java:497)
at org.apache.hive.service.rpc.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1437)
at org.apache.hive.service.rpc.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1422)
at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
at org.apache.hive.service.auth.TSetIpAddressProcessor.process(TSetIpAddressProcessor.java:56)
at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.ArrayIndexOutOfBoundsException: null
The query I ran -
INSERT OVERWRITE TABLE base_performance_order_20160916
SELECT
*
FROM
(
select
coalesce(traffic_feed.sku,commerce_feed.sku) AS sku,
concat(coalesce(traffic_feed.feed_date,commerce_feed.feed_date),' ','00:00:00') AS transaction_date,
commerce_feed.units AS gross_units,
commerce_feed.orders AS gross_orders,
commerce_feed.revenue AS gross_revenue,
NULL AS gross_cost,
NULL AS gross_subsidized_cost,
NULL AS gross_shipping_cost,
NULL AS gross_variable_cost,
NULL AS gross_shipping_charges,
traffic_feed.pageViews AS page_views,
traffic_feed.uniqueVisitors AS unique_visits,
0 AS channel_id,
concat(coalesce(traffic_feed.feed_date,commerce_feed.feed_date),' ','00:00:00') AS feed_date,
from_unixtime(unix_timestamp()) AS creation_date
from traffic_feed
full outer join commerce_feed on coalesce(traffic_feed.sku)=commerce_feed.sku AND coalesce(traffic_feed.feed_date)=commerce_feed.feed_date
) tb
WHERE sku is not NULL and transaction_date is not NULL and channel_id is not NULL and feed_date is not NULL and creation_date is not NULL
The query works fine when I run it without setting any configuration variables.
But when I set the following Hive configuration properties:
"set hivevar:hive.mapjoin.smalltable.filesize=2000000000",
"set hivevar:mapreduce.map.speculative=false",
"set hivevar:mapreduce.output.fileoutputformat.compress=true",
"set hivevar:hive.exec.compress.output=true",
"set hivevar:mapreduce.task.timeout=6000000",
"set hivevar:hive.optimize.bucketmapjoin.sortedmerge=true",
"set hivevar:io.compression.codecs=org.apache.hadoop.io.compress.GzipCodec",
"set hivevar:hive.input.format=org.apache.hadoop.hive.ql.io.BucketizedHiveInputFormat",
"set hivevar:hive.auto.convert.sortmerge.join.noconditionaltask=true",
"set hivevar:FEED_DATE=20160916",
"set hivevar:hive.optimize.bucketmapjoin=true",
"set hivevar:hive.exec.compress.intermediate=true",
"set hivevar:hive.enforce.bucketmapjoin=true",
"set hivevar:mapred.output.compress=true",
"set hivevar:mapreduce.map.output.compress=true",
"set hivevar:hive.auto.convert.sortmerge.join=true",
"set hivevar:hive.auto.convert.join=false",
"set hivevar:mapreduce.reduce.speculative=false",
"set hivevar:PD_KEY=vijay-test-mail@XXXcommerce.pagerduty.com",
"set hivevar:mapred.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec",
"set hive.mapjoin.smalltable.filesize=2000000000",
"set mapreduce.map.speculative=false",
"set mapreduce.output.fileoutputformat.compress=true",
"set hive.exec.compress.output=true",
"set mapreduce.task.timeout=6000000",
"set hive.optimize.bucketmapjoin.sortedmerge=true",
"set io.compression.codecs=org.apache.hadoop.io.compress.GzipCodec",
"set hive.input.format=org.apache.hadoop.hive.ql.io.BucketizedHiveInputFormat",
"set hive.auto.convert.sortmerge.join.noconditionaltask=true",
"set FEED_DATE=20160916",
"set hive.optimize.bucketmapjoin=true",
"set hive.exec.compress.intermediate=true",
"set hive.enforce.bucketmapjoin=true",
"set mapred.output.compress=true",
"set mapreduce.map.output.compress=true",
"set hive.auto.convert.sortmerge.join=true",
"set hive.auto.convert.join=false",
"set mapreduce.reduce.speculative=false",
"set PD_KEY=vijay-test-mail@XXXcommerce.pagerduty.com",
"set mapred.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec"
the query starts failing with the exception above.
Question - why do these property settings cause the query to fail after the upgrade, and how can it be fixed?
Answer 0 (score: 1) -
Try disabling the sort merge join properties as a temporary workaround.
Since you have set the sort merge join properties to true, io.sort.mb is taken as 2047 MB by default, which can lead to the ArrayIndexOutOfBoundsException. So whenever you set the sort merge join properties, it is recommended to also set io.sort.mb to an optimal value based on the size of the datasets used in the query; see the sketch below.
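A minimal HiveQL sketch of this workaround (the three property names come straight from the list in the question; the io.sort.mb value of 512 is only an illustrative starting point, not a tuned optimum):

-- Temporary workaround: turn the sort merge join properties back off
set hive.auto.convert.sortmerge.join=false;
set hive.auto.convert.sortmerge.join.noconditionaltask=false;
set hive.optimize.bucketmapjoin.sortedmerge=false;
-- Or, if you keep them enabled, size the sort buffer to your data
-- explicitly instead of relying on the default (illustrative value):
set io.sort.mb=512;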
To find out how much data the query needs, you can run EXPLAIN on it; the output shows the amount of data considered for each subquery and stage, as in the example below.
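For example, prefixing the failing statement from the question (the "Statistics" lines in the plan typically report the row counts and data sizes per operator):

EXPLAIN
INSERT OVERWRITE TABLE base_performance_order_20160916
SELECT * FROM (
    ... the same subquery as above ...
) tb
WHERE sku is not NULL and transaction_date is not NULL and channel_id is not NULL and feed_date is not NULL and creation_date is not NULL;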
Hope this helps.