Here is the full error:
ERROR TaskSetManager: Task 55 in stage 7.0 failed 4 times; aborting job
org.apache.spark.SparkException: Job aborted due to stage failure: Task 55 in
stage 7.0 failed 4 times, most recent failure: Lost task 55.3 in stage 7.0
(TID 17, <url>): java.lang.ArrayIndexOutOfBoundsException: 7
I get this error when inserting into a table from a DataFrame that was created by selecting from another table.
Here is the code (some of it is split across multiple lines for readability):
sqlContext.sql("drop table if exists db.final_scala")
val createTable = """create table if not exists db.final_scala (
  key1 string,
  key2 string,
  col3 string,
  col4 string,
  col5 string,
  col6 string,
  col7 string,
  col8 string,
  col9 string
)"""
sqlContext.sql(createTable)
val insertIntoTable = """insert into table db.final_scala
select
  key1,
  key2,
  coalesce(colx, coly) as col3,
  concat(date, ' 04:01:00.000') as col4,
  'text' as col5,
  'text' as col6,
  col7,
  col8,
  col9
from (
  select
    *,
    row_number() over (
      partition by key2
      order by from_unixtime(unix_timestamp(date, 'yyyy-MM-dd')) desc
    ) as rnk
  from db2.temp
  where
    colA = 'A'
    and (colB is null or colB = 'C')
    and date >= '2017-08-09'
    and date < '2017-08-10'
    and key2 is not null
    and (col7 is not null and col7 > 0)
) a where rnk = 1"""
sqlContext.sql(insertIntoTable)
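As a side note on the snippet above: a plain `"…"` literal in Scala cannot span multiple lines, so multi-line SQL statements are usually written as triple-quoted strings, often with `stripMargin` to keep the source indentation readable. A minimal sketch (plain Scala, no Spark required) of that form:

```scala
object MultilineSqlExample {
  // Triple-quoted literals may contain real line breaks; stripMargin
  // removes everything up to and including the leading '|' on each line.
  val query: String =
    """|select
       |  key1,
       |  key2
       |from db2.temp""".stripMargin

  def main(args: Array[String]): Unit = {
    // The literal contains real newlines, so the query has four lines.
    println(query.split("\n").length)
  }
}
```

The resulting string can then be passed to `sqlContext.sql(...)` exactly like a single-line literal.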
When I run the same query directly in Hive, it executes fine and populates my table. Sample output:
101 1011 NULL 2017-08-05 04:01:00.000 text text 70000 0 2017-08-09 08:52:15
102 1022 NULL 2017-08-06 04:01:00.000 text text 52000 0 2017-08-09 08:52:15
103 1033 NULL 2017-08-05 04:01:00.000 text text 1200000 0 2017-08-09 08:52:15
104 1044 NULL 2017-08-06 04:01:00.000 text text 57000 0 2017-08-09 08:52:15
105 1055 NULL 2017-06-17 04:01:00.000 text text 28080 0 2017-08-09 08:52:15
What could the problem be, and how can I fix it?