Scala spark-shell ArrayIndexOutOfBoundsException: 7 when inserting into a table with sqlContext

Date: 2017-08-18 18:39:41

Tags: scala hadoop apache-spark hive apache-spark-sql

Here is the full error:

ERROR TaskSetManager: Task 55 in stage 7.0 failed 4 times; aborting job
org.apache.spark.SparkException: Job aborted due to stage failure: Task 55 in 
stage 7.0 failed 4 times, most recent failure: Lost task 55.3 in stage 7.0 
(TID 17, <url>): java.lang.ArrayIndexOutOfBoundsException: 7

I get this error when inserting into a table from a DataFrame that was created by selecting from another table.

Here is the code (some statements are split across multiple lines for readability):

sqlContext.sql("drop table if exists db.final_scala")

val createTable = """create table if not exists db.final_scala (
                      key1 string,
                      key2 string,
                      col3 string,
                      col4 string,
                      col5 string,
                      col6 string,
                      col7 string,
                      col8 string,
                      col9 string
                  )"""

sqlContext.sql(createTable)

val insertIntoTable = """insert into table db.final_scala
    select
        key1,
        key2,
        coalesce(colx, coly) as col3,
        concat(date, ' 04:01:00.000') as col4,
        'text' as col5,
        'text' as col6,
        col7,
        col8,
        col9
    from (
        select
            *,
            row_number() over (
                partition by key2
                order by from_unixtime(unix_timestamp(date, 'yyyy-MM-dd')) desc
            ) as rnk
        from db2.temp
        where
            colA = 'A'
            and (colB is null or colB = 'C')
            and date >= '2017-08-09'
            and date < '2017-08-10'
            and key2 is not null
            and (col7 is not null and col7 > 0)
    ) a
    where rnk = 1"""

sqlContext.sql(insertIntoTable)

When I run the same query directly in Hive, it executes and populates my table just fine. Sample output:

101     1011 NULL    2017-08-05 04:01:00.000 text   text    70000   0       2017-08-09 08:52:15
102     1022 NULL    2017-08-06 04:01:00.000 text   text    52000   0       2017-08-09 08:52:15
103     1033 NULL    2017-08-05 04:01:00.000 text   text    1200000 0       2017-08-09 08:52:15
104     1044 NULL    2017-08-06 04:01:00.000 text   text    57000   0       2017-08-09 08:52:15
105     1055 NULL    2017-06-17 04:01:00.000 text   text    28080   0       2017-08-09 08:52:15

What could the problem be, and how can I fix it?
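One possible explanation, offered here only as a hypothesis the post itself does not confirm: if db2.temp is backed by delimited text files and some rows contain fewer fields than the schema declares, Hive silently pads the missing columns with NULL, whereas a reader that indexes the split field array directly will throw ArrayIndexOutOfBoundsException: 7 on the first row with only seven fields. A minimal plain-Scala sketch of that failure mode (the sample rows, tab delimiter, and `field` helper are all made up for illustration):

```scala
// A well-formed row with 9 tab-separated fields (indices 0..8),
// and a short row with only 7 fields (indices 0..6).
val goodRow = "101\t1011\t\t2017-08-05\ttext\ttext\t70000\t0\t2017-08-09"
val badRow  = "105\t1055\t\t2017-06-17\ttext\ttext\t28080"

// Hypothetical helper mimicking a reader that splits a delimited line
// and indexes into the resulting array without a bounds check.
def field(line: String, i: Int): String = line.split("\t", -1)(i)

println(field(goodRow, 7))  // index 7 exists in the well-formed row

try {
  field(badRow, 7)          // only indices 0..6 exist here
} catch {
  case e: ArrayIndexOutOfBoundsException =>
    println(s"caught: $e")  // same exception class as in the question
}
```

If short rows turn out to be the cause, filtering them out (or repairing the source files) before the insert should make the Spark job succeed; only the exception class and column count come from the question, everything else above is an assumption.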

0 Answers:

No answers yet