How do I set the hive.serialization.extend.nesting.levels property to true in Spark?

Time: 2017-01-20 22:41:43

Tags: json apache-spark hive apache-spark-sql

I am trying to convert JSON to a DataFrame, register it as a temp table, and run some queries against it. However, I get an org.apache.hadoop.hive.serde2.SerDeException because the JSON has more than 7 levels of nesting. I tried setting the hive.serialization.extend.nesting.levels property to true, but I still hit the same problem. I am using Spark 1.6.1. Any help resolving this would be appreciated.

Adding the log:

ERROR log: error in initSerDe: org.apache.hadoop.hive.serde2.SerDeException Number of levels of nesting supported for LazySimpleSerde is 7 Unable to work with level 9. Use hive.serialization.extend.nesting.levels serde property for tables using LazySimpleSerde.
org.apache.hadoop.hive.serde2.SerDeException: Number of levels of nesting supported for LazySimpleSerde is 7 Unable to work with level 9. Use hive.serialization.extend.nesting.levels serde property for tables using LazySimpleSerde.

Thanks

1 Answer:

Answer 0: (score: 0)

Suppose the external table is defined as follows:

create external table t1
(
  a int,
  b double,
  c array<struct<
      k1:struct<
        p1:struct<
          r1:struct<
            h1:struct<
              s1:array<struct<
                j1:struct<
                  x1:int
                >
              >>
            >
          >
        >
      >
    >>
)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
WITH SERDEPROPERTIES ( "mapping.time_stamp" = "timestamp" )
LOCATION '/user/user1/staging/data/populationdata'
;

Assume the data contains more than 7 levels of nesting.
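To check whether a given record is actually that deep, a rough sketch like the following can help. It counts every JSON object or array as one level, which is only an approximation of how LazySimpleSerDe counts levels, not its exact algorithm. The function name nesting_depth and the sample record are made up for illustration and do not come from the original data.

```python
import json


def nesting_depth(value):
    """Return how many object/array levels wrap the deepest scalar in value."""
    if isinstance(value, dict):
        # An object adds one level on top of its deepest member.
        return 1 + max((nesting_depth(v) for v in value.values()), default=0)
    if isinstance(value, list):
        # An array adds one level on top of its deepest element.
        return 1 + max((nesting_depth(v) for v in value), default=0)
    return 0  # scalars (numbers, strings, booleans, null) add no nesting


# Hypothetical sample record, not taken from the original data.
sample = json.loads('{"a": 1, "c": [{"k1": {"p1": {"r1": [1, 2]}}}]}')
print(nesting_depth(sample))
```

Running this over a few input lines before loading them gives a quick idea of whether the 7-level limit from the error message is likely to be hit.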

Then, in the next step, flatten the table like this:

-- t1_flat: illustrative name for the flattened table (it must differ from the external table t1)
create table t1_flat
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
WITH SERDEPROPERTIES ( 'hive.serialization.extend.nesting.levels'='true' )
as
select
  a,
  b,
  c1.k1
from
  t1
lateral view explode(c) subview as c1
;
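
As a sanity check on the schema side, the sketch below measures how deeply the angle brackets nest in a Hive type string. It uses the type of column c from the external table definition above, written on one line. The helper name type_depth is hypothetical, and bracket depth is only a proxy for the level count that LazySimpleSerDe reports, so treat the numbers as a rough guide.

```python
def type_depth(hive_type):
    """Return the maximum '<' nesting depth found in a Hive type string."""
    depth = 0
    deepest = 0
    for ch in hive_type:
        if ch == "<":
            depth += 1
            deepest = max(deepest, depth)
        elif ch == ">":
            depth -= 1
    return deepest


# Type of column c from the external table definition, joined onto one line.
c_type = (
    "array<struct<k1:struct<p1:struct<r1:struct<h1:struct<"
    "s1:array<struct<j1:struct<x1:int>>>>>>>>>"
)
# Type of k1 alone, i.e. what remains after explode(c) and selecting c1.k1.
k1_type = "struct<p1:struct<r1:struct<h1:struct<s1:array<struct<j1:struct<x1:int>>>>>>>"

print(type_depth(c_type), type_depth(k1_type))
```

Exploding c removes the outer array and struct, so the selected column is shallower than the original one, but it is still deeply nested, which is presumably why the answer also sets the serde property on the new table.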