I am trying to convert JSON to a DataFrame, create a temp table and run some queries on it. But I am getting an org.apache.hadoop.hive.serde2.SerDeException because the JSON has more than 7 levels of nesting. I tried setting the suggested serde property to true, but I still hit the same problem. I am using Spark version 1.6.1. Any help resolving this would be appreciated.

ERROR log:

error in initSerDe: org.apache.hadoop.hive.serde2.SerDeException Number of levels of nesting supported for LazySimpleSerde is 7 Unable to work with level 9. Use hive.serialization.extend.nesting.levels serde property for tables using LazySimpleSerde.
org.apache.hadoop.hive.serde2.SerDeException: Number of levels of nesting supported for LazySimpleSerde is 7 Unable to work with level 9. Use hive.serialization.extend.nesting.levels serde property for tables using LazySimpleSerde.
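For reference, a minimal sketch of the workflow I am describing (Spark 1.6.1, Scala); the input path and table name are placeholders:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

object NestedJsonExample {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("nested-json"))
    val hiveContext = new HiveContext(sc)

    // Read the deeply nested JSON (path is a placeholder).
    val df = hiveContext.read.json("/user/user1/staging/data/populationdata")

    // Register it for SQL access; the SerDeException surfaces once the
    // data goes through a Hive table backed by LazySimpleSerDe.
    df.registerTempTable("population_json")
    hiveContext.sql("SELECT * FROM population_json LIMIT 10").show()
  }
}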
Adding the log:
{{1}}
Thanks
Answer 0: (score: 0)
If the external table is defined as below:
create external table t1
(
  a int,
  b double,
  c array<struct<
        k1:struct<
          p1:struct<
            r1:struct<
              h1:struct<
                s1:array<struct<
                  j1:struct<
                    x1:int
                  >
                >>
              >
            >
          >
        >
      >>
)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
WITH SERDEPROPERTIES ( "mapping.time_stamp" = "timestamp" )
LOCATION '/user/user1/staging/data/populationdata'
;
Assume the data contains more than 7 levels of nesting. Because this external table is read through the OpenX JsonSerDe rather than LazySimpleSerDe, it is not subject to the 7-level limit, so it can expose the raw nested JSON.
Then, in the next step, flatten the table:
-- the flattened table needs a name different from the source table t1
create table t1_flat
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
WITH SERDEPROPERTIES ( 'hive.serialization.extend.nesting.levels'='true' )
as
select
  a,
  b,
  c1.k1
from
  t1
lateral view explode(c) subview as c1
;
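If you are driving this from Spark 1.6.1 rather than the Hive CLI, the same statements can be issued through the HiveContext. A minimal sketch, assuming the two DDL statements above are held verbatim in the strings externalTableDdl and flattenDdl, and that the jar path is a placeholder:

// Assumptions: hiveContext is an org.apache.spark.sql.hive.HiveContext,
// and externalTableDdl / flattenDdl hold the two statements shown above.
// The OpenX JsonSerDe is not bundled with Hive, so its jar must be
// registered first (the path below is a placeholder).
hiveContext.sql("ADD JAR /path/to/json-serde-jar-with-dependencies.jar")
hiveContext.sql(externalTableDdl)  // external table over the raw nested JSON
hiveContext.sql(flattenDdl)        // CTAS that flattens past the 7-level limit
hiveContext.sql("SELECT a, b, k1 FROM t1_flat LIMIT 10").show()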