I am getting NULL for every column in the select output of the following Hive table.
Describe studentdetails;
clustername string
schemaname string
tablename string
primary_key map<string,int>
schooldata struct<alternate_aliases:string,application_deadline:bigint,application_deadline_early_action:string,application_deadline_early_decision:bigint,calendaring_system:string,fips_code:string,funding_type:string,gender_preference:string,iped_id:bigint,learning_environment:string,mascot:string,offers_open_admission:boolean,offers_rolling_admission:boolean,region:string,religious_affiliation:string,school_abbreviation:string,school_colors:string,school_locale:string,school_term:string,short_name:string,created_date:bigint,modified_date:bigint,percent_students_outof_state:float> from deserializer
deletedind boolean
truncatedind boolean
versionid bigint
select * from studentdetails limit 3;
Output:
NULL NULL NULL NULL NULL NULL NULL NULL
NULL NULL NULL NULL NULL NULL NULL NULL
NULL NULL NULL NULL NULL NULL NULL NULL
I used the following properties when creating the table.
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
WITH SERDEPROPERTIES ("ignore.malformed.json" = "true")
And I set the following properties when selecting the data.
SET hive.exec.compress.output=true;
SET io.seqfile.compression.type=BLOCK;
SET mapred.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec;
ADD JAR s3://emr/hive/lib/hive-serde-1.0.jar;
Answer 0: (score: 0)
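Thanks for your comments, I found the solution.
The problem was that the column names in my JSON file were different from the column names I used when creating the table.
Once I made the column names in the Hive table match the keys in the JSON file, the issue was resolved.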
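For anyone hitting the same thing, here is a rough sketch of two ways the names could be brought in line. The JSON key names (cluster_name, schema_name, table_name) are hypothetical, since the actual keys in my file are not shown here. The second option relies on the "mapping.<hive column>" SerDe property, which, as far as I know, the openx JsonSerDe supports; treat it as an assumption and check it against the SerDe version you have loaded.

```sql
-- Assumed input record (hypothetical key names, not from the original data):
--   {"cluster_name": "c1", "schema_name": "s1", "table_name": "t1"}
-- The Hive columns are clustername, schemaname, tablename, so they do not
-- match these keys, and the SerDe returns NULL for each of them.

-- Option 1: rename the Hive columns so they match the JSON keys.
ALTER TABLE studentdetails CHANGE clustername cluster_name string;
ALTER TABLE studentdetails CHANGE schemaname schema_name string;
ALTER TABLE studentdetails CHANGE tablename table_name string;

-- Option 2 (alternative to option 1): keep the Hive column names and map
-- each one to its JSON key through the SerDe properties.
ALTER TABLE studentdetails SET SERDEPROPERTIES (
  "mapping.clustername" = "cluster_name",
  "mapping.schemaname"  = "schema_name",
  "mapping.tablename"   = "table_name"
);

-- Re-run the original query to check that the columns are now populated.
SELECT * FROM studentdetails LIMIT 3;
```

Use only one of the two options. The remaining columns (primary_key, schooldata and so on) would need the same treatment if their keys also differ.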
Thanks & Regards, Srivignesh KN