Part of my log file, input_to_log.txt, looks like this:
{"agent":"Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/30.0.1599.101 Safari/537.36",
"context":{"course_id":"edx/AN101/2014_T1","module":{"display_name":"Multiple Choice Questions"},"org_id":"edx","user_id":9999999}}
My queries in the Hive shell:
ADD JAR /usr/lib/hive/apache-hive-0.13.0-bin/lib/hive-serde-0.13.0.jar;
CREATE EXTERNAL TABLE edx_lg ( agent STRING,context STRUCT<course_id:STRING,module:STRUCT<display_name:STRING>,org_id:STRING,user_id:INT>)ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.JsonSerde' WITH SERDEPROPERTIES("agent"="$.agent","context"="$.context.course_id,$.context.module.display_name,$.context.org_id,$.context.user_id");
load data local inpath '/home/hduser/input_to_log.txt' into table edx_lg;
select agent,context.course_id,context.module.display_name,context.org_id,context.user_id from edx_lg;
My output:
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_201406102148_0019, Tracking URL = http://x.x.x.x:50030/jobdetails.jsp?jobid=job_201406102148_0019
Kill Command = /usr/local/hadoop/libexec/../bin/hadoop job -kill job_201406102148_0019
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2014-06-12 03:00:42,300 Stage-1 map = 0%, reduce = 0%
2014-06-12 03:00:44,311 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 0.72 sec
2014-06-12 03:00:46,324 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 0.72 sec
MapReduce Total cumulative CPU time: 720 msec
Ended Job = job_201406102148_0019
MapReduce Jobs Launched:
Job 0: Map: 1 Cumulative CPU: 0.72 sec HDFS Read: 467 HDFS Write: 30 SUCCESS
Total MapReduce CPU Time Spent: 720 msec
OK
NULL NULL NULL NULL NULL
NULL NULL NULL NULL NULL
Time taken: 11.278 seconds, Fetched: 2 row(s)
The SELECT statement returns NULL for every column. What do I need to do to get the data from the table?
Answer 0 (score: 0)
What does this query return?
select * from edx_lg
Check whether this helps: HIVE Query returning null values after import data from local stored file
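One more thing that may be worth checking, offered as a guess rather than a confirmed diagnosis: Hive's default text input format treats every physical line of the file as one record. If input_to_log.txt really contains the line break shown in the question (it may only be a formatting artifact of the post), then neither line is a complete JSON object on its own, which would be consistent with the "Fetched: 2 row(s)" of NULLs. The sketch below is a small standalone Python check, not part of Hive; the function name check_json_lines and the shortened sample strings are hypothetical and only illustrate the idea.

```python
import json


def check_json_lines(lines):
    """Report, for each non-empty line, whether it parses as a JSON object.

    Returns a list of (line_number, ok, detail) tuples.
    """
    results = []
    for number, line in enumerate(lines, start=1):
        text = line.strip()
        if not text:
            continue
        try:
            record = json.loads(text)
        except json.JSONDecodeError as exc:
            # The line is not valid JSON by itself.
            results.append((number, False, str(exc)))
        else:
            # Valid JSON; a log record is expected to be an object (dict).
            results.append((number, isinstance(record, dict), "parsed"))
    return results


# Hypothetical sample: one record split over two lines, as in the question
# (the agent string is shortened for readability).
split_record = [
    '{"agent":"Mozilla/5.0",',
    '"context":{"course_id":"edx/AN101/2014_T1",'
    '"module":{"display_name":"Multiple Choice Questions"},'
    '"org_id":"edx","user_id":9999999}}',
]
# The same record joined onto a single line.
joined_record = ["".join(split_record)]

if __name__ == "__main__":
    for name, sample in (("split", split_record), ("joined", joined_record)):
        for number, ok, detail in check_json_lines(sample):
            print(name, number, "OK" if ok else "NOT JSON", detail)
```

To run it against the real file, pass an open file handle instead of the sample list, for example check_json_lines(open('/home/hduser/input_to_log.txt')). If any line is reported as not being JSON, rewriting the file so that each record occupies exactly one line and reloading it is a reasonable next step before looking at the SerDe configuration.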