I am learning to use Python to load data into Hive on Hadoop. Here is the Python code (weekday_mapper.py):
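import sys
import datetime

# Read tab-separated rows from stdin and replace the Unix timestamp
# with the ISO weekday (1 = Monday ... 7 = Sunday).
for line in sys.stdin:
    line = line.strip()
    userid, movieid, rating, unixtime = line.split('\t')
    weekday = datetime.datetime.fromtimestamp(float(unixtime)).isoweekday()
    # Parenthesized form works under both Python 2 and Python 3.
    print('\t'.join([userid, movieid, rating, str(weekday)]))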
Here is the Hive script that runs the mapper:
CREATE TABLE u_data_new (
userid INT,
movieid INT,
rating INT,
weekday INT)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t';
add FILE weekday_mapper.py;
INSERT OVERWRITE TABLE u_data_new
SELECT
TRANSFORM (userid, movieid, rating, unixtime)
USING 'python weekday_mapper.py'
AS (userid, movieid, rating, weekday)
FROM u_data;
Here is the error message I received:
Diagnostic Messages for this Task:
Error: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {"userid":222,"movieid":298,"rating":4,"unixtime":"877563253"}
at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:168)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {"userid":222,"movieid":298,"rating":4,"unixtime":"877563253"}
at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:574)
at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:159)
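Before the error message above I got the following output, which made it look to me as though the map job had completed successfully: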
2016-06-17 13:56:34,782 Stage-1 map = 0%, reduce = 0%
2016-06-17 13:56:46,501 Stage-1 map = 100%, reduce = 0%
2016-06-17 13:56:47,871 Stage-1 map = 0%, reduce = 0%
2016-06-17 13:57:17,275 Stage-1 map = 100%, reduce = 0%
My question is: what causes the error, and how do I fix it? Also, what does "map = 100%" mean?
Thank you very much.
P.S. Here is the data:
196 242 3 881250949
186 302 3 891717742
22 377 1 878887116
244 51 2 880606923
166 346 1 886397596
298 474 4 884182806
115 265 2 881171488
253 465 5 891628467
305 451 3 886324817
....
Answer 0 (score: 0)
I just noticed this in the traceback:
while processing row {"userid":222,"movieid":298,"rating":4,"unixtime":"877563253"}
"unixtime" is a string, whereas according to your table definition the column should be weekday INT.
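The trace in the question does not include the script's own error output, so the cause is not confirmed here. One way to narrow it down is to make the mapper tolerant of bad rows and have it report them on stderr. The following is a minimal sketch, assuming a Python 3 interpreter on the worker nodes; the helper names transform_line and run are hypothetical, and using UTC for the weekday is an assumption made so the result does not depend on each node's timezone (the original script uses local time).

```python
import sys
from datetime import datetime, timezone


def transform_line(line):
    """Return the output row for one input line, or None if it is malformed."""
    fields = line.rstrip('\n').split('\t')
    if len(fields) != 4:
        return None
    userid, movieid, rating, unixtime = fields
    try:
        # Hive passes unixtime as text, so convert it explicitly.
        # Assumption: UTC is acceptable for computing the weekday.
        ts = float(unixtime)
        weekday = datetime.fromtimestamp(ts, tz=timezone.utc).isoweekday()
    except (ValueError, OverflowError, OSError):
        return None
    return '\t'.join([userid, movieid, rating, str(weekday)])


def run(lines, out, err):
    """Write transformed rows to out and report skipped rows on err."""
    for line in lines:
        row = transform_line(line)
        if row is None:
            err.write('skipping malformed line: %r\n' % line)
            continue
        out.write(row + '\n')


# Small demonstration on the row from the error message plus one bad row.
sample = ['222\t298\t4\t877563253\n', 'not a valid row\n']
run(sample, sys.stdout, sys.stderr)
```

In weekday_mapper.py the demonstration lines would be replaced by run(sys.stdin, sys.stdout, sys.stderr). Anything written to stderr should then appear in the failed task's logs, which is usually where the real Python error is found.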