Error importing data into Hive using Python

Date: 2016-06-17 18:05:37

Tags: python hadoop

I'm learning to import data into Hive on Hadoop with Python. Here is the Python code (weekday_mapper.py):

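import sys
import datetime

# Read tab-separated rows from Hive on stdin and replace unixtime with the ISO weekday
for line in sys.stdin:
    line = line.strip()
    userid, movieid, rating, unixtime = line.split('\t')
    # 1 = Monday ... 7 = Sunday, in the local time zone of the node running the task
    weekday = datetime.datetime.fromtimestamp(float(unixtime)).isoweekday()
    # print() with a single argument works under both Python 2 and Python 3
    print('\t'.join([userid, movieid, rating, str(weekday)]))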
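
When a TRANSFORM script fails inside Hive, it helps to make the per-line logic testable on its own and to report bad input instead of crashing. The sketch below is a hypothetical restructuring of the mapper (the map_line and main helpers are illustrative names, not part of the original script). It assumes Python 3 and computes the weekday in UTC, whereas the original fromtimestamp call uses the local time zone of whichever node runs the task.

```python
import datetime
import sys


def map_line(line):
    """Turn 'userid\tmovieid\trating\tunixtime' into 'userid\tmovieid\trating\tweekday'.

    Returns None for lines that do not have exactly four fields
    or whose timestamp is not numeric.
    """
    fields = line.rstrip('\r\n').split('\t')
    if len(fields) != 4:
        return None
    userid, movieid, rating, unixtime = fields
    try:
        ts = float(unixtime)
    except ValueError:
        return None
    # Assumption: use UTC so the weekday does not depend on the node's time zone
    weekday = datetime.datetime.fromtimestamp(ts, tz=datetime.timezone.utc).isoweekday()
    return '\t'.join([userid, movieid, rating, str(weekday)])


def main(lines=None, out=None, err=None):
    """Process every input line; defaults to stdin/stdout/stderr like a TRANSFORM script."""
    lines = sys.stdin if lines is None else lines
    out = sys.stdout if out is None else out
    err = sys.stderr if err is None else err
    for line in lines:
        result = map_line(line)
        if result is None:
            # Hive only reads stdout as output rows, so diagnostics go to stderr
            err.write('skipping malformed line: %r\n' % line)
            continue
        out.write(result + '\n')
```

Saved as the actual TRANSFORM script, the file would also need a final main() call (or the usual if __name__ == '__main__': guard). Keeping map_line separate means a single row such as 196, 242, 3, 881250949 can be checked in a plain Python shell before involving Hive at all.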

And this is the Hive script that calls the mapper:

CREATE TABLE u_data_new (
userid INT,
movieid INT,
rating INT,
weekday INT)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t';
add FILE weekday_mapper.py;
INSERT OVERWRITE TABLE u_data_new
SELECT
TRANSFORM (userid, movieid, rating, unixtime)
USING 'python weekday_mapper.py'
AS (userid, movieid, rating, weekday)
FROM u_data;
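
Roughly speaking, TRANSFORM ... USING starts the command as a child process, writes each input row to its stdin as one tab-separated line, and reads tab-separated lines from its stdout as the AS (...) columns. If the script crashes, the map task typically fails and Hive reports an error for the row it was processing. The sketch below imitates that pipe locally with Python's subprocess module so the script can be exercised outside Hive. It is a simplified model, not Hive's implementation: run_like_transform and the inline MAPPER source are hypothetical, and Hive's NULL and escaping rules are ignored.

```python
import subprocess
import sys

# Hypothetical inline copy of the mapper logic from weekday_mapper.py,
# written with print() so it also runs under Python 3.
MAPPER = r'''
import sys, datetime
for line in sys.stdin:
    userid, movieid, rating, unixtime = line.strip().split('\t')
    weekday = datetime.datetime.fromtimestamp(float(unixtime)).isoweekday()
    print('\t'.join([userid, movieid, rating, str(weekday)]))
'''


def run_like_transform(rows, script_source=MAPPER):
    """Feed rows to a script roughly the way TRANSFORM does: one tab-separated
    line per row on stdin, one tab-separated line per output row on stdout."""
    stdin_text = ''.join('\t'.join(str(v) for v in row) + '\n' for row in rows)
    proc = subprocess.run(
        [sys.executable, '-c', script_source],
        input=stdin_text,
        capture_output=True,
        text=True,
    )
    if proc.returncode != 0:
        # A crashing script fails the Hive task; here its traceback is surfaced instead
        raise RuntimeError('script failed:\n' + proc.stderr)
    return [out.split('\t') for out in proc.stdout.splitlines()]
```

For example, run_like_transform([(222, 298, 4, '877563253')]) returns one row whose first three fields are '222', '298' and '4'. A row with only three columns makes the tuple unpacking fail, and the RuntimeError then carries the script's own Python traceback, which is the detail most useful for diagnosing a "Hive Runtime Error while processing row" failure.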

Here is the error message I got:

Diagnostic Messages for this Task:
Error: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {"userid":222,"movieid":298,"rating":4,"unixtime":"877563253"}
    at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:168)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {"userid":222,"movieid":298,"rating":4,"unixtime":"877563253"}
    at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:574)
    at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:159)

Before the error message above, I got the following output, which looks to me like the map job finished successfully:

2016-06-17 13:56:34,782 Stage-1 map = 0%,  reduce = 0%
2016-06-17 13:56:46,501 Stage-1 map = 100%,  reduce = 0%
2016-06-17 13:56:47,871 Stage-1 map = 0%,  reduce = 0%
2016-06-17 13:57:17,275 Stage-1 map = 100%,  reduce = 0%

My question is: what causes this error, and how can I fix it? Also, what does map = 100% mean?

Thanks a lot.

P.S. Here is the data:

196     242     3       881250949
186     302     3       891717742
22      377     1       878887116
244     51      2       880606923
166     346     1       886397596
298     474     4       884182806
115     265     2       881171488
253     465     5       891628467
305     451     3       886324817
....

1 Answer:

Answer 0 (score: 0)

I just noticed this in the traceback:

while processing row

{"userid":222,"movieid":298,"rating":4,"unixtime":"877563253"}

"unixtime" is a string, but according to your table it should be weekday INT.