Hive: Python UDF gives "Hive Runtime Error while closing operators"

Asked: 2015-11-02 04:06:19

Tags: python hive

I am new to Hadoop and Python and am facing some issues. I'd appreciate your help...

I have a file of 150 records (just a sample), each with 10 columns, loaded into a Hive table (table1). Column 10 (let's call it col10) is UTF-8 encoded, so to decode it I wrote a small Python function (named pyfile.py) as follows:

The Python function:

import sys
import urllib
for line in sys.stdin:
    line = line.strip()
    col10 = urllib.unquote(line).decode('utf8')
    print ''.join(col10.replace("+",' '))

I added the file to the distributed cache using the following command:

add FILE folder1/pyfile.py;

Now I call this Python function on col10 of my Hive table using TRANSFORM, as follows:

Select Transform(col10)
USING 'python pyfile.py'
AS (col10)
From table1;

The problem:

The query works perfectly fine on the first 100 records of the table, but fails on records 101-150 with the following error:

2015-10-30 00:58:20,320 INFO [IPC Server handler 0 on 33716] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Diagnostics report from attempt_1445826741287_0032_m_000000_0: Error: java.lang.RuntimeException: Hive Runtime Error while closing operators
    at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.close(ExecMapper.java:217)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: [Error 20003]: An error occurred when trying to close the Operator running your custom script.
    at org.apache.hadoop.hive.ql.exec.ScriptOperator.close(ScriptOperator.java:557)
    at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610)
    at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610)
    at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610)
    at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.close(ExecMapper.java:199)
    ... 8 more

I copied records 101-150 into a text file, ran the Python script on them separately, and found that it ran fine.
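For reference, the per-record decoding step can be reproduced outside of Hive with a small harness like the one below (a hypothetical sketch in Python 3 syntax for easy local testing; the original script targets Python 2, where the function lives at `urllib.unquote`):

```python
from urllib.parse import unquote  # Python 2 equivalent: urllib.unquote

def decode_col10(line):
    # Mirror the UDF: strip whitespace, percent-decode as UTF-8,
    # then map '+' back to spaces.
    return unquote(line.strip()).replace('+', ' ')

# Feeding the failing rows through this one by one helps narrow down
# which record, if any, raises an exception.
print(decode_col10('caf%C3%A9+au+lait'))  # café au lait
```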

Please let me know what is causing the error and how to resolve it.

1 Answer:

Answer 0 (score: 1)

The error message you are seeing means that Python is throwing an exception. One thing that has worked for me when debugging this kind of issue is to use the following pattern in my UDF code (see also my blog post about this):

import sys
import urllib

try:
    for line in sys.stdin:
        line = line.strip()
        col10 = urllib.unquote(line).decode('utf8')
        print ''.join(col10.replace("+", ' '))

except:
    # In case of an exception, write the stack trace to stdout so that we
    # can see it in Hive, in the results of the UDF call.
    print sys.exc_info()
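A variation on the same idea (a hypothetical sketch, not part of the answer above, shown in Python 3 syntax) is to catch the exception per record and log it to stderr, which ends up in the task attempt logs rather than in the query result, so one bad row does not kill the whole script:

```python
import sys
import traceback
from urllib.parse import unquote  # Python 2 equivalent: urllib.unquote

def safe_decode(line):
    """Decode one record; on failure, log the traceback and pass the raw line through."""
    try:
        return unquote(line.strip()).replace('+', ' ')
    except Exception:
        # stderr goes to the task logs, not into the query output.
        traceback.print_exc(file=sys.stderr)
        return line.strip()

if __name__ == '__main__':
    for line in sys.stdin:
        print(safe_decode(line))
```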