Pig SUM FOREACH GROUP ClassCastException: java.lang.String cannot be cast to java.lang.Number

Time: 2015-09-12 02:33:51

Tags: hadoop apache-pig

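I have a set of URLs and their associated transaction times in Hadoop. I am trying to write a Pig script that gives the total transaction time for each URL. Every time I try to SUM the transaction times, I get a ClassCastException. This is my first attempt at Pig, so any help is appreciated. I can't figure out what I'm doing wrong.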

Here is some of the output (URLs and transaction times), followed by the statements that lead to the exception when I run DUMP total_tx_time:

grunt> DESCRIBE uLogUrls
uLogUrls: {url: chararray,et: int}
grunt> DUMP uLogUrls

(/index.jsp,344)
(/another/Access.jsp,517)
(/index.jsp,5)
(/another/NoAccess.jsp,4)
(/index.jsp,5)
(/index.jsp,4)

grps = GROUP uLogUrls BY url;
DUMP grps

(/index.jsp,{(/index.jsp,344),(/index.jsp,5),(/index.jsp,5),(/index.jsp,4)})
(/home/home.jsp,{(/home/home.jsp,11200)})


grunt> DESCRIBE grps
grps: {group: chararray,uLogUrls: {(url: chararray,et: int)}}

total_tx_time = foreach grps generate group as url, SUM(uLogUrls.et);

Any idea what I'm doing wrong?

Thanks!

1 Answer:

Answer 0 (score: 1):

The cause was that the original FOREACH that generates uLogUrls did not cast runTime to double correctly.

uLogUrls = FOREACH uLogs GENERATE logName as url, runTime as et:double;

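The statement above is what caused the exception (note that the dumped values below have no decimal places, so they were not actually converted to double).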

DUMP uLogUrls

(/index.jsp,344)
(/secur/blah.jsp,517)
(/index.jsp,5)
(/secur/blah.jsp,4)
(/index.jsp,5)
....snip....

But when I cast explicitly, like this:

grunt> uLogUrls = FOREACH uLogs GENERATE logName as url, (double)runTime as et;
grunt> DUMP uLogUrls

(/index.jsp,344.0)
(/secur/blah.jsp,517.0)
(/index.jsp,5.0)
(/secur/blah.jsp,4.0)
(/index.jsp,5.0)
...snip....

After that, the GROUP and SUM work. Thanks for the help!
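
For reference, the fix can be put together with the statements from the question as one script. This is only a sketch: the original LOAD statement is not shown in the thread, so the input path 'ulogs.tsv', the tab delimiter, the chararray types on load, and the total_et alias are all assumptions made for illustration.

```pig
-- Hypothetical load step: path, delimiter, and field types are assumed,
-- since the thread does not show how uLogs was created.
uLogs = LOAD 'ulogs.tsv' USING PigStorage('\t')
        AS (logName:chararray, runTime:chararray);

-- Explicit cast operator, as in the answer: (double)runTime converts the value.
uLogUrls = FOREACH uLogs GENERATE logName AS url, (double)runTime AS et;

-- Group by URL, then sum the et field of each group's bag.
grps = GROUP uLogUrls BY url;
total_tx_time = FOREACH grps GENERATE group AS url, SUM(uLogUrls.et) AS total_et;

DUMP total_tx_time;
```

Running DESCRIBE uLogUrls and DUMP uLogUrls after the FOREACH is a quick way to check that et is reported as double and that the values print with a decimal place before grouping.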