How do I process this comma-separated list in Pig?

Time: 2016-02-11 18:24:15

Tags: apache-pig

Suppose the dataset contains 2 fields: fields and question time.

            fields                                 question time

php,error,gd,image-processing                               1235000501
php,error,gd,image-processing                               1235000551 
lisp,scheme,subjective,clojure                              1235000177
lisp,scheme,subjective,clojure                              1235001545
lisp,scheme,subjective,clojure                              1235002457
lisp,scheme,subjective,clojure                              1235002809
lisp,scheme,subjective,clojure                              1235003266
lisp,scheme,subjective,clojure                              1235007817
lisp,scheme,subjective,clojure                              1235007913
lisp,scheme,subjective,clojure                              1235020626
lisp,scheme,subjective,clojure                              1235040652

I tried the following code:

DEFINE UnixToISO org.apache.pig.piggybank.evaluation.datetime.convert.UnixToISO();
A = LOAD '/user/home/book3.csv' using PigStorage() as (fields:chararray, question_time:long);
B = foreach A generate fields, UnixToISO(question_time * 1000) as temp;
DUMP B;

The output is the same as the input; nothing changed.

C= foreach B generate fields, ToDate(temp) as date_time;
DUMP C;

The output is the same as the input; nothing changed.

D= foreach C generate fields, GetHour(date_time) as hour;
DUMP D;

Again, the output is the same as the input. What is wrong with my code?

1 Answer:

Answer 0: (score: 0)

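I think the sample data is not formatted correctly. Check the delimiter that separates the two columns. I took the sample data, removed the spaces between the two column values, and replaced them with a single tab. With that change I was able to load both columns, convert the timestamp, and extract the hour. See the script and output below.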

Script

A = LOAD 'test4.txt' using PigStorage('\t') AS (fields:chararray,question_time:long);
B = foreach A generate fields,ToDate(question_time * 1000 ) as temp;
C = foreach B generate fields, GetHour(temp) as hour;
DUMP C;

TimeStamp to Date

Date to Hour
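
To make the delimiter point concrete, here is a small Python sketch (not Pig) of the behaviour described above. It assumes that tab-delimited loading acts like a plain split on the tab character; the helper names `split_like_pigstorage` and `hour_utc` are made up for this illustration and are not part of Pig.

```python
from datetime import datetime, timezone

# One line of the original sample: the columns are separated by spaces, not a tab.
raw_line = "php,error,gd,image-processing                               1235000501"

def split_like_pigstorage(line, delimiter="\t"):
    """Hypothetical stand-in for tab-delimited loading (assumption: a plain split)."""
    return line.split(delimiter)

def hour_utc(epoch_seconds):
    """Hour of day in UTC for a Unix timestamp given in seconds."""
    return datetime.fromtimestamp(epoch_seconds, tz=timezone.utc).hour

# With a tab delimiter the whole line stays in the first field,
# so there is no second column to bind question_time to.
fields_default = split_like_pigstorage(raw_line)

# After replacing the run of spaces with a single tab, two columns appear.
fixed_line = "\t".join(raw_line.split())
fields_fixed = split_like_pigstorage(fixed_line)

tags, question_time = fields_fixed[0], int(fields_fixed[1])
print(len(fields_default), len(fields_fixed), hour_utc(question_time))
```

The sketch reports the hour in UTC. A Pig session may use a different default time zone, so the hour returned by GetHour can differ from the UTC value.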