Suppose the dataset has two columns, fields and question time (a Unix timestamp in seconds):
fields question time
php,error,gd,image-processing 1235000501
php,error,gd,image-processing 1235000551
lisp,scheme,subjective,clojure 1235000177
lisp,scheme,subjective,clojure 1235001545
lisp,scheme,subjective,clojure 1235002457
lisp,scheme,subjective,clojure 1235002809
lisp,scheme,subjective,clojure 1235003266
lisp,scheme,subjective,clojure 1235007817
lisp,scheme,subjective,clojure 1235007913
lisp,scheme,subjective,clojure 1235020626
lisp,scheme,subjective,clojure 1235040652
I tried the following code:
DEFINE UnixToISO org.apache.pig.piggybank.evaluation.datetime.convert.UnixToISO();
A= LOAD '/user/home/book3.csv' using PigStorage() as (fields:chararray,question_time:long);
B= foreach A generate fields,UnixToISO(question_time * 1000 ) as temp;
DUMP B;
Output: same as the input, nothing changed.
C= foreach B generate fields, ToDate(temp) as date_time;
DUMP C;
Output: same as the input, nothing changed.
D= foreach C generate fields, GetHour(date_time) as hour;
DUMP D;
Output: again no change, same as the input. What is wrong with my code?
Answer 0 (score: 0)
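I think the sample data is not formatted correctly; check the delimiter that separates the two columns. PigStorage() with no argument splits on tab characters, so if the columns are separated by spaces, each whole line is loaded into fields and question_time is null, which would explain why every DUMP looks like the unchanged input. I took your sample data, replaced the spaces between the two columns with a single tab, and was then able to load both columns, convert the timestamp and extract the hour. See the script and output below.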
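To confirm the delimiter diagnosis before editing the file, one quick check is to load the data exactly as the question does and count the rows whose timestamp came back null. This is a sketch: the path and schema are copied from the question (with the field written as `question_time`), and the relation names `bad`, `grp` and `cnt` are arbitrary.

```pig
-- Load as in the question: PigStorage() with no argument splits on tabs.
A = LOAD '/user/home/book3.csv' USING PigStorage() AS (fields:chararray, question_time:long);

-- With space-separated columns there is no tab, so the whole line lands in
-- `fields` and `question_time` is padded with null.
bad = FILTER A BY question_time IS NULL;

-- COUNT_STAR counts every tuple, including tuples that contain nulls.
grp = GROUP bad ALL;
cnt = FOREACH grp GENERATE COUNT_STAR(bad) AS null_time_rows;
DUMP cnt;
```

If the count is close to the number of lines in the file, the delimiter is the likely culprit. A header line, if present, is also counted here because it cannot be cast to long.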
Script
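-- Load the tab-separated file: a chararray and a long (epoch seconds)
A = LOAD 'test4.txt' using PigStorage('\t') AS (fields:chararray,question_time:long);
-- ToDate(long) expects milliseconds since the epoch, hence * 1000
B = foreach A generate fields,ToDate(question_time * 1000 ) as temp;
-- GetHour extracts the hour of day from the DateTime
C = foreach B generate fields, GetHour(temp) as hour;
DUMP C;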
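If editing the input file is not an option, a possible alternative is to have PigStorage split on a single space instead of a tab. This is a minimal sketch, assuming exactly one space between the two columns (as in the sample) and a hypothetical file name `book3.txt`. The header row's second column cannot be cast to `long`, so it loads as null and is filtered out.

```pig
-- Split each line on a single space instead of the default tab.
A = LOAD 'book3.txt' USING PigStorage(' ') AS (fields:chararray, question_time:long);

-- Drop rows whose timestamp failed to parse (e.g. the header line).
A1 = FILTER A BY question_time IS NOT NULL;

-- ToDate(long) expects milliseconds since the epoch; the data is in seconds.
B = FOREACH A1 GENERATE fields, ToDate(question_time * 1000) AS dt;

-- GetHour returns the hour of day of the DateTime.
C = FOREACH B GENERATE fields, GetHour(dt) AS hour;
DUMP C;
```

The hour is computed in Pig's default time zone, so the same timestamps can yield different hours on clusters configured with different time zones.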