Storing date and time in Pig

Time: 2014-09-23 13:04:57

Tags: apache-pig

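I am trying to store a txt file that has two separate columns, one for the date and one for the time. Something like this: 1999-01-01 12:08:56

Now I want to perform some date operations with Pig, but I would like to store the date and time like this: 1999-01-01T12:08:56 (I checked this link): http://docs.oracle.com/javase/6/docs/api/java/text/SimpleDateFormat.html

What I want to know is which format I can use so that my date and time are in a single column that I can feed to Pig, and then how to load that date into Pig. I know we can convert it to datetime, but that shows an error. Can someone tell me how to load the date and time data together? An example would be a great help.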

1 answer:

Answer 0: (score: 2)

Let me know if this works for you.

input.txt  
1999-01-01 12:08:56  
1999-01-02 12:08:57  
1999-01-03 12:08:58  
1999-01-04 12:08:59  

PigScript:  
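-- Split each line on the space into a date column and a time column.  
A = LOAD 'input.txt' USING PigStorage(' ') AS (date:chararray, time:chararray);  
-- Join the columns into an ISO 8601 string; older Pig releases accept only two CONCAT arguments, so nest the calls there.  
B = FOREACH A GENERATE CONCAT(date, 'T', time) AS myDateString;  
-- ToDate with a single chararray argument parses an ISO 8601 string into a datetime.  
C = FOREACH B GENERATE ToDate(myDateString) AS myDate;  
DUMP C;  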

Output:  
(1999-01-01T12:08:56.000+05:30)  
(1999-01-02T12:08:57.000+05:30)  
(1999-01-03T12:08:58.000+05:30)  
(1999-01-04T12:08:59.000+05:30)  

The value is now a datetime object, so you can process it with all the built-in date functions. The +05:30 offset in the output comes from the default time zone where the script ran, so yours may differ.
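
Since the question asks about keeping the date and time in a single column, here is a minimal sketch of one alternative: read each line whole and hand ToDate an explicit pattern instead of concatenating two fields. It assumes every line of input.txt holds exactly one value in the form yyyy-MM-dd HH:mm:ss; the alias names line and myDate are made up for illustration.

```pig
-- Assumption: input.txt holds one 'yyyy-MM-dd HH:mm:ss' value per line.
-- TextLoader reads each line as a single chararray field.
A = LOAD 'input.txt' USING TextLoader() AS (line:chararray);
-- The two-argument form of ToDate parses the string with an explicit pattern.
B = FOREACH A GENERATE ToDate(line, 'yyyy-MM-dd HH:mm:ss') AS myDate;
DUMP B;
```

This skips the CONCAT step, but a line that does not match the pattern will probably fail to parse, so it suits clean input best.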

If you want to store the output in this format:
(1999-01-01T12:08:56)  
(1999-01-02T12:08:57)  
(1999-01-03T12:08:58)  
(1999-01-04T12:08:59)

you can use REGEX_EXTRACT to keep everything up to the "." in each value, something like this:

D = FOREACH C GENERATE ToString($0) as temp;
E = FOREACH D GENERATE REGEX_EXTRACT(temp, '(.*)\\.(.*)', 1);
dump E;

Output:
(1999-01-01T12:08:56)  
(1999-01-02T12:08:57)  
(1999-01-03T12:08:58)  
(1999-01-04T12:08:59)
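
As a possible alternative to the regular expression, ToString also accepts a format pattern as a second argument, so the fractional seconds and the offset can be left out when the string is built. This is a sketch rather than a tested script: it assumes C is the relation produced by the script above, that your Pig version accepts the backslash-escaped quotes around the literal T, and the alias names F and isoString are hypothetical.

```pig
-- Assumption: C is the relation from the script above, with the datetime in its first field.
-- The escaped single quotes mark T as a literal character in the pattern.
F = FOREACH C GENERATE ToString($0, 'yyyy-MM-dd\'T\'HH:mm:ss') AS isoString;
DUMP F;
```

Formatting directly avoids relying on where the "." falls in the default string representation.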