Pig - Could not infer the matching function for org.apache.pig.piggybank.evaluation.datetime.convert.ISOToUnix as multiple or none of them fit

Time: 2014-09-25 20:47:18

Tags: datetime hadoop apache-pig epoch

I am trying to convert Pig's datetime format to epoch time so that I can do further calculations on it. Here is (part of) my script:

DEFINE ISOToUnix org.apache.pig.piggybank.evaluation.datetime.convert.ISOToUnix();
A = LOAD 's3://hearstlogfiles/google/NetworkBackfillImpressions_271283/2014/09/24/NetworkBackfillImpressions_271283_20140924_00.gz' USING PigStorage(',');
B = LIMIT A 10;
C = FOREACH B GENERATE
(chararray)(CONCAT(CONCAT(SUBSTRING($0, 0,10),' '),SUBSTRING($0, 11,19) )) as   dt_string:chararray,
DATE_TIME(CONCAT(CONCAT(SUBSTRING($0, 0,10),' '),SUBSTRING($0, 11,19) )) AS dt;
D = FOREACH C GENERATE 
dt_string, 
dt, 
ISOToUnix(dt)/1000 as epoch:long;
DUMP D;

When Pig tries to execute the line below, I get the error shown beneath it. I believe dt is in the correct format.

ISOToUnix(dt)/1000 as epoch:long  
Could not infer the matching function for org.apache.pig.piggybank.evaluation.datetime.convert.ISOToUnix as multiple or none of them fit. Please use an explicit cast.

When I DUMP C, I get the following, so I know that dt in C is formatted correctly.

(2014-09-24 02:53:54,2014-09-24T02:53:54.000Z)  
(2014-09-24 02:57:54,2014-09-24T02:57:54.000Z)  
(2014-09-24 03:05:06,2014-09-24T03:05:06.000Z)  
(2014-09-24 03:27:30,2014-09-24T03:27:30.000Z)  
(2014-09-24 03:37:00,2014-09-24T03:37:00.000Z)  
(2014-09-24 03:39:18,2014-09-24T03:39:18.000Z)  
(2014-09-24 03:41:24,2014-09-24T03:41:24.000Z)  
(2014-09-24 03:43:18,2014-09-24T03:43:18.000Z)  
(2014-09-24 03:58:12,2014-09-24T03:58:12.000Z)  

Please help.

1 answer:

Answer 0: (score: 0)

Pasting the example from https://pig.apache.org/docs/r0.7.0/api/org/apache/pig/piggybank/evaluation/datetime/convert/ISOToUnix.html:

REGISTER /Users/me/commiter/piggybank/java/piggybank.jar ; 
REGISTER /Users/me/commiter/piggybank/java/lib/joda-time-1.6.jar ; 
DEFINE ISOToUnix org.apache.pig.piggybank.evaluation.datetime.convert.ISOToUnix(); 
ISOin = LOAD 'test.tsv' USING PigStorage('\t') AS (dt:chararray, dt2:chararray); 

DESCRIBE ISOin; 
ISOin: {dt: chararray,dt2: chararray} 

DUMP ISOin; 
(2009-01-07T01:07:01.000Z,2008-02-01T00:00:00.000Z) 
(2008-02-06T02:06:02.000Z,2008-02-01T00:00:00.000Z) 
(2007-03-05T03:05:03.000Z,2008-02-01T00:00:00.000Z) 
... 

toUnix = FOREACH ISOin GENERATE ISOToUnix(dt) AS unixTime:long;

DESCRIBE toUnix; 
toUnix: {unixTime: long} 
DUMP toUnix; 
(1231290421000L)
(1202263562000L)
(1173063903000L) 
...

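Notice that dt (the argument passed to the ISOToUnix UDF) is a chararray. The error means none of the UDF's signatures accepts the type you are passing, so you need to pass your 'dt' column as a chararray, as follows: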

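-- ISOToUnix parses ISO 8601 strings, which separate date and time with 'T',
-- so dt is joined with 'T' here; dt_string keeps the space for display.
C = FOREACH B 
       GENERATE
           (chararray)(CONCAT(CONCAT(SUBSTRING($0, 0,10),' '),
           SUBSTRING($0, 11,19) )) as   dt_string:chararray,
           CONCAT(CONCAT(SUBSTRING($0, 0,10),'T'),SUBSTRING($0, 11,19) ) AS dt:chararray;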

D = FOREACH C 
       GENERATE 
           dt_string, 
           dt, 
           ISOToUnix((chararray)dt)/1000 as epoch:long;

DUMP D;
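
As an alternative, here is a minimal sketch that avoids the piggybank UDF entirely. It assumes Pig 0.11 or later, where datetime is a native type and the built-ins ToDate and ToUnixTime are available; the relation names C2 and D2 are made up for illustration, and B is the relation from the question. ToUnixTime takes a datetime and returns seconds since the epoch, so no division by 1000 is needed:

```pig
-- Sketch only: assumes Pig 0.11+ (native datetime, ToDate, ToUnixTime).
-- C2 and D2 are hypothetical names; B comes from the question's script.
C2 = FOREACH B GENERATE
         ToDate(CONCAT(CONCAT(SUBSTRING((chararray)$0, 0, 10), ' '),
                       SUBSTRING((chararray)$0, 11, 19)),
                'yyyy-MM-dd HH:mm:ss') AS dt:datetime;

-- Check the type that will be passed on; this should report dt as datetime.
DESCRIBE C2;

-- ToUnixTime returns whole seconds since 1970-01-01T00:00:00Z as a long.
D2 = FOREACH C2 GENERATE dt, ToUnixTime(dt) AS epoch:long;

DUMP D2;
```

Because no time zone is given, ToDate interprets the string in Pig's default time zone, so check that setting if your logs are in UTC. If you would rather keep ISOToUnix, wrapping a datetime column as ISOToUnix(ToString(dt)) should also satisfy the chararray signature.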

Hope this helps.