Question

Sorry for the poorly worded question. I'm new to Stack Overflow and new to Pig, and I'm trying to experiment on my own.

I have a scenario that processes a words.txt file and a data.txt file.

words.txt

word1
word2
word3
word4

data.txt

{"created_at":"18:47:31,Sun Sep 30 2012","text":"RT @Joey7Barton: ..give a word1 about whether the americans wins a Ryder cup. I mean surely he has slightly more important matters. #fami ...","user_id":450990391,"id":252479809098223616}

I need the output to be

(word1_epochtime){the complete record whose text attribute matched}

i.e.

(word1_1234567890){"created_at":"18:47:31,Sun Sep 30 2012","text":"RT @Joey7Barton: ..give a word1 about whether the americans wins a Ryder cup. I mean surely he has slightly more important matters. #fami ...","user_id":450990391,"id":252479809098223616}

The output I get is

(word1){"created_at":"18:47:31,Sun Sep 30 2012","text":"RT @Joey7Barton: ..give a word1 about whether the americans wins a Ryder cup. I mean surely he has slightly more important matters. #fami ...","user_id":450990391,"id":252479809098223616}

using this script:

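words = LOAD 'words.txt' AS (word:chararray);
data = LOAD 'data.txt' USING JsonLoader('created_at:chararray, text:chararray, user_id:long, id:long');
c = CROSS words, data;
d = FILTER c BY (data::text MATCHES CONCAT(CONCAT('.*', words::word), '.*'));
e = FOREACH (GROUP d BY words::word) GENERATE group, d;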
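To make the dataflow of the script above concrete, here is a plain-Python model of it (an illustrative sketch, not Pig). `WORDS` and `DATA` are hypothetical in-memory stand-ins for words.txt and data.txt, filled with the sample rows from this question, and `re.fullmatch` stands in for MATCHES, which uses Java regular expressions and must match the entire string (hence the `.*` on both sides). It reproduces the `(word1){...}` shape: each key is the bare word, with no time attached.

```python
import json
import re

# Hypothetical in-memory stand-ins for words.txt and data.txt (one JSON object per line).
WORDS = ["word1", "word2", "word3", "word4"]
DATA = [
    '{"created_at":"18:47:31,Sun Sep 30 2012",'
    '"text":"RT @Joey7Barton: ..give a word1 about whether the americans wins a Ryder cup. '
    'I mean surely he has slightly more important matters. #fami ...",'
    '"user_id":450990391,"id":252479809098223616}',
]


def cross_filter_group(words, lines):
    """Model of CROSS words, data, then FILTER ... MATCHES, then GROUP BY word."""
    groups = {}
    for word in words:              # CROSS: every word is paired with every record
        for line in lines:
            record = json.loads(line)
            # MATCHES must match the whole string, so use fullmatch with '.*word.*'.
            # The word is used as a raw regex, exactly as the CONCAT in the Pig script does.
            if re.fullmatch(".*" + word + ".*", record["text"]):
                groups.setdefault(word, []).append(line)
    return groups


for word, matched in cross_filter_group(WORDS, DATA).items():
    for line in matched:
        print("(%s)%s" % (word, line))
```

Only `word1` occurs in the sample tweet, so only one group survives the filter.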

I got the epoch time for each word with:

word_time = FOREACH words GENERATE CONCAT(CONCAT(word, '_'), (chararray)ToUnixTime(CurrentTime())) AS word_epoch;
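The statement above nests CONCAT because each call joins two values. The same key construction can be sketched in plain Python (the helper names `concat` and `word_epoch_key` are hypothetical, for illustration only). As in the Pig line, the timestamp is the current time in seconds since the Unix epoch, which is the unit ToUnixTime returns; it is not derived from `created_at`.

```python
import time


def concat(a, b):
    """Two-argument string concatenation, mirroring a two-field CONCAT."""
    return a + b


def word_epoch_key(word, epoch_seconds=None):
    """Build 'word_<epoch>' by nesting two-argument concatenations."""
    if epoch_seconds is None:
        # Current time as whole seconds since 1970-01-01T00:00:00Z.
        epoch_seconds = int(time.time())
    return concat(concat(word, "_"), str(epoch_seconds))


print(word_epoch_key("word1", 1234567890))
```

Passing a fixed `epoch_seconds` makes the key reproducible, which is convenient when checking the output format.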

But I can't get the words together with the time in the output.

How can I get the output as

(word1_time){data}

Please feel free to suggest anything about the above. Thanks.

Answer 1

I think I got the output. Here is the script I wrote:

d = FILTER c BY (data::text MATCHES CONCAT(CONCAT('.*', words::word), '.*'));
e = FOREACH d GENERATE CONCAT(CONCAT(words::word, '_'), (chararray)ToUnixTime(CurrentTime())) AS epochtime, data::created_at, data::text, data::user_id, data::id;
f = FOREACH (GROUP e BY epochtime) GENERATE group, e;
dump f;

Answer 2

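Per this reference, CONCAT takes two "fields" as input. I think the problem in your case is (chararray)ToUnixTime(CurrentTime()), which is not a field name. You could first generate a field that holds the current timestamp value, and then use that field in the CONCAT function.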
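That order of operations can be modeled in plain Python (an illustrative sketch, not Pig): the timestamp is produced once as its own value, and only then concatenated with the word, two values at a time, before being prefixed to each matched record. `MATCHED` and `label_matches` are hypothetical names; `MATCHED` stands for the (word, record) rows that survive the FILTER, with the record shortened for brevity.

```python
import time

# Hypothetical matched (word, record) pairs, i.e. the rows that survive the FILTER.
MATCHED = [
    ("word1", '{"id":252479809098223616}'),
]


def label_matches(matched, now=None):
    """Produce '(word_<ts>)record' lines, with the timestamp computed first."""
    if now is None:
        now = int(time.time())          # generated once, as its own value
    out = []
    for word, record in matched:
        ts_field = str(now)             # the timestamp exists as a separate value...
        key = (word + "_") + ts_field   # ...then is joined two values at a time
        out.append("(%s)%s" % (key, record))
    return out


for line in label_matches(MATCHED, 1234567890):
    print(line)
```

Because `now` is computed once per call, every row in a run gets the same suffix, so grouping on the key in this model never splits one word across several timestamps.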

PIG: CONCAT a relation with another RELATION's OUTPUT
