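I'm new to Apache Pig, and I can't figure out how to write a word count program that meets the following points:
a. Read from this input file; the location of the input file should be parameterized in the Pig script
c. Perform a word count (word delimiters: whitespace and other Pig delimiters, e.g. {, },
d. Comment lines must be ignored
e. Sort by count (most common words first)
Any help would be appreciated.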
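To have something to compare the Pig output against, here is a minimal plain-Java sketch (no Pig involved) of the behavior points a to e describe. It is only an assumption-laden reference: it treats a line whose first non-blank character is `#` as a comment (the question doesn't say what a comment looks like), and it splits on whitespace, `{`, `}` and the separators Pig documents for TOKENIZE (double quote, comma, parentheses, star). The class name `WordCountReference` is hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class WordCountReference {
    // Word separators: whitespace, { and }, plus the characters Pig's TOKENIZE
    // documents as separators (double quote, comma, parentheses, star).
    private static final String SEPARATORS = "[\\s\"(),*{}]+";

    // Counts words in the given lines, skipping comment lines, and returns the
    // (word, count) entries sorted by count descending, ties alphabetically.
    static List<Map.Entry<String, Long>> count(List<String> lines) {
        Map<String, Long> counts = new HashMap<>();
        for (String line : lines) {
            // ASSUMPTION: a comment line starts with '#' after optional whitespace
            if (line.trim().startsWith("#")) {
                continue;
            }
            for (String token : line.split(SEPARATORS)) {
                // split() yields "" when a line starts with a separator
                if (!token.isEmpty()) {
                    counts.merge(token, 1L, Long::sum);
                }
            }
        }
        List<Map.Entry<String, Long>> sorted = new ArrayList<>(counts.entrySet());
        sorted.sort(Map.Entry.<String, Long>comparingByValue().reversed()
                .thenComparing(Map.Entry.<String, Long>comparingByKey()));
        return sorted;
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.err.println("usage: java WordCountReference <input-file>");
            return;
        }
        // The input path comes from the command line instead of being hard-coded
        for (Map.Entry<String, Long> e : count(Files.readAllLines(Paths.get(args[0])))) {
            System.out.println(e.getKey() + "\t" + e.getValue());
        }
    }
}
```

Running `java WordCountReference /root/Desktop/FILE/sample.txt` prints one `word<TAB>count` line per word, most frequent first, which is the shape the Pig job should produce.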
Answer 0 (score: 0)
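import java.io.IOException;
import org.apache.pig.PigServer;

public class idLocal {
    public static void main(String[] args) {
        // Take the input path from the command line; use the sample path if none is given
        String inputFile = args.length > 0 ? args[0] : "/root/Desktop/FILE/sample.txt";
        try {
            // "local" runs Pig in local mode against the local file system
            PigServer pigServer = new PigServer("local");
            runIdQuery(pigServer, inputFile);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    public static void runIdQuery(PigServer pigServer, String inputFile) throws IOException {
        // TextLoader reads each whole line as one chararray (the default PigStorage would split on tabs)
        pigServer.registerQuery("myinput = load '" + inputFile + "' using TextLoader() as (line:chararray);");
        // TOKENIZE splits each line into a bag of words; FLATTEN turns the bag into one row per word
        pigServer.registerQuery("words = foreach myinput generate flatten(TOKENIZE(line)) as word;");
        // Group identical words together, then count the rows in each group
        pigServer.registerQuery("grpd = group words by word;");
        pigServer.registerQuery("cntd = foreach grpd generate group, COUNT(words);");
        // Write the (word, count) pairs to the output location id.out
        pigServer.store("cntd", "id.out");
    }
}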
Try this code; it should work.
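The code above reads, tokenizes, groups and counts, but it does not drop comment lines or sort by count, and the path is passed in from Java rather than as a parameter of the Pig script. One possible way to cover points a, c, d and e is to generate a standalone Pig script that uses Pig's `$input`/`$output` parameter substitution, sketched below. This is only a sketch under assumptions: comment lines are taken to start with `#`, `{` and `}` are turned into spaces with `REPLACE` before `TOKENIZE` (the question's delimiter list is cut off, so the exact set is a guess), and the class name `WordCountScript`, the file name `wordcount.pig` and the relation names other than `myinput`, `words`, `grpd` and `cntd` are hypothetical.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

public class WordCountScript {
    // Returns a Pig Latin word-count script. $input and $output are Pig
    // parameters that Pig substitutes at run time (pig -param name=value).
    static String buildScript() {
        return String.join("\n",
            // a. read each line as one chararray; the path is the $input parameter
            "myinput = LOAD '$input' USING TextLoader() AS (line:chararray);",
            // d. drop comment lines (ASSUMPTION: they start with '#' after optional spaces)
            "nocmt = FILTER myinput BY NOT (TRIM(line) MATCHES '#.*');",
            // c. turn { and } into spaces, then split on TOKENIZE's default separators
            "words = FOREACH nocmt GENERATE FLATTEN(TOKENIZE(REPLACE(line, '[{}]', ' '))) AS word;",
            "grpd = GROUP words BY word;",
            "cntd = FOREACH grpd GENERATE group AS word, COUNT(words) AS cnt;",
            // e. most frequent words first
            "srtd = ORDER cntd BY cnt DESC;",
            "STORE srtd INTO '$output';");
    }

    public static void main(String[] args) throws IOException {
        // Write the script to wordcount.pig so it can be run with the pig command
        Files.write(Paths.get("wordcount.pig"),
                (buildScript() + "\n").getBytes(StandardCharsets.UTF_8));
    }
}
```

The generated script could then be run in local mode with something like `pig -x local -param input=/root/Desktop/FILE/sample.txt -param output=wc.out wordcount.pig`. Keeping the path as a Pig parameter means the same script works unchanged for any input file, which is what point a asks for.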