Word count program in Apache Pig Latin

Time: 2018-10-03 10:19:58

Tags: hadoop apache-pig

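I am new to Apache Pig and cannot figure out how to write a word count program that meets the following points:

a. Read from an input file; the location of the input file should be parameterized in the Pig script
c. Perform the word count (word delimiters: whitespace and other Pig delimiters, e.g. {, },
d. Comment lines must be ignored
e. Sort by count (most frequent words first)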

Any help would be appreciated.
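
To pin down the expected result, the points above can be read as the plain-Java reference below, which may be handy for checking a Pig script's output on a small sample. It is only a sketch under assumptions the question does not state: comment lines are taken to start with `#` (a hypothetical marker), and the delimiter set is a space plus `{`, `}` and the characters Pig's `TOKENIZE` splits on by default (`"`, `,`, `(`, `)`, `*`). The names `WordCountReference` and `count` are illustrative.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;

class WordCountReference {
    // Assumed delimiter set: space, braces, and Pig TOKENIZE's default separators.
    static final String DELIMS = " {}\",()*";

    // Returns (word, count) pairs, most frequent first; ties are ordered alphabetically.
    static List<Map.Entry<String, Long>> count(List<String> lines) {
        Map<String, Long> counts = new HashMap<>();
        for (String line : lines) {
            // Assumption: a comment line starts with '#', optionally after spaces.
            if (line.matches(" *#.*")) {
                continue;
            }
            StringTokenizer tokens = new StringTokenizer(line, DELIMS);
            while (tokens.hasMoreTokens()) {
                counts.merge(tokens.nextToken(), 1L, Long::sum);
            }
        }
        List<Map.Entry<String, Long>> sorted = new ArrayList<>(counts.entrySet());
        sorted.sort(Map.Entry.<String, Long>comparingByValue().reversed()
                .thenComparing(Map.Entry.<String, Long>comparingByKey()));
        return sorted;
    }

    public static void main(String[] args) {
        List<String> sample = Arrays.asList("# a comment", "load {data} load", "store data load");
        for (Map.Entry<String, Long> e : count(sample)) {
            System.out.println(e.getKey() + "\t" + e.getValue());
        }
    }
}
```

On the three sample lines in `main`, the comment line is skipped and the result is `load` 3, `data` 2, `store` 1, in that order.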

1 Answer:

Answer 0: (score: 0)

import java.io.IOException;

import org.apache.pig.PigServer;

public class idLocal {

    public static void main(String[] args) {
        try {
            // Run Pig in local mode (no Hadoop cluster needed).
            PigServer pigServer = new PigServer("local");
            runIdQuery(pigServer, "/root/Desktop/FILE/sample.txt");
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    public static void runIdQuery(PigServer pigServer, String inputFile) throws IOException {
        // Load the file; with the default loader, "line" is the first tab-separated field of each row.
        pigServer.registerQuery("myinput = load '" + inputFile + "' as (line:chararray);");
        // Split each line into words, one word per record.
        pigServer.registerQuery("words = foreach myinput generate flatten(TOKENIZE(line)) as word;");
        // Group identical words and count the members of each group.
        pigServer.registerQuery("grpd = group words by word;");
        pigServer.registerQuery("cntd = foreach grpd generate group, COUNT(words);");
        // Write the (word, count) pairs to the output location "id.out".
        pigServer.store("cntd", "id.out");
    }
}

Try this code; it should work.
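
The listing in this answer loads, tokenizes, groups, and counts, but it does not skip comment lines, split on the extra delimiters, or sort the result. One possible extension, as a sketch only: a small helper that builds the Pig Latin statements for all the requested points, taking the input path as a parameter. Each returned statement could be passed to `pigServer.registerQuery(...)` in order, followed by `pigServer.store("result", ...)`. `WordCountQueries` and `build` are hypothetical names, the `#` comment marker is an assumption (the question does not define one), and only `{` and `}` are added as delimiters because the question's delimiter list is cut off.

```java
import java.util.ArrayList;
import java.util.List;

class WordCountQueries {
    // Builds the Pig Latin statements in execution order; the final relation is "result".
    static List<String> build(String inputPath) {
        List<String> q = new ArrayList<>();
        // TextLoader reads each whole line into a single chararray field.
        q.add("lines = LOAD '" + inputPath + "' USING TextLoader() AS (line:chararray);");
        // Assumption: comment lines start with '#', optionally after spaces.
        q.add("content = FILTER lines BY NOT (line MATCHES ' *#.*');");
        // Turn braces into spaces first, so TOKENIZE's default separators split on them too.
        q.add("words = FOREACH content GENERATE "
                + "FLATTEN(TOKENIZE(REPLACE(line, '[{}]', ' '))) AS word;");
        q.add("grouped = GROUP words BY word;");
        q.add("counts = FOREACH grouped GENERATE group AS word, COUNT(words) AS cnt;");
        // Most frequent words first.
        q.add("result = ORDER counts BY cnt DESC;");
        return q;
    }

    public static void main(String[] args) {
        // The path comes from the command line, so it is not hard-coded in the script.
        String input = args.length > 0 ? args[0] : "sample.txt";
        for (String statement : build(input)) {
            System.out.println(statement);
        }
    }
}
```

In a standalone Pig script, the same parameterization is usually written as `LOAD '$input'`, with the value supplied on the command line via `pig -param input=<path> script.pig`.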