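Wordcount example using Apache Pig

Time: 2017-01-06 08:54:09

Tags: java maven apache-pig

I am new to Pig programming. I tried the grunt shell and got the expected results. I then tried the same thing from Java in local mode, but I get an error.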

17/01/06 13:51:45 INFO data.SchemaTupleFrontend: Distributed cache not supported or needed in local mode. Setting key [pig.schematuple.local.dir] with code temp directory: /tmp/1483690905600-0
Exception in thread "main" java.lang.NoSuchMethodError: org.apache.hadoop.mapred.jobcontrol.JobControl.addJob(Lorg/apache/hadoop/mapred/jobcontrol/Job;)Ljava/lang/String;

Is this problem caused by a version mismatch?

Sample code

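import java.io.IOException;
import java.util.Properties;

import org.apache.pig.ExecType;
import org.apache.pig.PigServer;

// The class name is a placeholder; the original snippet omitted the class declaration.
public class WordCount {

    public static void main(String[] args) {
        try {
            // Run Pig in local mode with default properties.
            PigServer pigServer = new PigServer(ExecType.LOCAL, new Properties());
            runMyQuery(pigServer);
        } catch (IOException e) {
            // ExecException is a subclass of IOException, so one handler covers both.
            e.printStackTrace();
        }
    }

    public static void runMyQuery(PigServer pigServer) throws IOException {
        // Load each input record as a single chararray field.
        pigServer.registerQuery("lines = LOAD '/unmesha/input/FF_weathr/weather_header.csv' AS (line:chararray);");
        // Split each line into words, one word per output tuple.
        pigServer.registerQuery("words = FOREACH lines GENERATE FLATTEN(TOKENIZE(line)) as word;");
        // Group identical words and count the members of each group.
        pigServer.registerQuery("grouped = GROUP words BY word;");
        pigServer.registerQuery("wordcount = FOREACH grouped GENERATE group, COUNT(words);");
        // Storing the alias is what triggers execution of the job.
        pigServer.store("wordcount", "/unmesha/input/FF_weathr/OUT/pig");
    }
}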
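
For checking the Pig output on a small sample, the same tokenize, group, and count steps can be sketched in plain Java. This is only an approximation under stated assumptions: the class `PlainWordCount` is hypothetical, each input string stands in for the `line` field, and the delimiter set is assumed to match the defaults documented for `TOKENIZE` (space, double quote, comma, parentheses, star).

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical helper, not part of Pig: a plain-Java approximation of the script.
class PlainWordCount {

    // Assumption: mirrors TOKENIZE's default delimiters (space, ", comma, parentheses, *).
    private static final String DELIMITERS = "[ \",()*]+";

    // Splits every line into words and counts how often each word occurs.
    static Map<String, Long> count(List<String> lines) {
        return lines.stream()
                .flatMap(line -> Arrays.stream(line.split(DELIMITERS)))
                // A leading delimiter produces an empty first token; drop it.
                .filter(word -> !word.isEmpty())
                // TreeMap keeps the words sorted so the output is easy to compare.
                .collect(Collectors.groupingBy(word -> word, TreeMap::new, Collectors.counting()));
    }

    public static void main(String[] args) {
        System.out.println(count(Arrays.asList("a,b a", "b (c)")));
    }
}
```

Because this runs without Pig or Hadoop on the classpath, it is unaffected by the dependency problem and can serve as a reference result once the Pig job itself runs.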

My pom.xml

<dependency>
    <groupId>org.apache.pig</groupId>
    <artifactId>pig</artifactId>
    <version>0.16.0</version>
</dependency>
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-core</artifactId>
    <version>2.6.0-mr1-cdh5.4.5</version>
</dependency>
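
A `NoSuchMethodError` at runtime generally means that code was compiled against a different version of a class than the one found on the classpath. Since the dependencies above pair Pig 0.16.0 with a CDH MR1 `hadoop-core`, a mismatch between the Hadoop API that the Pig jar was built against and the one actually loaded is a plausible cause, though that is not confirmed here. A minimal sketch for checking this, using only standard Java reflection (the class `ClasspathProbe` is hypothetical, not part of Pig or Hadoop), prints which jar supplies `JobControl` and which `addJob` overloads it really has:

```java
import java.lang.reflect.Method;
import java.security.CodeSource;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: reports where a class was loaded from and which
// overloads of a given method it exposes at runtime.
class ClasspathProbe {

    // Returns the jar or directory a class was loaded from, if known.
    static String locationOf(Class<?> cls) {
        CodeSource source = cls.getProtectionDomain().getCodeSource();
        return (source == null || source.getLocation() == null)
                ? "unknown (bootstrap class loader)"
                : source.getLocation().toString();
    }

    // Lists the full signatures of every public method with the given name.
    static List<String> signaturesOf(Class<?> cls, String methodName) {
        List<String> signatures = new ArrayList<>();
        for (Method method : cls.getMethods()) {
            if (method.getName().equals(methodName)) {
                signatures.add(method.toString());
            }
        }
        return signatures;
    }

    public static void main(String[] args) {
        // Class and method named in the NoSuchMethodError from the question.
        String className = "org.apache.hadoop.mapred.jobcontrol.JobControl";
        try {
            Class<?> cls = Class.forName(className);
            System.out.println("Loaded from: " + locationOf(cls));
            for (String signature : signaturesOf(cls, "addJob")) {
                System.out.println(signature);
            }
        } catch (ClassNotFoundException | LinkageError e) {
            System.out.println(className + " could not be loaded: " + e);
        }
    }
}
```

Run with the same classpath as the failing program. If no listed signature takes `org.apache.hadoop.mapred.jobcontrol.Job` and returns `String`, the Pig jar and the Hadoop jar target different Hadoop APIs, and one of the two dependencies would need to change.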

0 answers:

No answers yet