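I am using Pig 0.12.0 and Hadoop 2.2.0. I have successfully run Pig from the grunt shell and from Pig batch scripts in both local and MapReduce mode. Now I am trying to run Pig embedded in Java.
I have also run embedded Pig successfully in local mode, but I am having trouble running it in MapReduce mode.
The problem: after the class compiles successfully, nothing happens when I run
java -cp <classpath> PigMapRedMode
I later read that I should include pig.properties on the classpath, with entries such as: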
fs.default.name=hdfs://<namenode-hostname>:<port>
mapred.job.tracker=<jobtracker-hostname>:<port>
However, Hadoop 2.2.0 no longer has a JobTracker. Any ideas on what to do?
I have attached the Java code for PigMapRedMode in case the problem is there.
import java.io.IOException;
import org.apache.pig.PigServer;

public class PigMapRedMode {
    public static void main(String[] arg){
        try {
            PigServer pigServer = new PigServer("map reduce, (need to add properties file)");
            runIdQuery(pigServer, "5pts.txt");
        } catch (Exception e){
        }
    }

    public static void runIdQuery(PigServer pigServer, String inputFile) throws IOException {
        pigServer.registerQuery("A = load '" + inputFile + "' using PigStorage(',');");
        pigServer.registerQuery("B = foreach A generate $0 as id;");
        pigServer.store("B", "id.out");
    }
}
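Update
Solution found! You do not actually need to supply a Properties object or put pig.properties on the classpath. All you have to do is include the Hadoop configuration directory on the classpath (for my Hadoop 2.2.0 it is /etc/hadoop); the file system address (fs.defaultFS, formerly fs.default.name) and yarn.resourcemanager.address are then picked up from there.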
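A quick way to confirm that the configuration directory really is visible to the JVM is to ask the class loader for the files Hadoop normally reads from it. This is only a minimal sketch: HadoopConfCheck and locate are hypothetical names, not part of Pig or Hadoop, and it assumes the directory contains the usual core-site.xml and yarn-site.xml.

```java
import java.net.URL;

// Hypothetical diagnostic helper; not part of Pig or Hadoop.
class HadoopConfCheck {

    /** Returns the classpath URL of the named resource, or null if it is not visible. */
    static URL locate(String resourceName) {
        return HadoopConfCheck.class.getClassLoader().getResource(resourceName);
    }

    public static void main(String[] args) {
        // Assumed file names: the standard Hadoop 2 site files.
        String[] names = {"core-site.xml", "yarn-site.xml"};
        for (String name : names) {
            URL url = locate(name);
            System.out.println(name + " -> " + (url == null ? "NOT on classpath" : url.toString()));
        }
    }
}
```

Run it with the same -cp value you use for PigMapRedMode. If either file is reported as missing, the configuration directory is probably not on the classpath.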
I have added the modified Java code below:
/**
 * Created by allenlin on 2/19/14.
 */
import java.io.IOException;

import org.apache.pig.ExecType;
import org.apache.pig.PigServer;

public class PigMapRedMode {
    public static void main(String[] arg) {
        try {
            // Cluster settings come from the Hadoop configuration directory on the classpath.
            PigServer pigServer = new PigServer(ExecType.MAPREDUCE);
            runIdQuery(pigServer, "<hdfs input address>");
        } catch (Exception e) {
            // Print the failure instead of swallowing it silently.
            e.printStackTrace();
        }
    }

    public static void runIdQuery(PigServer pigServer, String inputFile) throws IOException {
        // Load the comma-separated input and keep only the first field as "id".
        pigServer.registerQuery("A = load '" + inputFile + "' using PigStorage(',');");
        pigServer.registerQuery("B = foreach A generate $0 as id;");
        pigServer.store("B", "<hdfs output address>");
    }
}
This is the Unix command I use to run the Java class. Note the dependencies you need to include:
java -cp ".:$PIG_HOME/build/pig-0.12.1-SNAPSHOT.jar:$HADOOP_HOME/share/hadoop/common/lib/*:$HADOOP_HOME/share/hadoop/common/*:$HADOOP_HOME/share/hadoop/mapreduce/*:antlr-runtime-3.4.jar:$HADOOP_HOME/share/hadoop/yarn/*:$HADOOP_HOME/share/hadoop/hdfs/*:$PIG_HOME/build/ivy/lib/Pig/*:$HADOOP_CONF_DIR" PigMapRedMode
Thanks to @zsxwing for the help!
Answer 0 (score: 0)
Here is how I run embedded Pig:
import java.util.Properties;

import org.apache.pig.ExecType;
import org.apache.pig.PigServer;

public class test1 {
    public static void main(String[] args) {
        try {
            // Build the properties first so they can be handed to the PigServer.
            Properties props = new Properties();
            props.setProperty("fs.default.name", "hdfs://localhost:9000");
            PigServer pigServer = new PigServer(ExecType.MAPREDUCE, props);
            runQuery(pigServer);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    public static void runQuery(PigServer pigServer) {
        try {
            // Word count: tokenize each line, group by word, count, sort descending by word.
            pigServer.registerQuery("input1 = LOAD '/input.data' as (line:chararray);");
            pigServer.registerQuery("words = foreach input1 generate FLATTEN(TOKENIZE(line)) as word;");
            pigServer.registerQuery("word_groups = group words by word;");
            pigServer.registerQuery("word_count = foreach word_groups generate group, COUNT(words);");
            pigServer.registerQuery("ordered_word_count = order word_count by group desc;");
            pigServer.registerQuery("store ordered_word_count into '/wct';");
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
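The code above sets fs.default.name, which is the older name of the file system key. On Hadoop 2 the settings could be sketched with the newer key names, as below. This is an illustration only: Hadoop2PigProperties and build are hypothetical names, localhost:8032 is an assumed ResourceManager address, and whether these three keys are enough depends on the cluster. Putting the Hadoop configuration directory on the classpath, as described in the question's update, avoids hard-coding them.

```java
import java.util.Properties;

// Hypothetical helper; only the property keys are real Hadoop 2 names.
class Hadoop2PigProperties {

    /** Builds the minimal settings a YARN-based cluster is assumed to need. */
    static Properties build(String nameNodeUri, String resourceManagerAddress) {
        Properties props = new Properties();
        // Hadoop 2 name for the default file system (replaces fs.default.name).
        props.setProperty("fs.defaultFS", nameNodeUri);
        // Run MapReduce jobs on YARN; there is no JobTracker in Hadoop 2.
        props.setProperty("mapreduce.framework.name", "yarn");
        // Takes over the role mapred.job.tracker had in Hadoop 1.
        props.setProperty("yarn.resourcemanager.address", resourceManagerAddress);
        return props;
    }

    public static void main(String[] args) {
        // Placeholder addresses; replace with your own cluster's values.
        Properties props = build("hdfs://localhost:9000", "localhost:8032");
        props.list(System.out);
        // With Pig on the classpath, the object would be passed as:
        // PigServer pigServer = new PigServer(ExecType.MAPREDUCE, props);
    }
}
```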
Set HADOOP_HOME in Eclipse:
Run Configurations-->ClassPath-->User Entries-->Advanced-->Add ClassPath Variables-->New-->Name(HADOOP_HOME)-->Path(your Hadoop directory path)
I added these Maven dependencies:
<dependencies>
  <dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-hdfs</artifactId>
    <version>2.7.1</version>
  </dependency>
  <dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-client</artifactId>
    <version>2.7.1</version>
  </dependency>
  <dependency>
    <groupId>commons-io</groupId>
    <artifactId>commons-io</artifactId>
    <version>2.4</version>
  </dependency>
  <dependency>
    <groupId>log4j</groupId>
    <artifactId>log4j</artifactId>
    <version>1.2.16</version>
  </dependency>
  <dependency>
    <groupId>org.apache.pig</groupId>
    <artifactId>pig</artifactId>
    <version>0.15.0</version>
  </dependency>
  <dependency>
    <groupId>org.antlr</groupId>
    <artifactId>antlr-runtime</artifactId>
    <version>3.4</version>
  </dependency>
</dependencies>
If HADOOP_HOME is not set correctly, you will get the following error:
hadoop20.PigJobControl: falling back to default JobControl (not using hadoop 0.20 ?)
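To catch a bad setting before Pig starts, a small pre-flight check could look like the sketch below. It assumes HADOOP_HOME is exposed to the JVM as an environment variable, which an Eclipse classpath variable on its own may not do. HadoopHomeCheck and looksValid are hypothetical names, not part of Pig or Hadoop.

```java
import java.io.File;

// Hypothetical pre-flight check; not part of Pig or Hadoop.
class HadoopHomeCheck {

    /** Returns true if the given value is non-empty and points to an existing directory. */
    static boolean looksValid(String hadoopHome) {
        if (hadoopHome == null || hadoopHome.trim().isEmpty()) {
            return false;
        }
        return new File(hadoopHome).isDirectory();
    }

    public static void main(String[] args) {
        // Assumption: the variable is set in the environment of this JVM.
        String home = System.getenv("HADOOP_HOME");
        if (!looksValid(home)) {
            System.err.println("HADOOP_HOME is missing or not a directory: " + home);
            System.exit(1);
        }
        System.out.println("HADOOP_HOME = " + home);
    }
}
```

The check only verifies that the directory exists. It does not confirm that the directory holds a working Hadoop installation.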