I am following the PigUnit test example from the Apache Pig page here. I am trying to run the code sample in Eclipse as a Maven project. I have added the Pig and PigUnit dependencies to my pom.xml, and have tried both versions 0.14 and 0.15.
Here is the PigUnit test code from the Apache Pig page (copied into my test class):
@Test
public void testTop2Queries() throws ParseException, IOException {
  String[] args = {
    "n=2",
  };

  PigTest test = new PigTest("top_queries.pig", args);

  String[] input = {
    "yahoo",
    "yahoo",
    "yahoo",
    "twitter",
    "facebook",
    "facebook",
    "linkedin",
  };

  String[] output = {
    "(yahoo,3)",
    "(facebook,2)",
  };

  test.assertOutput("data", input, "queries_limit", output);
}
And the Pig script, also copied from that page:
data = LOAD 'input' AS (query:CHARARRAY);
queries_group = GROUP data BY query;
queries_count = FOREACH queries_group GENERATE group AS query, COUNT(data) AS total;
queries_ordered = ORDER queries_count BY total DESC, query;
queries_limit = LIMIT queries_ordered 2;
STORE queries_limit INTO 'output';
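To make explicit what the script computes (group by query, count, order by count descending then query, limit to n), the same logic can be sketched in plain Java. This is only an illustration of the expected result, not part of PigUnit; the class name `TopQueriesSketch` is made up:

```java
import java.util.*;
import java.util.stream.*;

public class TopQueriesSketch {
    // Mirrors the Pig script: GROUP ... BY query, COUNT, ORDER BY total DESC
    // with query as tie-breaker, then LIMIT n.
    static List<String> topQueries(String[] input, int n) {
        Map<String, Long> counts = Arrays.stream(input)
                .collect(Collectors.groupingBy(q -> q, Collectors.counting()));
        return counts.entrySet().stream()
                .sorted(Comparator.<Map.Entry<String, Long>>comparingLong(e -> -e.getValue())
                        .thenComparing(Map.Entry::getKey))
                .limit(n)
                .map(e -> "(" + e.getKey() + "," + e.getValue() + ")")
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        String[] input = {"yahoo", "yahoo", "yahoo", "twitter",
                          "facebook", "facebook", "linkedin"};
        // Matches the expected output in the test above.
        System.out.println(topQueries(input, 2)); // [(yahoo,3), (facebook,2)]
    }
}
```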
However, when I try Run As > JUnit Test, I get this result:
org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to open iterator for alias queries_limit
at org.apache.pig.PigServer.openIterator(PigServer.java:935)
...[truncated]
Caused by: java.io.IOException: Couldn't retrieve job.
at org.apache.pig.PigServer.store(PigServer.java:999)
at org.apache.pig.PigServer.openIterator(PigServer.java:910)
... 28 more
Here is the console output I get:
STORE queries_limit INTO 'output';
--> none
data: {query: chararray}
data = LOAD 'input' AS (query:CHARARRAY);
--> data = LOAD 'file:/tmp/temp-820202225/tmp-1722948946' USING PigStorage('\t') AS (
query: chararray
);
STORE queries_limit INTO 'output';
--> none
It looks like the Pig script is trying to load data from the local file system for 'input' instead of using the Java String[] variable 'input'.
Can anyone help?
Answer (score: 2):
Before getting to the solution, I want to comment on the fact that the Pig script is loading from local disk. When PigUnit overrides a statement and supplies mock data for it, it creates a file on local disk and loads that file. That is why you see that file being loaded. If you look at that file, you should see the data you provided in the string array 'input'.
For anyone still looking for a solution, here is what worked for me. This solution is based on Pig version 0.15 and Hadoop 2.7.1. In my experience, you have to specify which Pig artifacts you need:
<dependency>
  <groupId>org.apache.pig</groupId>
  <artifactId>pigunit</artifactId>
  <version>${pig.version}</version>
  <scope>test</scope>
</dependency>
<dependency>
  <groupId>org.apache.pig</groupId>
  <artifactId>pig</artifactId>
  <version>${pig.version}</version>
  <classifier>h2</classifier>
  <!-- NOTE: It is very important to have this classifier. Unit tests will
       break if this doesn't exist. This gets the pig jars for Hadoop v2. -->
</dependency>
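The `${pig.version}` property referenced above is not defined in the snippet itself; it would need to be declared in the pom's properties section. A minimal sketch, assuming the 0.15 version the answer is based on:

```xml
<properties>
  <!-- Assumed value; match whichever Pig release you are testing against. -->
  <pig.version>0.15.0</pig.version>
</properties>
```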
Here are some very useful classes on the Pig GitHub page:
The PigTest implementation (useful in lieu of API documentation): https://github.com/apache/pig/blob/trunk/test/org/apache/pig/pigunit/PigTest.java
PigUnit examples: https://github.com/apache/pig/blob/trunk/test/org/apache/pig/test/pigunit/TestPigTest.java