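Background
MarkableIteratorInterface is an interface for iterators that support mark-reset functionality. Basically, you can iterate over a Reducer's values multiple times, "mark" a position, and "reset" back to it. The data is temporarily cached in memory (and on disk when memory fills up).
You can do this if the Reducer's ValueIterator implements MarkableIteratorInterface, which depends on the Hadoop implementation. MarkableIterator is the wrapper that client reducer code uses to iterate (and re-iterate) over the values.
Here is the JIRA issue for the feature: https://issues.apache.org/jira/browse/HADOOP-5266.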
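For anyone unfamiliar with mark/reset semantics, the behaviour described above can be sketched in plain Java. This is only an illustration under my own assumptions: BufferedMarkResetIterator and MarkResetDemo are hypothetical names, not Hadoop classes, and the sketch buffers in memory only, without the spill to disk.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

/** Hypothetical in-memory mark/reset iterator; NOT Hadoop's implementation. */
class BufferedMarkResetIterator<T> implements Iterator<T> {
    private final Iterator<T> source;                  // underlying one-pass iterator
    private final List<T> buffer = new ArrayList<>();  // values handed out since mark()
    private boolean marked = false;
    private int replayPos = 0;                         // index of the next buffered value to replay

    BufferedMarkResetIterator(Iterator<T> source) {
        this.source = source;
    }

    /** Remember the current position; values before it are forgotten. */
    void mark() {
        buffer.subList(0, replayPos).clear();
        replayPos = 0;
        marked = true;
    }

    /** Go back to the position remembered by the last mark(). */
    void reset() {
        if (!marked) {
            throw new IllegalStateException("reset() called before mark()");
        }
        replayPos = 0;
    }

    @Override
    public boolean hasNext() {
        return replayPos < buffer.size() || source.hasNext();
    }

    @Override
    public T next() {
        if (replayPos < buffer.size()) {
            return buffer.get(replayPos++);   // replay a buffered value
        }
        T value = source.next();              // throws NoSuchElementException when exhausted
        if (marked) {
            buffer.add(value);                // keep it so a later reset() can replay it
            replayPos++;
        }
        return value;
    }
}

class MarkResetDemo {
    public static void main(String[] args) {
        BufferedMarkResetIterator<Integer> itr =
                new BufferedMarkResetIterator<>(Arrays.asList(10, 20, 30).iterator());
        itr.mark();
        int sum = 0;
        while (itr.hasNext()) {
            sum += itr.next();
        }
        itr.reset();
        List<Integer> secondPass = new ArrayList<>();
        while (itr.hasNext()) {
            secondPass.add(itr.next());
        }
        System.out.println("first pass sum = " + sum);      // first pass sum = 60
        System.out.println("second pass = " + secondPass);  // second pass = [10, 20, 30]
    }
}
```

This is the two-pass pattern I would like to get inside reduce(): one pass to compute something, then a reset and a second pass over the same values.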
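Problem:
I can't find a way to use this feature. The only mention of actual code that I have found is a branch on Apache's svn. I can't seem to find a single version of any Hadoop distribution, whether from Cloudera or plain vanilla Hadoop, that allows this. I have been checking grepcode for a while. Every task fails with the error: java.lang.IllegalArgumentException: Input Iterator not markable.
Is it possible to manually patch the source and recompile? Any clues?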
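To make clear what I think the error message implies, here is a minimal sketch of a wrapper that rejects any iterator not implementing a markable interface. The names Markable, MarkableWrapper and WrapperCheckDemo are hypothetical, and the instanceof check is my assumption about what the real constructor does, based only on the message and on the exception being thrown from the MarkableIterator constructor.

```java
import java.util.Arrays;
import java.util.Iterator;

/** Hypothetical stand-in for MarkableIteratorInterface. */
interface Markable<T> extends Iterator<T> {
    void mark();
    void reset();
}

/** Hypothetical wrapper: only accepts iterators that are already markable. */
class MarkableWrapper<T> implements Iterator<T> {
    private final Markable<T> base;

    MarkableWrapper(Iterator<T> itr) {
        // Assumed check: a plain iterator is rejected up front.
        if (!(itr instanceof Markable)) {
            throw new IllegalArgumentException("Input Iterator not markable");
        }
        base = (Markable<T>) itr;
    }

    void mark() { base.mark(); }

    void reset() { base.reset(); }

    @Override
    public boolean hasNext() { return base.hasNext(); }

    @Override
    public T next() { return base.next(); }
}

class WrapperCheckDemo {
    public static void main(String[] args) {
        // An ordinary list iterator does not implement Markable, so wrapping fails.
        Iterator<String> plain = Arrays.asList("a", "b").iterator();
        try {
            new MarkableWrapper<>(plain);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

If the real wrapper works like this, the failure would mean that the iterator handed to reduce() in my distribution simply does not implement the interface, so no change on the client side alone could make it markable.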
Interesting thing:
Running grep -R 'implements MarkableIteratorInterface' src/. from the root of the Hadoop tarball (scroll down for version details) does not reveal a ValueIterator that implements MarkableIteratorInterface (which was already evident from grepcode.com).
pom.xml contents:
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <groupId>com.nileshc</groupId>
    ...
    ...
    <name>blah</name>
    <url>http://maven.apache.org</url>
    <properties>
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
    </properties>
    <build>
        <plugins>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-surefire-plugin</artifactId>
                <version>2.16</version>
                <configuration>
                    <skipTests>true</skipTests>
                </configuration>
            </plugin>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-shade-plugin</artifactId>
                <version>1.2.2</version>
                <executions>
                    <execution>
                        <phase>package</phase>
                        <goals>
                            <goal>shade</goal>
                        </goals>
                    </execution>
                </executions>
            </plugin>
        </plugins>
    </build>
    <repositories>
        <repository>
            <id>cloudera</id>
            <url>https://repository.cloudera.com/artifactory/cloudera-repos/</url>
        </repository>
    </repositories>
    <dependencies>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-mapreduce-client-core</artifactId>
            <version>2.0.0-cdh4.5.0</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-client</artifactId>
            <version>2.0.0-mr1-cdh4.5.0</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-core</artifactId>
            <version>2.0.0-mr1-cdh4.5.0</version>
        </dependency>
        <dependency>
            <groupId>junit</groupId>
            <artifactId>junit</artifactId>
            <version>4.10</version>
            <scope>test</scope>
            <type>jar</type>
        </dependency>
    </dependencies>
</project>
Reducer reduce method code:
@Override
protected void reduce(LongWritable key, Iterable<CustomWritable> values, Context context) throws IOException, InterruptedException {
    MarkableIterator<CustomWritable> itr = new MarkableIterator<CustomWritable>(values.iterator());
    // Error caused in line above
    itr.mark();
    // Compute something
    while (itr.hasNext()) {
        // blah blah...
    }
}
Hadoop distribution details
Hadoop: 2.0.0-mr1-cdh4.5.0 (from the CDH4 tarball 2.0.0-cdh4.5.0)
Stack trace:
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/hadoop2/share/hadoop/mapreduce1/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/hadoop2/share/hadoop/common/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
14/01/31 00:29:35 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
14/01/31 00:29:39 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
14/01/31 00:29:39 INFO input.FileInputFormat: Total input paths to process : 2
14/01/31 00:29:40 INFO mapred.JobClient: Running job: job_201401302213_0004
14/01/31 00:29:42 INFO mapred.JobClient: map 0% reduce 0%
14/01/31 00:30:02 INFO mapred.JobClient: map 50% reduce 0%
14/01/31 00:30:25 INFO mapred.JobClient: map 100% reduce 0%
14/01/31 00:31:29 INFO mapred.JobClient: map 100% reduce 33%
14/01/31 00:31:32 INFO mapred.JobClient: map 100% reduce 50%
14/01/31 00:31:33 INFO mapred.JobClient: map 100% reduce 83%
14/01/31 00:31:53 INFO mapred.JobClient: Task Id : attempt_201401302213_0004_r_000000_0, Status : FAILED
java.lang.IllegalArgumentException: Input Iterator not markable
at org.apache.hadoop.mapreduce.MarkableIterator.<init>(MarkableIterator.java:45)
at com.nileshc.graphfu.pagerank.Driver$MyReducer.reduce(Driver.java:185)
at com.nileshc.graphfu.pagerank.Driver$MyReducer.reduce(Driver.java:95)
at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:164)
at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:610)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:444)
at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
at org.apache.hadoop.mapred.Child.main(Child.java:262)
attempt_201401302213_0004_r_000000_0: SLF4J: Class path contains multiple SLF4J bindings.
attempt_201401302213_0004_r_000000_0: SLF4J: Found binding in [jar:file:/hadoop2/share/hadoop/mapreduce1/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
attempt_201401302213_0004_r_000000_0: SLF4J: Found binding in [jar:file:/hadoop2/share/hadoop/common/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
attempt_201401302213_0004_r_000000_0: SLF4J: Found binding in [jar:file:/tmp/hadoop/mapred/local/taskTracker/hadoop/jobcache/job_201401302213_0004/jars/job.jar!/org/slf4j/impl/StaticLoggerBinder.class]
attempt_201401302213_0004_r_000000_0: SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
14/01/31 00:31:56 INFO mapred.JobClient: Task Id : attempt_201401302213_0004_r_000001_0, Status : FAILED
java.lang.IllegalArgumentException: Input Iterator not markable
at org.apache.hadoop.mapreduce.MarkableIterator.<init>(MarkableIterator.java:45)
at com.nileshc.graphfu.pagerank.Driver$MyReducer.reduce(Driver.java:185)
at com.nileshc.graphfu.pagerank.Driver$MyReducer.reduce(Driver.java:95)
at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:164)
at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:610)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:444)
at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
at org.apache.hadoop.mapred.Child.main(Child.java:262)
attempt_201401302213_0004_r_000001_0: SLF4J: Class path contains multiple SLF4J bindings.
attempt_201401302213_0004_r_000001_0: SLF4J: Found binding in [jar:file:/hadoop2/share/hadoop/mapreduce1/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
attempt_201401302213_0004_r_000001_0: SLF4J: Found binding in [jar:file:/hadoop2/share/hadoop/common/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
attempt_201401302213_0004_r_000001_0: SLF4J: Found binding in [jar:file:/tmp/hadoop/mapred/local/taskTracker/hadoop/jobcache/job_201401302213_0004/jars/job.jar!/org/slf4j/impl/StaticLoggerBinder.class]
attempt_201401302213_0004_r_000001_0: SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
14/01/31 00:31:57 INFO mapred.JobClient: map 100% reduce 0%
...
...
^C^C^C