How do I use MarkableIterator in Hadoop?

Time: 2014-01-30 18:32:00

Tags: java hadoop

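Background:
MarkableIteratorInterface is the interface for iterators that support mark-reset functionality. Basically, you can iterate over a Reducer's values more than once by "marking" a position and later "resetting" back to it. The data is cached temporarily in memory (and spilled to disk when memory fills up).

This works only if the Reducer's ValueIterator implements MarkableIteratorInterface, which depends on the Hadoop implementation. MarkableIterator is the wrapper that client reducer code uses to iterate (and re-iterate) over the values.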
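To make the mark-reset behaviour described above concrete, here is a minimal, self-contained sketch. `BufferingMarkableIterator` and `MarkResetDemo` are hypothetical names invented for this illustration; they are not Hadoop classes, and the sketch buffers only in memory, with no spill to disk.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

/** Hypothetical stand-in, not a Hadoop class: caches values read after mark() so reset() can replay them. */
class BufferingMarkableIterator<T> implements Iterator<T> {
    private final Iterator<T> source;
    private final List<T> buffer = new ArrayList<>();
    private boolean marked = false;
    private int replayPos = 0; // index of the next buffered value to replay

    BufferingMarkableIterator(Iterator<T> source) {
        this.source = source;
    }

    /** Remember the current position; values read from here on are cached. */
    void mark() {
        // Values before the current position are no longer needed.
        buffer.subList(0, replayPos).clear();
        replayPos = 0;
        marked = true;
    }

    /** Jump back to the most recent mark. */
    void reset() {
        if (!marked) {
            throw new IllegalStateException("mark() was never called");
        }
        replayPos = 0;
    }

    @Override
    public boolean hasNext() {
        return replayPos < buffer.size() || source.hasNext();
    }

    @Override
    public T next() {
        if (replayPos < buffer.size()) {
            return buffer.get(replayPos++); // replay a cached value
        }
        T value = source.next();
        if (marked) {
            buffer.add(value); // cache it for a later reset()
            replayPos++;
        }
        return value;
    }
}

class MarkResetDemo {
    public static void main(String[] args) {
        BufferingMarkableIterator<Integer> it =
                new BufferingMarkableIterator<>(Arrays.asList(10, 20, 30).iterator());
        it.next(); // consume 10 before marking
        it.mark();
        int firstPass = 0;
        while (it.hasNext()) {
            firstPass += it.next();
        }
        it.reset();
        int secondPass = 0;
        while (it.hasNext()) {
            secondPass += it.next();
        }
        // Both passes cover the values after the mark (20 and 30).
        System.out.println(firstPass + " " + secondPass);
    }
}
```

In a real reducer the same idea would need extra care: Hadoop reuses the value object between calls to next(), so each value would have to be copied before being cached.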

Here is the JIRA issue for this feature: https://issues.apache.org/jira/browse/HADOOP-5266

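Problem:
I can't find a way to use this feature. The only mention of actual code that I have found is a branch on Apache's svn. I can't seem to find a single version of any Hadoop distribution, whether from Cloudera or plain vanilla Hadoop, that allows this. I have been searching grepcode for a while. Every task fails with the error: java.lang.IllegalArgumentException: Input Iterator not markable

Is it possible to patch the source manually and recompile? Any clues?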

Things of interest:

pom.xml contents:

<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <groupId>com.nileshc</groupId>
    ...
    ...
    <name>blah</name>
    <url>http://maven.apache.org</url>

    <properties>
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
    </properties>

    <build>
        <plugins>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-surefire-plugin</artifactId>
                <version>2.16</version>
                <configuration>
                    <skipTests>true</skipTests>
                </configuration>
            </plugin>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-shade-plugin</artifactId>
                <version>1.2.2</version>
                <executions>
                    <execution>
                        <phase>package</phase>
                        <goals>
                            <goal>shade</goal>
                        </goals>
                    </execution>
                </executions>
            </plugin>
        </plugins>
    </build>

    <repositories>
        <repository>
            <id>cloudera</id>
            <url>https://repository.cloudera.com/artifactory/cloudera-repos/</url>
        </repository>
    </repositories>

    <dependencies>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-mapreduce-client-core</artifactId>
            <version>2.0.0-cdh4.5.0</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-client</artifactId>
            <version>2.0.0-mr1-cdh4.5.0</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-core</artifactId>
            <version>2.0.0-mr1-cdh4.5.0</version>
        </dependency>
        <dependency>
            <groupId>junit</groupId>
            <artifactId>junit</artifactId>
            <version>4.10</version>
            <scope>test</scope>
            <type>jar</type>
        </dependency>
    </dependencies>
</project>

Reducer reduce method code:

@Override
protected void reduce(LongWritable key, Iterable<CustomWritable> values, Context context) throws IOException, InterruptedException {
    // The next line throws IllegalArgumentException: Input Iterator not markable
    MarkableIterator<CustomWritable> itr = new MarkableIterator<CustomWritable>(values.iterator());
    itr.mark();

    // Compute something
    while (itr.hasNext()) {
        CustomWritable value = itr.next();
        // blah blah...
    }
}
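The exception message, together with the background above, suggests that the wrapper only delegates to an iterator that is already markable and rejects anything else in its constructor. The following is a sketch of that kind of guard under that assumption. `MarkableSource`, `GuardedMarkableIterator` and `GuardDemo` are hypothetical stand-ins written for this illustration, not the Hadoop source.

```java
import java.util.Arrays;
import java.util.Iterator;

/** Hypothetical stand-in for an interface like MarkableIteratorInterface. */
interface MarkableSource<T> extends Iterator<T> {
    void mark();
    void reset();
}

/** Hypothetical stand-in for a wrapper like MarkableIterator. */
class GuardedMarkableIterator<T> implements Iterator<T> {
    private final MarkableSource<T> base;

    GuardedMarkableIterator(Iterator<T> itr) {
        // The wrapper adds no caching of its own; it can only delegate,
        // so a plain iterator is rejected up front.
        if (!(itr instanceof MarkableSource)) {
            throw new IllegalArgumentException("Input Iterator not markable");
        }
        this.base = (MarkableSource<T>) itr;
    }

    void mark() {
        base.mark();
    }

    void reset() {
        base.reset();
    }

    @Override
    public boolean hasNext() {
        return base.hasNext();
    }

    @Override
    public T next() {
        return base.next();
    }
}

class GuardDemo {
    public static void main(String[] args) {
        // A plain list iterator does not implement MarkableSource.
        Iterator<Integer> plain = Arrays.asList(1, 2, 3).iterator();
        try {
            new GuardedMarkableIterator<>(plain);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

If the real wrapper behaves this way, the failure depends on the iterator the framework hands to reduce(), not on how the reducer calls the wrapper.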

Hadoop distribution details: Hadoop 2.0.0-mr1-cdh4.5.0 (from the CDH4 tarball 2.0.0-cdh4.5.0)

Stack trace:

SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/hadoop2/share/hadoop/mapreduce1/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/hadoop2/share/hadoop/common/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
14/01/31 00:29:35 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
14/01/31 00:29:39 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
14/01/31 00:29:39 INFO input.FileInputFormat: Total input paths to process : 2
14/01/31 00:29:40 INFO mapred.JobClient: Running job: job_201401302213_0004
14/01/31 00:29:42 INFO mapred.JobClient:  map 0% reduce 0%
14/01/31 00:30:02 INFO mapred.JobClient:  map 50% reduce 0%
14/01/31 00:30:25 INFO mapred.JobClient:  map 100% reduce 0%
14/01/31 00:31:29 INFO mapred.JobClient:  map 100% reduce 33%
14/01/31 00:31:32 INFO mapred.JobClient:  map 100% reduce 50%
14/01/31 00:31:33 INFO mapred.JobClient:  map 100% reduce 83%
14/01/31 00:31:53 INFO mapred.JobClient: Task Id : attempt_201401302213_0004_r_000000_0, Status : FAILED
java.lang.IllegalArgumentException: Input Iterator not markable
    at org.apache.hadoop.mapreduce.MarkableIterator.<init>(MarkableIterator.java:45)
    at com.nileshc.graphfu.pagerank.Driver$MyReducer.reduce(Driver.java:185)
    at com.nileshc.graphfu.pagerank.Driver$MyReducer.reduce(Driver.java:95)
    at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:164)
    at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:610)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:444)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
    at org.apache.hadoop.mapred.Child.main(Child.java:262)

attempt_201401302213_0004_r_000000_0: SLF4J: Class path contains multiple SLF4J bindings.
attempt_201401302213_0004_r_000000_0: SLF4J: Found binding in [jar:file:/hadoop2/share/hadoop/mapreduce1/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
attempt_201401302213_0004_r_000000_0: SLF4J: Found binding in [jar:file:/hadoop2/share/hadoop/common/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
attempt_201401302213_0004_r_000000_0: SLF4J: Found binding in [jar:file:/tmp/hadoop/mapred/local/taskTracker/hadoop/jobcache/job_201401302213_0004/jars/job.jar!/org/slf4j/impl/StaticLoggerBinder.class]
attempt_201401302213_0004_r_000000_0: SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
14/01/31 00:31:56 INFO mapred.JobClient: Task Id : attempt_201401302213_0004_r_000001_0, Status : FAILED
java.lang.IllegalArgumentException: Input Iterator not markable
    at org.apache.hadoop.mapreduce.MarkableIterator.<init>(MarkableIterator.java:45)
    at com.nileshc.graphfu.pagerank.Driver$MyReducer.reduce(Driver.java:185)
    at com.nileshc.graphfu.pagerank.Driver$MyReducer.reduce(Driver.java:95)
    at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:164)
    at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:610)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:444)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
    at org.apache.hadoop.mapred.Child.main(Child.java:262)

attempt_201401302213_0004_r_000001_0: SLF4J: Class path contains multiple SLF4J bindings.
attempt_201401302213_0004_r_000001_0: SLF4J: Found binding in [jar:file:/hadoop2/share/hadoop/mapreduce1/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
attempt_201401302213_0004_r_000001_0: SLF4J: Found binding in [jar:file:/hadoop2/share/hadoop/common/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
attempt_201401302213_0004_r_000001_0: SLF4J: Found binding in [jar:file:/tmp/hadoop/mapred/local/taskTracker/hadoop/jobcache/job_201401302213_0004/jars/job.jar!/org/slf4j/impl/StaticLoggerBinder.class]
attempt_201401302213_0004_r_000001_0: SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
14/01/31 00:31:57 INFO mapred.JobClient:  map 100% reduce 0%
...
...
^C^C^C

0 answers:

No answers