Using MongoDB as I/O for a Hadoop map-reduce job

Date: 2015-11-18 12:08:14

Tags: java mongodb maven hadoop cloudera

I have been trying to run the EnronMail mongo-hadoop connector example (https://github.com/mongodb/mongo-hadoop/wiki/Enron-Emails-Example), without success. I get this error:

15/11/18 11:56:23 INFO util.MongoTool: Created a conf: 'Configuration: core-default.xml, core-site.xml, mongo_enron.xml, mapred-default.xml, mapred-site.xml, hdfs-default.xml, hdfs-site.xml' on {class com.mongodb.hadoop.examples.enron.EnronMail} as job named 'EnronMail'
15/11/18 11:56:23 INFO util.MongoTool: Setting up and running MapReduce job in foreground, will wait for results.  {Verbose? true}
15/11/18 11:56:23 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
15/11/18 11:56:23 INFO mapred.JobClient: Cleaning up the staging area hdfs://MASTER1:8020/tmp/hadoop-mapred/mapred/staging/user/.staging/job_201511020757_0042
15/11/18 11:56:23 ERROR security.UserGroupInformation: PriviledgedActionException as:user (auth:SIMPLE) cause:java.io.IOException: No FileSystem for scheme: mongodb
15/11/18 11:56:23 ERROR util.MongoTool: Exception while executing job...
java.io.IOException: No FileSystem for scheme: mongodb
        at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2296)
        at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2303)
        at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:87)
        at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2342)
        at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2324)
        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:351)
        at org.apache.hadoop.fs.Path.getFileSystem(Path.java:194)
        at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:210)
        at com.mongodb.hadoop.BSONFileInputFormat.getSplits(BSONFileInputFormat.java:79)
        at org.apache.hadoop.mapred.JobClient.writeNewSplits(JobClient.java:1079)
        at org.apache.hadoop.mapred.JobClient.writeSplits(JobClient.java:1096)
        at org.apache.hadoop.mapred.JobClient.access$600(JobClient.java:177)
        at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:995)
        at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:948)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
        at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:948)
        at org.apache.hadoop.mapreduce.Job.submit(Job.java:566)
        at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:596)
        at com.mongodb.hadoop.util.MongoTool.runMapReduceJob(MongoTool.java:230)
        at com.mongodb.hadoop.util.MongoTool.run(MongoTool.java:100)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
        at com.mongodb.hadoop.examples.enron.EnronMail.main(EnronMail.java:197)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:208)
after executing this command in the Hadoop shell:

 hadoop jar /home/user/Pruebas/jars/bigdata-0.0.3-SNAPSHOT.jar com.mongodb.hadoop.examples.enron.EnronMail -Dmongo.input.split_size=8 -Dmongo.job.verbose=true -Dmongo.input.uri=mongodb://192.168.1.187:27017/mongoHadoopConnector.messages -Dmongo.output.uri=mongodb://192.168.1.187:27017/mongoHadoopConnector.message_pairs

Note: I started the mongo server process on my machine (192.168.1.187), and it is reachable from the other machines on the LAN. The collection contains data. I have tried several versions of the dependencies. My versions:

  • hadoop:Hadoop 2.0.0-cdh4.5.0

  • mongo:3.0.7

This is the POM of my Maven project:

<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <groupId>com.company.test</groupId>
    <artifactId>bigdata-light</artifactId>
    <version>0.0.1-SNAPSHOT</version>
    <packaging>jar</packaging>

    <name>bigdata-light</name>
    <url>http://maven.apache.org</url>

    <properties>
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
    </properties>

    <repositories>
        <repository>
            <id>cloudera</id>
            <url>https://repository.cloudera.com/artifactory/cloudera-repos/</url>
        </repository>
    </repositories>

    <dependencies>
        <dependency>
            <groupId>junit</groupId>
            <artifactId>junit</artifactId>
            <version>3.8.1</version>
            <scope>test</scope>
        </dependency>


        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-client</artifactId>
            <version>2.0.0-cdh4.5.0</version>
        </dependency>

        <dependency>
            <groupId>org.mongodb.mongo-hadoop</groupId>
            <artifactId>mongo-hadoop-core</artifactId>
            <version>1.4.2</version>
        </dependency>

        <dependency>
            <groupId>org.mongodb</groupId>
            <artifactId>mongo-java-driver</artifactId>
            <version>3.0.3</version>
        </dependency>
    </dependencies>
    <build>
        <finalName>bigdata-0.0.3-SNAPSHOT</finalName>
        <plugins>
            <plugin>
                <artifactId>maven-antrun-plugin</artifactId>
                <version>1.7</version>
                <dependencies>
                    <dependency>
                        <groupId>org.apache.ant</groupId>
                        <artifactId>ant-jsch</artifactId>
                        <version>1.9.2</version>
                    </dependency>
                </dependencies>
                <executions>
                    <execution>
                        <phase>install</phase>
                        <configuration>
                            <target>
                                <ant antfile="${basedir}\build.xml">
                                    <target name="upload" />
                                </ant>
                            </target>
                        </configuration>
                        <goals>
                            <goal>run</goal>
                        </goals>
                    </execution>
                </executions>
            </plugin>
        </plugins>
    </build>

</project>

Please, any help would be greatly appreciated =). I have been stuck for days... :$

1 Answer:

Answer 0 (score: 0)

I found a solution; I am posting it to help anyone who may run into the same problem. To read from a MongoDB collection, use

MapredMongoConfigUtil.setInputFormat(getConf(), com.mongodb.hadoop.mapred.MongoInputFormat.class);

instead of

MapredMongoConfigUtil.setInputFormat(getConf(), com.mongodb.hadoop.mapred.BSONFileInputFormat.class);

(the latter is the alternative for reading directly from the .bson files generated with mongodump) in the MapReduce configuration class.
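In context, the fix amounts to changing the input format in the job's setup class. Below is a minimal sketch of what that class might look like, assuming the old-API (org.apache.hadoop.mapred) variant of the EnronMail example from the mongo-hadoop wiki; the exact class layout and names may differ in your connector version, so treat this as an illustration rather than the example's actual source:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.util.ToolRunner;

import com.mongodb.hadoop.mapred.MongoInputFormat;
import com.mongodb.hadoop.mapred.MongoOutputFormat;
import com.mongodb.hadoop.util.MapredMongoConfigUtil;
import com.mongodb.hadoop.util.MongoTool;

public class EnronMail extends MongoTool {
    public EnronMail() {
        Configuration conf = new Configuration();
        // Read the input from the live MongoDB collection named by
        // mongo.input.uri (a mongodb:// URI) ...
        MapredMongoConfigUtil.setInputFormat(conf, MongoInputFormat.class);
        // ... NOT via BSONFileInputFormat, which expects a filesystem path
        // to mongodump .bson files and therefore fails with
        // "java.io.IOException: No FileSystem for scheme: mongodb"
        // when handed a mongodb:// URI.
        MapredMongoConfigUtil.setOutputFormat(conf, MongoOutputFormat.class);
        setConf(conf);
    }

    public static void main(String[] args) throws Exception {
        System.exit(ToolRunner.run(new EnronMail(), args));
    }
}
```

With this change, the input/output URIs passed on the command line (`-Dmongo.input.uri=...`, `-Dmongo.output.uri=...`) are handled by the connector itself instead of being resolved through Hadoop's FileSystem cache, which is what triggered the exception above. This is a job-configuration fragment that depends on the mongo-hadoop and Hadoop jars, so it is not runnable standalone.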