Importing data from MongoDB into HDFS

Date: 2015-09-09 10:33:06

Tags: hadoop

I am getting an error while importing data from MongoDB into HDFS. I am using:

  • Ambari Sandbox [Hortonworks] Hadoop 2.7
  • MongoDB 3.0

These are the jar files I have included:

  • mongo-java-driver-2.11.4.jar
  • mongo-hadoop-core-1.3.0.jar

Here is the code I am using:

    package com.mongo.test;

    import java.io.*;
    import org.apache.commons.logging.*;
    import org.apache.hadoop.conf.*;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.*;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.*;
    import org.apache.hadoop.mapreduce.*;
    import org.bson.*;
    import com.mongodb.MongoClient;
    import com.mongodb.hadoop.*;
    import com.mongodb.hadoop.util.*;

    public class ImportFromMongoToHdfs {
        private static final Log log = LogFactory.getLog(ImportFromMongoToHdfs.class);

        public static class ReadEmpDataFromMongo extends Mapper<Object, BSONObject, Text, Text> {
            public void map(Object key, BSONObject value, Context context) throws IOException, InterruptedException {
                System.out.println("Key: " + key);
                System.out.println("Value: " + value);
                String md5 = value.get("md5").toString();
                String name = value.get("name").toString();
                String dev = value.get("dev").toString();
                String salary = value.get("salary").toString();
                String location = value.get("location").toString();
                String output = "\t" + name + "\t" + dev + "\t" + salary + "\t" + location;
                context.write(new Text(md5), new Text(output));
            }
        }

        public static void main(String[] args) throws Exception {
            final Configuration conf = new Configuration();
            MongoConfigUtil.setInputURI(conf, "mongodb://10.25.3.196:27017/admin.emp");
            MongoConfigUtil.setCreateInputSplits(conf, false);
            System.out.println("Configuration: " + conf);
            final Job job = new Job(conf, "ReadWeblogsFromMongo");
            Path out = new Path("/mongodb3");
            FileOutputFormat.setOutputPath(job, out);
            job.setJarByClass(ImportFromMongoToHdfs.class);
            job.setMapperClass(ReadEmpDataFromMongo.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(Text.class);
            job.setInputFormatClass(com.mongodb.hadoop.MongoInputFormat.class);
            job.setOutputFormatClass(TextOutputFormat.class);
            job.setNumReduceTasks(0);
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }

This is the error I am getting back:

 [root@sandbox ~]# hadoop jar /mongoinput/mongdbconnect.jar com.mongo.test.ImportFromMongoToHdfs

WARNING: Use "yarn jar" to launch YARN applications.
Configuration: Configuration: core-default.xml, core-site.xml
15/09/09 09:22:51 INFO impl.TimelineClientImpl: Timeline service address: http://sandbox.hortonworks.com:8188/ws/v1/timeline/
15/09/09 09:22:53 INFO client.RMProxy: Connecting to ResourceManager at sandbox.hortonworks.com/10.25.3.209:8050
15/09/09 09:22:53 WARN mapreduce.JobResourceUploader: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
15/09/09 09:22:54 INFO splitter.SingleMongoSplitter: SingleMongoSplitter calculating splits for mongodb://10.25.3.196:27017/admin.emp
15/09/09 09:22:54 INFO mapreduce.JobSubmitter: number of splits:1
15/09/09 09:22:55 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1441784509780_0003
15/09/09 09:22:55 INFO impl.YarnClientImpl: Submitted application application_1441784509780_0003
15/09/09 09:22:55 INFO mapreduce.Job: The url to track the job: http://sandbox.hortonworks.com:8088/proxy/application_1441784509780_0003/
15/09/09 09:22:55 INFO mapreduce.Job: Running job: job_1441784509780_0003
15/09/09 09:23:05 INFO mapreduce.Job: Job job_1441784509780_0003 running in uber mode : false
15/09/09 09:23:05 INFO mapreduce.Job:  map 0% reduce 0%
15/09/09 09:23:12 INFO mapreduce.Job: Task Id : attempt_1441784509780_0003_m_000000_0, Status : FAILED
Error: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class com.mongodb.hadoop.MongoInputFormat not found
        at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2195)
        at org.apache.hadoop.mapreduce.task.JobContextImpl.getInputFormatClass(JobContextImpl.java:174)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:749)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
        at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
        at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: java.lang.ClassNotFoundException: Class com.mongodb.hadoop.MongoInputFormat not found
        at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:2101)
        at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2193)
        ... 8 more
15/09/09 09:23:18 INFO mapreduce.Job: Task Id : attempt_1441784509780_0003_m_000000_1, Status : FAILED
Error: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class com.mongodb.hadoop.MongoInputFormat not found
        at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2195)
        at org.apache.hadoop.mapreduce.task.JobContextImpl.getInputFormatClass(JobContextImpl.java:174)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:749)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
        at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
        at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: java.lang.ClassNotFoundException: Class com.mongodb.hadoop.MongoInputFormat not found
        at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:2101)
        at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2193)
        ... 8 more
15/09/09 09:23:24 INFO mapreduce.Job: Task Id : attempt_1441784509780_0003_m_000000_2, Status : FAILED
Error: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class com.mongodb.hadoop.MongoInputFormat not found
        at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2195)
        at org.apache.hadoop.mapreduce.task.JobContextImpl.getInputFormatClass(JobContextImpl.java:174)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:749)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
        at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
        at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: java.lang.ClassNotFoundException: Class com.mongodb.hadoop.MongoInputFormat not found
        at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:2101)
        at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2193)
        ... 8 more

15/09/09 09:23:32 INFO mapreduce.Job:  map 100% reduce 0%
15/09/09 09:23:32 INFO mapreduce.Job: Job job_1441784509780_0003 failed with state FAILED due to: Task failed task_1441784509780_0003_m_000000
Job failed as tasks failed. failedMaps:1 failedReduces:0
15/09/09 09:23:32 INFO mapreduce.Job: Counters: 9
        Job Counters
                Failed map tasks=4
                Launched map tasks=4
                Other local map tasks=3
                Rack-local map tasks=1
                Total time spent by all maps in occupied slots (ms)=16996
                Total time spent by all reduces in occupied slots (ms)=0
                Total time spent by all map tasks (ms)=16996
                Total vcore-seconds taken by all map tasks=16996
                Total megabyte-seconds taken by all map tasks=4249000
[root@sandbox ~]#

Does anyone know what is wrong?

3 Answers:

Answer 0 (score: 0)

Make sure the mongo-hadoop jars are on the Hadoop classpath and restart Hadoop. That should resolve the error java.lang.ClassNotFoundException: Class com.mongodb.hadoop.MongoInputFormat.
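A minimal sketch of this suggestion; the `/path/to/...` jar locations are placeholders, not paths from the question:

```shell
# Append the mongo jars to the classpath Hadoop passes to its daemons and jobs.
# Jar paths below are placeholders -- point them at your actual jar locations.
export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:/path/to/mongo-hadoop-core-1.3.0.jar:/path/to/mongo-java-driver-2.11.4.jar

# Verify the jars now appear on the effective classpath.
hadoop classpath
```

For this to affect running daemons, the export would typically go in hadoop-env.sh before the restart this answer recommends.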

Answer 1 (score: 0)

You are getting the ClassNotFoundException because the job cannot access the jar "mongo-hadoop-core*.jar". You have to make "mongo-hadoop-core*.jar" available to your code.

You can fix this error in several ways:

  1. Build a fat jar for your program. A fat jar bundles all the required dependency jars. Most IDEs and build tools can create one easily.

  2. Use the "-libjars" argument when submitting the YARN job.

  3. Copy the mongo jars to a location on the HADOOP_CLASSPATH.
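Option 2 above could look like the following invocation; the jar paths are placeholders. Note that -libjars is only parsed when the driver implements the Tool interface and is launched through ToolRunner, which is exactly what the WARN line in the pasted log complains about:

```shell
# Option 2: ship the dependency jars with the job via -libjars.
# Jar paths are placeholders -- adjust to where the jars actually live.
# Requires the main class to implement Tool and run via ToolRunner.
hadoop jar /mongoinput/mongdbconnect.jar com.mongo.test.ImportFromMongoToHdfs \
    -libjars /path/to/mongo-hadoop-core-1.3.0.jar,/path/to/mongo-java-driver-2.11.4.jar
```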

Answer 2 (score: 0)

I just solved a problem like this. It is really a runtime error: setting HADOOP_CLASSPATH to point at external jar files is not enough on its own, because at runtime Hadoop looks for jars in the folders of its own installation. I realized we need to copy all the required external jar files into the Hadoop installation. So: first, check the HADOOP_CLASSPATH by typing "hadoop classpath". Then copy the required external jar files into one of the directories on that classpath. For example, I copied mongo-hadoop-1.5.1.jar and some other jar files into the folder /usr/local/hadoop/share/hadoop/mapreduce.

Then it worked for me!
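The steps in this answer can be sketched as a short shell session; the jar names are the ones used in this thread and the target directory is the example from this answer, so adjust both for your installation (e.g. an HDP sandbox uses different paths):

```shell
# Print the directories Hadoop actually searches at runtime.
hadoop classpath

# Copy the external jars into one of those directories.
# Directory is the example from this answer -- adjust for your install.
cp mongo-hadoop-core-1.3.0.jar mongo-java-driver-2.11.4.jar \
   /usr/local/hadoop/share/hadoop/mapreduce/
```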