HADOOP: java.lang.NumberFormatException thrown after 100% map and 100% reduce

Date: 2013-11-26 15:21:00

Tags: java hadoop mapreduce

I am running a shortest-path algorithm on a graph (~1 million vertices, ~10 million edges) using the MapReduce framework on Hadoop 1.2.1. I know the code itself works, because I have tested it on a small dataset. But when I run it on the large dataset, the job runs for a while, reaches 100% map and 100% reduce, then gets stuck and throws a java.lang.NumberFormatException:

13/11/26 09:27:52 INFO output.FileOutputCommitter: Saved output of task 'attempt_local849927259_0001_r_000000_0' to /home/hduser/Desktop/final65440004050210
13/11/26 09:27:52 INFO mapred.LocalJobRunner: reduce > reduce
13/11/26 09:27:52 INFO mapred.Task: Task 'attempt_local849927259_0001_r_000000_0' done.
13/11/26 09:27:53 INFO mapred.JobClient:  map 100% reduce 100%
13/11/26 09:27:53 INFO mapred.JobClient: Job complete: job_local849927259_0001
13/11/26 09:27:53 INFO mapred.JobClient: Counters: 20
13/11/26 09:27:53 INFO mapred.JobClient:   File Output Format Counters 
13/11/26 09:27:53 INFO mapred.JobClient:     Bytes Written=52398725
13/11/26 09:27:53 INFO mapred.JobClient:   FileSystemCounters
13/11/26 09:27:53 INFO mapred.JobClient:     FILE_BYTES_READ=988857216
13/11/26 09:27:53 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=1230974329
13/11/26 09:27:53 INFO mapred.JobClient:   File Input Format Counters 
13/11/26 09:27:53 INFO mapred.JobClient:     Bytes Read=39978936
13/11/26 09:27:53 INFO mapred.JobClient:   Map-Reduce Framework
13/11/26 09:27:53 INFO mapred.JobClient:     Reduce input groups=1137931
13/11/26 09:27:53 INFO mapred.JobClient:     Map output materialized bytes=163158951
13/11/26 09:27:53 INFO mapred.JobClient:     Combine output records=0
13/11/26 09:27:53 INFO mapred.JobClient:     Map input records=570075
13/11/26 09:27:53 INFO mapred.JobClient:     Reduce shuffle bytes=0
13/11/26 09:27:53 INFO mapred.JobClient:     Physical memory (bytes) snapshot=0
13/11/26 09:27:53 INFO mapred.JobClient:     Reduce output records=1137931
13/11/26 09:27:53 INFO mapred.JobClient:     Spilled Records=21331172
13/11/26 09:27:53 INFO mapred.JobClient:     Map output bytes=150932554
13/11/26 09:27:53 INFO mapred.JobClient:     CPU time spent (ms)=0
13/11/26 09:27:53 INFO mapred.JobClient:     Total committed heap usage (bytes)=1638268928
13/11/26 09:27:53 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=0
13/11/26 09:27:53 INFO mapred.JobClient:     Combine input records=0
13/11/26 09:27:53 INFO mapred.JobClient:     Map output records=6084261
13/11/26 09:27:53 INFO mapred.JobClient:     SPLIT_RAW_BYTES=202
13/11/26 09:27:53 INFO mapred.JobClient:     Reduce input records=6084261
13/11/26 09:27:55 WARN mapred.JobClient: No job jar file set.  User classes may not be found. See JobConf(Class) or JobConf#setJar(String).
13/11/26 09:27:55 INFO input.FileInputFormat: Total input paths to process : 1
13/11/26 09:27:56 INFO mapred.JobClient: Running job: job_local2046662654_0002
13/11/26 09:27:56 INFO mapred.LocalJobRunner: Waiting for map tasks
13/11/26 09:27:56 INFO mapred.LocalJobRunner: Starting task: attempt_local2046662654_0002_m_000000_0
13/11/26 09:27:56 INFO mapred.Task:  Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@43c319b9
13/11/26 09:27:56 INFO mapred.MapTask: Processing split: file:/home/hduser/Desktop/final65440004050210/part-r-00000:0+33554432
13/11/26 09:27:56 INFO mapred.MapTask: io.sort.mb = 100
13/11/26 09:27:56 INFO mapred.MapTask: data buffer = 79691776/99614720
13/11/26 09:27:56 INFO mapred.MapTask: record buffer = 262144/327680
13/11/26 09:27:56 INFO mapred.MapTask: Starting flush of map output
13/11/26 09:27:56 INFO mapred.MapTask: Finished spill 0
13/11/26 09:27:56 INFO mapred.LocalJobRunner: Starting task: attempt_local2046662654_0002_m_000001_0
13/11/26 09:27:56 INFO mapred.Task:  Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@4c6b851b
13/11/26 09:27:56 INFO mapred.MapTask: Processing split: file:/home/hduser/Desktop/final65440004050210/part-r-00000:33554432+18438093
13/11/26 09:27:56 INFO mapred.MapTask: io.sort.mb = 100
13/11/26 09:27:56 INFO mapred.MapTask: data buffer = 79691776/99614720
13/11/26 09:27:56 INFO mapred.MapTask: record buffer = 262144/327680
13/11/26 09:27:56 INFO mapred.MapTask: Starting flush of map output
13/11/26 09:27:56 INFO mapred.LocalJobRunner: Map task executor complete.
13/11/26 09:27:56 WARN mapred.LocalJobRunner: job_local2046662654_0002
java.lang.Exception: java.lang.NumberFormatException: For input string: "UNMODED"
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:354)
Caused by: java.lang.NumberFormatException: For input string: "UNMODED"
    at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
    at java.lang.Integer.parseInt(Integer.java:492)
    at java.lang.Integer.parseInt(Integer.java:527)
    at graph.Dijkstra$TheMapper.map(Dijkstra.java:42)
    at graph.Dijkstra$TheMapper.map(Dijkstra.java:1)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:364)
    at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:223)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
    at java.util.concurrent.FutureTask.run(FutureTask.java:166)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:724)
13/11/26 09:27:57 INFO mapred.JobClient:  map 0% reduce 0%
13/11/26 09:27:57 INFO mapred.JobClient: Job complete: job_local2046662654_0002
13/11/26 09:27:57 INFO mapred.JobClient: Counters: 0
Exception in thread "main" java.io.FileNotFoundException: File /home/hduser/Desktop/final65474682682135/part-r-00000 does not exist.
    at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:402)
    at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:255)
    at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.<init>(ChecksumFileSystem.java:125)
    at org.apache.hadoop.fs.ChecksumFileSystem.open(ChecksumFileSystem.java:283)
    at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:436)
    at graph.Dijkstra.run(Dijkstra.java:143)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
    at graph.Dijkstra.main(Dijkstra.java:181)
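
The trace points at the Integer.parseInt call in the mapper (Dijkstra.java:42) failing on the literal string "UNMODED". That string is the default the reducer assigns to nodes when a key receives no NODES message (see TheReducer below), so my guess is that on the large graph some vertex gets only VALUE messages, the reducer writes out "<node> <distance> UNMODED", and the second iteration's mapper then tries to parse "UNMODED" as an adjacency list. A minimal guard in the mapper, shown only as a sketch (it skips the bad token rather than explaining why it appears), would be:

//sketch: skip the adjacency-list expansion when the reducer emitted no real adjacency list
String[] sp = line.split(" ");
int distanceadd = Integer.parseInt(sp[1]) + 1;
if (!sp[2].equals("UNMODED")) {
    for (String target : sp[2].split(":")) {
        word.set("VALUE " + distanceadd);
        context.write(new LongWritable(Integer.parseInt(target)), word);
    }
}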

Code:

package graph;

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.HashMap;
import java.util.Iterator;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.conf.*;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapreduce.*;
import org.apache.hadoop.mapreduce.lib.input.*;
import org.apache.hadoop.mapreduce.lib.output.*;
import org.apache.hadoop.util.*;

public class Dijkstra extends Configured implements Tool {

    public static String OUT = "outfile";
    public static String IN = "inputlarger";

    public static class TheMapper extends Mapper<LongWritable, Text, LongWritable, Text> {

    public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
        //From slide 20 of Graph Algorithms with MapReduce (by Jimmy Lin, Univ @ Maryland)
        //Key is node n
        //Value is D, Points-To
        //For every point (or key), look at everything it points to.
        //Emit or write to the points to variable with the current distance + 1
        Text word = new Text();
        String line = value.toString();//looks like 1 0 2:3:
        String[] sp = line.split(" ");//splits on space
        int distanceadd = Integer.parseInt(sp[1]) + 1;
        String[] PointsTo = sp[2].split(":");
        for(int i=0; i<PointsTo.length; i++){
            word.set("VALUE "+distanceadd);//tells me to look at distance value
            context.write(new LongWritable(Integer.parseInt(PointsTo[i])), word);
            word.clear();
        }
        //pass in current node's distance (if it is the lowest distance)
        word.set("VALUE "+sp[1]);
        context.write( new LongWritable( Integer.parseInt( sp[0] ) ), word );
        word.clear();

        word.set("NODES "+sp[2]);//tells me to append on the final tally
        context.write( new LongWritable( Integer.parseInt( sp[0] ) ), word );
        word.clear();

    }
    }

    public static class TheReducer extends Reducer<LongWritable, Text, LongWritable, Text> {
    public void reduce(LongWritable key, Iterable<Text> values, Context context) throws IOException, InterruptedException {
        //From slide 20 of Graph Algorithms with MapReduce (by Jimmy Lin, Univ @ Maryland)
        //The key is the current point
        //The values are all the possible distances to this point
        //we simply emit the point and the minimum distance value
        System.out.println("in reuduce");
        String nodes = "UNMODED";
        Text word = new Text();
        int lowest = 10009;//sentinel "infinity" (larger than any real distance here)

        for (Text val : values) {//looks like NODES/VALUES 1 0 2:3:, we need to use the first as a key
            String[] sp = val.toString().split(" ");//splits on space
            //look at first value
            if(sp[0].equalsIgnoreCase("NODES")){
                nodes = sp[1];
            }else if(sp[0].equalsIgnoreCase("VALUE")){
               int distance = Integer.parseInt(sp[1]);

                lowest = Math.min(distance, lowest);
            }
        }
        word.set(lowest+" "+nodes);
        context.write(key, word);
        word.clear();
    }
    }

    //Almost exactly from http://hadoop.apache.org/mapreduce/docs/current/mapred_tutorial.html
    public int run(String[] args) throws Exception {
    //http://code.google.com/p/joycrawler/source/browse/NetflixChallenge/src/org/niubility/learning/knn/KNNDriver.java?r=242
    getConf().set("mapred.textoutputformat.separator", " ");//make the key -> value space separated (for iterations)

    //set in and out to args.
    //IN = args[0];
    //OUT = args[1];
       IN = "/home/hduser/Desktop/youCP3.txt";

       OUT = "/home/hduser/Desktop/final";

    String infile = IN;
    String outputfile = OUT + System.nanoTime();

    boolean isdone = false;
    boolean success = false;

    HashMap <Integer, Integer> _map = new HashMap<Integer, Integer>();

    while(isdone == false){

        Job job = new Job(getConf());
        job.setJarByClass(Dijkstra.class);
        job.setJobName("Dijkstra");
        job.setOutputKeyClass(LongWritable.class);
        job.setOutputValueClass(Text.class);
        job.setMapperClass(TheMapper.class);
        job.setReducerClass(TheReducer.class);
        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);

        FileInputFormat.addInputPath(job, new Path(infile));
        FileOutputFormat.setOutputPath(job, new Path(outputfile));

        success = job.waitForCompletion(true);

        //remove the input file
        //http://eclipse.sys-con.com/node/1287801/mobile
        if(!infile.equals(IN)){//compare string contents, not references; != happens to work here but is fragile
            String indir = infile.replace("part-r-00000", "");
            Path ddir = new Path(indir);
            FileSystem dfs = FileSystem.get(getConf());
            dfs.delete(ddir, true);
        }

        infile = outputfile+"/part-r-00000";
        outputfile = OUT + System.nanoTime();

        //do we need to re-run the job with the new input file??
        //http://www.hadoop-blog.com/2010/11/how-to-read-file-from-hdfs-in-hadoop.html
        isdone = true;//assume we've converged; the checks below set this back to false if not
        Path ofile = new Path(infile);
        FileSystem fs = FileSystem.get(new Configuration());
        BufferedReader br=new BufferedReader(new InputStreamReader(fs.open(ofile)));

        HashMap<Integer, Integer> imap = new HashMap<Integer, Integer>();
        String line=br.readLine();
        while (line != null){
            //each line looks like 0 1 2:3:
            //we need to verify node -> distance doesn't change
            String[] sp = line.split(" ");
            int node = Integer.parseInt(sp[0]);
            int distance = Integer.parseInt(sp[1]);
            imap.put(node, distance);
            line=br.readLine();
        }
        if(_map.isEmpty()){
            //first iteration... must do a second iteration regardless!
            isdone = false;
        }else{
            //http://www.java-examples.com/iterate-through-values-java-hashmap-example
            //http://www.javabeat.net/articles/33-generics-in-java-50-1.html
            Iterator<Integer> itr = imap.keySet().iterator();
            while(itr.hasNext()){
                int key = itr.next();
                int val = imap.get(key);
                Integer prev = _map.get(key);
                if(prev == null || prev != val){
                    //value changed (or node is new)... we aren't at convergence yet
                    isdone = false;
                }
            }
        }
        if(isdone == false){
            _map.putAll(imap);//copy imap to _map for the next iteration (if required)
        }
    }

    return success ? 0 : 1;
    }

    public static void main(String[] args) throws Exception {
    System.exit(ToolRunner.run(new Dijkstra(), args));
    }
}
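
The trailing FileNotFoundException looks secondary: job_local2046662654_0002 dies in the mapper, so its part-r-00000 is never committed, yet run() opens that file unconditionally right after waitForCompletion. A small guard, just a sketch so the driver stops cleanly instead of masking the real failure, would be:

success = job.waitForCompletion(true);
if (!success) {
    //the mapper threw, so outputfile + "/part-r-00000" was never written;
    //stop iterating instead of trying to open the missing file below
    break;
}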

Small working dataset:
1 0 2:3:
2 10000 1:4:5:
3 10000 1:
4 10000 2:5:
4 10000 6:
5 10000 2:4:
6 10000 4:
6 10000 7:
7 10000 6:
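
In this small dataset every vertex appears in the first column, so the reducer always receives a NODES message and "UNMODED" never reaches the next iteration's input, which would explain why the small run succeeds. A standalone snippet (the vertex id 42 and the distance are made up for illustration) reproduces the exact failure on the kind of line the large run apparently produces:

//hypothetical reduce output for a vertex that received only VALUE messages
String line = "42 10001 UNMODED";
String[] sp = line.split(" ");
int distanceadd = Integer.parseInt(sp[1]) + 1; //fine: 10002
String[] pointsTo = sp[2].split(":");          //["UNMODED"]
Integer.parseInt(pointsTo[0]);                 //NumberFormatException: For input string: "UNMODED"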

0 Answers