I am running a shortest-path algorithm on a graph (~1 million vertices, ~10 million edges) with the map-reduce framework, using Hadoop 1.2.1. The code itself works, since I have tested it on a small dataset. But when I run it on this large dataset, the code runs for a while, reaching 100% map and 100% reduce, and after that it gets stuck and throws a "java.lang.NumberFormatException":
13/11/26 09:27:52 INFO output.FileOutputCommitter: Saved output of task 'attempt_local849927259_0001_r_000000_0' to /home/hduser/Desktop/final65440004050210
13/11/26 09:27:52 INFO mapred.LocalJobRunner: reduce > reduce
13/11/26 09:27:52 INFO mapred.Task: Task 'attempt_local849927259_0001_r_000000_0' done.
13/11/26 09:27:53 INFO mapred.JobClient: map 100% reduce 100%
13/11/26 09:27:53 INFO mapred.JobClient: Job complete: job_local849927259_0001
13/11/26 09:27:53 INFO mapred.JobClient: Counters: 20
13/11/26 09:27:53 INFO mapred.JobClient: File Output Format Counters
13/11/26 09:27:53 INFO mapred.JobClient: Bytes Written=52398725
13/11/26 09:27:53 INFO mapred.JobClient: FileSystemCounters
13/11/26 09:27:53 INFO mapred.JobClient: FILE_BYTES_READ=988857216
13/11/26 09:27:53 INFO mapred.JobClient: FILE_BYTES_WRITTEN=1230974329
13/11/26 09:27:53 INFO mapred.JobClient: File Input Format Counters
13/11/26 09:27:53 INFO mapred.JobClient: Bytes Read=39978936
13/11/26 09:27:53 INFO mapred.JobClient: Map-Reduce Framework
13/11/26 09:27:53 INFO mapred.JobClient: Reduce input groups=1137931
13/11/26 09:27:53 INFO mapred.JobClient: Map output materialized bytes=163158951
13/11/26 09:27:53 INFO mapred.JobClient: Combine output records=0
13/11/26 09:27:53 INFO mapred.JobClient: Map input records=570075
13/11/26 09:27:53 INFO mapred.JobClient: Reduce shuffle bytes=0
13/11/26 09:27:53 INFO mapred.JobClient: Physical memory (bytes) snapshot=0
13/11/26 09:27:53 INFO mapred.JobClient: Reduce output records=1137931
13/11/26 09:27:53 INFO mapred.JobClient: Spilled Records=21331172
13/11/26 09:27:53 INFO mapred.JobClient: Map output bytes=150932554
13/11/26 09:27:53 INFO mapred.JobClient: CPU time spent (ms)=0
13/11/26 09:27:53 INFO mapred.JobClient: Total committed heap usage (bytes)=1638268928
13/11/26 09:27:53 INFO mapred.JobClient: Virtual memory (bytes) snapshot=0
13/11/26 09:27:53 INFO mapred.JobClient: Combine input records=0
13/11/26 09:27:53 INFO mapred.JobClient: Map output records=6084261
13/11/26 09:27:53 INFO mapred.JobClient: SPLIT_RAW_BYTES=202
13/11/26 09:27:53 INFO mapred.JobClient: Reduce input records=6084261
13/11/26 09:27:55 WARN mapred.JobClient: No job jar file set. User classes may not be found. See JobConf(Class) or JobConf#setJar(String).
13/11/26 09:27:55 INFO input.FileInputFormat: Total input paths to process : 1
13/11/26 09:27:56 INFO mapred.JobClient: Running job: job_local2046662654_0002
13/11/26 09:27:56 INFO mapred.LocalJobRunner: Waiting for map tasks
13/11/26 09:27:56 INFO mapred.LocalJobRunner: Starting task: attempt_local2046662654_0002_m_000000_0
13/11/26 09:27:56 INFO mapred.Task: Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@43c319b9
13/11/26 09:27:56 INFO mapred.MapTask: Processing split: file:/home/hduser/Desktop/final65440004050210/part-r-00000:0+33554432
13/11/26 09:27:56 INFO mapred.MapTask: io.sort.mb = 100
13/11/26 09:27:56 INFO mapred.MapTask: data buffer = 79691776/99614720
13/11/26 09:27:56 INFO mapred.MapTask: record buffer = 262144/327680
13/11/26 09:27:56 INFO mapred.MapTask: Starting flush of map output
13/11/26 09:27:56 INFO mapred.MapTask: Finished spill 0
13/11/26 09:27:56 INFO mapred.LocalJobRunner: Starting task: attempt_local2046662654_0002_m_000001_0
13/11/26 09:27:56 INFO mapred.Task: Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@4c6b851b
13/11/26 09:27:56 INFO mapred.MapTask: Processing split: file:/home/hduser/Desktop/final65440004050210/part-r-00000:33554432+18438093
13/11/26 09:27:56 INFO mapred.MapTask: io.sort.mb = 100
13/11/26 09:27:56 INFO mapred.MapTask: data buffer = 79691776/99614720
13/11/26 09:27:56 INFO mapred.MapTask: record buffer = 262144/327680
13/11/26 09:27:56 INFO mapred.MapTask: Starting flush of map output
13/11/26 09:27:56 INFO mapred.LocalJobRunner: Map task executor complete.
13/11/26 09:27:56 WARN mapred.LocalJobRunner: job_local2046662654_0002
java.lang.Exception: java.lang.NumberFormatException: For input string: "UNMODED"
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:354)
Caused by: java.lang.NumberFormatException: For input string: "UNMODED"
at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
at java.lang.Integer.parseInt(Integer.java:492)
at java.lang.Integer.parseInt(Integer.java:527)
at graph.Dijkstra$TheMapper.map(Dijkstra.java:42)
at graph.Dijkstra$TheMapper.map(Dijkstra.java:1)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:364)
at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:223)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
at java.util.concurrent.FutureTask.run(FutureTask.java:166)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:724)
13/11/26 09:27:57 INFO mapred.JobClient: map 0% reduce 0%
13/11/26 09:27:57 INFO mapred.JobClient: Job complete: job_local2046662654_0002
13/11/26 09:27:57 INFO mapred.JobClient: Counters: 0
Exception in thread "main" java.io.FileNotFoundException: File /home/hduser/Desktop/final65474682682135/part-r-00000 does not exist.
at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:402)
at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:255)
at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.<init>(ChecksumFileSystem.java:125)
at org.apache.hadoop.fs.ChecksumFileSystem.open(ChecksumFileSystem.java:283)
at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:436)
at graph.Dijkstra.run(Dijkstra.java:143)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
at graph.Dijkstra.main(Dijkstra.java:181)
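One thing I notice in the first job's counters: the input has 570,075 map input records but 1,137,931 reduce input groups, so roughly half the vertices appear only as targets in other vertices' adjacency lists and have no input line of their own. My guess at what happens (an inference from the code below, not something I have verified): such a vertex never receives a "NODES" message, so the reducer writes its placeholder string UNMODED as the adjacency field, and the second job's mapper then fails trying to parse that as an edge target. A self-contained snippet reproducing the parse failure (the class name, vertex id 4, and distance 10009 are made-up values for illustration):

public class UnmodedRepro {
    public static void main(String[] args) {
        // Hypothetical reducer output line for a vertex that only ever
        // received "VALUE" messages, so its adjacency stayed "UNMODED":
        String line = "4 10009 UNMODED";
        String[] sp = line.split(" ");
        String[] pointsTo = sp[2].split(":"); // ["UNMODED"]
        Integer.parseInt(pointsTo[0]);        // NumberFormatException: For input string: "UNMODED"
    }
}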
Code:
package graph;

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.HashMap;
import java.util.Iterator;

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.conf.*;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapreduce.*;
import org.apache.hadoop.mapreduce.lib.input.*;
import org.apache.hadoop.mapreduce.lib.output.*;
import org.apache.hadoop.util.*;

public class Dijkstra extends Configured implements Tool {

    public static String OUT = "outfile";
    public static String IN = "inputlarger";

    public static class TheMapper extends Mapper<LongWritable, Text, LongWritable, Text> {

        public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
            //From slide 20 of Graph Algorithms with MapReduce (by Jimmy Lin, Univ @ Maryland)
            //Key is node n
            //Value is D, Points-To
            //For every point (or key), look at everything it points to.
            //Emit or write to the points-to variable with the current distance + 1
            Text word = new Text();
            String line = value.toString();//looks like 1 0 2:3:
            String[] sp = line.split(" ");//splits on space
            /* String[] bh = null;
            for(int i=0; i<sp.length-2; i++){
                bh[i] = sp[i+2];
            }*/
            int distanceadd = Integer.parseInt(sp[1]) + 1;
            String[] PointsTo = sp[2].split(":");
            //System.out.println("Pont4");
            for(int i=0; i<PointsTo.length; i++){
                word.set("VALUE "+distanceadd);//tells me to look at distance value
                context.write(new LongWritable(Integer.parseInt(PointsTo[i])), word);
                word.clear();
            }
            //pass in current node's distance (if it is the lowest distance)
            //System.out.println("Pont3");
            word.set("VALUE "+sp[1]);
            context.write(new LongWritable(Integer.parseInt(sp[0])), word);
            word.clear();
            word.set("NODES "+sp[2]);//tells me to append on the final tally
            context.write(new LongWritable(Integer.parseInt(sp[0])), word);
            word.clear();
        }
    }

    public static class TheReducer extends Reducer<LongWritable, Text, LongWritable, Text> {

        public void reduce(LongWritable key, Iterable<Text> values, Context context) throws IOException, InterruptedException {
            //From slide 20 of Graph Algorithms with MapReduce (by Jimmy Lin, Univ @ Maryland)
            //The key is the current point
            //The values are all the possible distances to this point
            //we simply emit the point and the minimum distance value
            System.out.println("in reduce");
            String nodes = "UNMODED";
            Text word = new Text();
            int lowest = 10009;//start at infinity
            for (Text val : values) {//looks like NODES/VALUES 1 0 2:3:, we need to use the first as a key
                String[] sp = val.toString().split(" ");//splits on space
                //look at first value
                if(sp[0].equalsIgnoreCase("NODES")){
                    //System.out.println("Pont1");
                    nodes = null;
                    nodes = sp[1];
                }else if(sp[0].equalsIgnoreCase("VALUE")){
                    //System.out.println("Pont2");
                    int distance = Integer.parseInt(sp[1]);
                    lowest = Math.min(distance, lowest);
                }
            }
            word.set(lowest+" "+nodes);
            context.write(key, word);
            word.clear();
        }
    }

    //Almost exactly from http://hadoop.apache.org/mapreduce/docs/current/mapred_tutorial.html
    public int run(String[] args) throws Exception {
        //http://code.google.com/p/joycrawler/source/browse/NetflixChallenge/src/org/niubility/learning/knn/KNNDriver.java?r=242
        getConf().set("mapred.textoutputformat.separator", " ");//make the key -> value space separated (for iterations)
        //set in and out to args.
        //IN = args[0];
        //OUT = args[1];
        IN = "/home/hduser/Desktop/youCP3.txt";
        OUT = "/home/hduser/Desktop/final";
        String infile = IN;
        String outputfile = OUT + System.nanoTime();
        boolean isdone = false;
        boolean success = false;
        HashMap<Integer, Integer> _map = new HashMap<Integer, Integer>();
        while(isdone == false){
            Job job = new Job(getConf());
            job.setJarByClass(Dijkstra.class);
            job.setJobName("Dijkstra");
            job.setOutputKeyClass(LongWritable.class);
            job.setOutputValueClass(Text.class);
            job.setMapperClass(TheMapper.class);
            job.setReducerClass(TheReducer.class);
            job.setInputFormatClass(TextInputFormat.class);
            job.setOutputFormatClass(TextOutputFormat.class);
            FileInputFormat.addInputPath(job, new Path(infile));
            FileOutputFormat.setOutputPath(job, new Path(outputfile));
            success = job.waitForCompletion(true);
            //remove the input file
            //http://eclipse.sys-con.com/node/1287801/mobile
            if(infile != IN){
                String indir = infile.replace("part-r-00000", "");
                Path ddir = new Path(indir);
                FileSystem dfs = FileSystem.get(getConf());
                dfs.delete(ddir, true);
            }
            infile = outputfile+"/part-r-00000";
            outputfile = OUT + System.nanoTime();
            //do we need to re-run the job with the new input file??
            //http://www.hadoop-blog.com/2010/11/how-to-read-file-from-hdfs-in-hadoop.html
            isdone = true;//set the job to NOT run again!
            Path ofile = new Path(infile);
            FileSystem fs = FileSystem.get(new Configuration());
            BufferedReader br = new BufferedReader(new InputStreamReader(fs.open(ofile)));
            HashMap<Integer, Integer> imap = new HashMap<Integer, Integer>();
            String line = br.readLine();
            while (line != null){
                //each line looks like 0 1 2:3:
                //we need to verify node -> distance doesn't change
                String[] sp = line.split(" ");
                int node = Integer.parseInt(sp[0]);
                int distance = Integer.parseInt(sp[1]);
                imap.put(node, distance);
                line = br.readLine();
            }
            if(_map.isEmpty()){
                //first iteration... must do a second iteration regardless!
                isdone = false;
            }else{
                //http://www.java-examples.com/iterate-through-values-java-hashmap-example
                //http://www.javabeat.net/articles/33-generics-in-java-50-1.html
                Iterator<Integer> itr = imap.keySet().iterator();
                while(itr.hasNext()){
                    int key = itr.next();
                    int val = imap.get(key);
                    if(_map.get(key) != val){
                        //values aren't the same... we aren't at convergence yet
                        isdone = false;
                    }
                }
            }
            if(isdone == false){
                _map.putAll(imap);//copy imap to _map for the next iteration (if required)
            }
        }
        return success ? 0 : 1;
    }

    public static void main(String[] args) throws Exception {
        System.exit(ToolRunner.run(new Dijkstra(), args));
    }
}
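If the UNMODED placeholder is really the cause, a minimal guard would be to make the mapper tolerate it instead of parsing it. Below is a sketch of how TheMapper.map could look (my own untested assumption, reusing the imports and class context of the code above; it assumes a node with no adjacency list of its own can simply skip propagating distances):

//Possible defensive replacement for TheMapper.map
//(same Mapper<LongWritable, Text, LongWritable, Text> signature as above)
public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
    Text word = new Text();
    String[] sp = value.toString().split(" ");//looks like 1 0 2:3:
    int distanceadd = Integer.parseInt(sp[1]) + 1;
    //Only propagate distances if this node has a real adjacency list;
    //"UNMODED" is the reducer's placeholder for nodes that never got a NODES message.
    if (sp.length > 2 && !sp[2].equals("UNMODED")) {
        for (String p : sp[2].split(":")) {
            word.set("VALUE " + distanceadd);
            context.write(new LongWritable(Integer.parseInt(p)), word);
        }
    }
    //re-emit this node's own distance and adjacency list, as before
    word.set("VALUE " + sp[1]);
    context.write(new LongWritable(Integer.parseInt(sp[0])), word);
    word.set("NODES " + (sp.length > 2 ? sp[2] : "UNMODED"));
    context.write(new LongWritable(Integer.parseInt(sp[0])), word);
}

Alternatively, the input could be pre-processed so that every vertex appearing in some adjacency list also gets a line of its own; either way the parse at Dijkstra.java:42 should stop seeing the placeholder.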
Small working dataset (each line is: node, current distance, colon-separated adjacency list):
1 0 2:3:
2 10000 1:4:5:
3 10000 1:
4 10000 2:5:
4 10000 6:
5 10000 2:4:
6 10000 4:
6 10000 7:
7 10000 6:
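For reference, tracing the algorithm by hand on this small dataset (merging the duplicate rows for nodes 4 and 6 into single adjacency lists), the distances from node 1 should converge to, if I have traced it correctly:
1 0
2 1
3 1
4 2
5 2
6 3
7 4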