When I try to run the random forest example I get a java.lang.ArrayIndexOutOfBoundsException: 100 error, where 100 is the number of trees. The map phase reaches 100% complete while reduce stays at 0%. I am using hadoop-1.2.1 with mahout-distribution-0.7, and I also tried mahout-distribution-0.9 with the same error.
Has anyone had any luck running this example?
Answer 0 (score: 1)
Found the problem. When Hadoop runs with mapred.job.tracker=local, PartialBuilder cannot obtain the number of map tasks from mapred.map.tasks, so it computes the wrong number of trees per map task.
Solution: do not use the "-p" option when running the Random Forest job on local Hadoop.
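The arithmetic behind the failure can be sketched as follows. This is only an illustrative Java snippet, not Mahout's actual PartialBuilder code; the class and method names are invented. It shows how a mapper count that does not match the real number of input splits pushes a per-partition tree offset past a forest array of length 100, which is exactly the index in the exception message.

// Illustrative sketch only -- not Mahout's source. It mimics the kind of
// per-mapper tree assignment a partial (-p) build relies on, to show how a
// wrong map-task count can push a tree index past the forest array bounds.
public class TreePartitionSketch {

    // Each of numMaps mappers is assigned roughly numTrees / numMaps trees;
    // mapper number 'partition' starts at partition * treesPerMapper.
    static int firstTreeId(int numMaps, int numTrees, int partition) {
        int treesPerMapper = numTrees / numMaps;
        return partition * treesPerMapper;
    }

    public static void main(String[] args) {
        int numTrees = 100;      // -t 100
        int actualSplits = 4;    // splits actually created from the input

        // If mapred.map.tasks is not usable (local job runner), the builder
        // may assume a mapper count different from the real number of splits.
        int assumedMaps = 1;     // hypothetical value read back from the config

        for (int partition = 0; partition < actualSplits; partition++) {
            int first = firstTreeId(assumedMaps, numTrees, partition);
            // With assumedMaps = 1, partition 1 already starts at tree 100,
            // i.e. index 100 into a forest array of length 100 ->
            // java.lang.ArrayIndexOutOfBoundsException: 100
            System.out.println("partition " + partition + " starts at tree " + first);
        }
    }
}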
Details (command and log of a run that completes without "-p"):
windiana@host:~/mahout/data/> hadoop jar $MAHOUT_HOME/examples/target/mahout-examples-0.9-job.jar org.apache.mahout.classifier.df.mapreduce.BuildForest -Dmapred.max.split.size=1874231 -d testdata/KDDTrain+.arff -ds testdata/KDDTrain+.info -sl 5 -t 100 -o nsl-forest
Warning: $HADOOP_HOME is deprecated.
14/08/07 11:25:18 INFO mapreduce.BuildForest: InMem Mapred implementation
14/08/07 11:25:18 INFO mapreduce.BuildForest: Building the forest...
14/08/07 11:25:18 INFO util.NativeCodeLoader: Loaded the native-hadoop library
14/08/07 11:25:19 INFO filecache.TrackerDistributedCacheManager: Creating KDDTrain+.info in /tmp/hadoop-martin/mapred/local/archive/-1415030653984777464_-1414908735_797966215/filetestdata-work-5026960219142699303 with rwxr-xr-x
14/08/07 11:25:19 INFO filecache.TrackerDistributedCacheManager: Cached testdata/KDDTrain+.info as /tmp/hadoop-martin/mapred/local/archive/-1415030653984777464_-1414908735_797966215/filetestdata/KDDTrain+.info
14/08/07 11:25:19 INFO filecache.TrackerDistributedCacheManager: Cached testdata/KDDTrain+.info as /tmp/hadoop-martin/mapred/local/archive/-1415030653984777464_-1414908735_797966215/filetestdata/KDDTrain+.info
14/08/07 11:25:19 INFO filecache.TrackerDistributedCacheManager: Creating KDDTrain+.arff in /tmp/hadoop-martin/mapred/local/archive/3941906571438652588_-1415143228_797959215/filetestdata-work-5750487161401524172 with rwxr-xr-x
14/08/07 11:25:19 INFO filecache.TrackerDistributedCacheManager: Cached testdata/KDDTrain+.arff as /tmp/hadoop-martin/mapred/local/archive/3941906571438652588_-1415143228_797959215/filetestdata/KDDTrain+.arff
14/08/07 11:25:19 INFO filecache.TrackerDistributedCacheManager: Cached testdata/KDDTrain+.arff as /tmp/hadoop-martin/mapred/local/archive/3941906571438652588_-1415143228_797959215/filetestdata/KDDTrain+.arff
14/08/07 11:25:19 INFO mapred.JobClient: Running job: job_local966281240_0001
14/08/07 11:25:19 INFO mapred.LocalJobRunner: Waiting for map tasks
14/08/07 11:25:19 INFO mapred.LocalJobRunner: Starting task: attempt_local966281240_0001_m_000000_0
14/08/07 11:25:19 INFO util.ProcessTree: setsid exited with exit code 0
14/08/07 11:25:19 INFO mapred.Task: Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@2df8fdda
14/08/07 11:25:19 INFO mapred.MapTask: Processing split: [firstId:0, nbTrees:100, seed:null]
14/08/07 11:25:19 INFO inmem.InMemMapper: Loading the data...
14/08/07 11:25:20 INFO mapred.JobClient: map 0% reduce 0%
14/08/07 11:25:21 INFO inmem.InMemMapper: Data loaded : 125973 instances
14/08/07 11:25:25 INFO mapred.LocalJobRunner:
14/08/07 11:25:26 INFO mapred.JobClient: map 1% reduce 0%
...
14/08/07 11:27:59 INFO mapred.JobClient: map 98% reduce 0%
14/08/07 11:28:00 INFO mapred.Task: Task:attempt_local966281240_0001_m_000000_0 is done. And is in the process of commiting
14/08/07 11:28:00 INFO mapred.LocalJobRunner:
14/08/07 11:28:00 INFO mapred.Task: Task attempt_local966281240_0001_m_000000_0 is allowed to commit now
14/08/07 11:28:00 INFO output.FileOutputCommitter: Saved output of task 'attempt_local966281240_0001_m_000000_0' to file:/home/martin/Programmieren/mahout/data/cut/nsl-forest
14/08/07 11:28:00 INFO mapred.LocalJobRunner:
14/08/07 11:28:00 INFO mapred.Task: Task 'attempt_local966281240_0001_m_000000_0' done.
14/08/07 11:28:00 INFO mapred.LocalJobRunner: Finishing task: attempt_local966281240_0001_m_000000_0
14/08/07 11:28:00 INFO mapred.LocalJobRunner: Map task executor complete.
14/08/07 11:28:00 INFO mapred.JobClient: map 99% reduce 0%
14/08/07 11:28:00 INFO mapred.JobClient: Job complete: job_local966281240_0001
14/08/07 11:28:00 INFO mapred.JobClient: Counters: 12
14/08/07 11:28:00 INFO mapred.JobClient: File Output Format Counters
14/08/07 11:28:00 INFO mapred.JobClient: Bytes Written=2353226
14/08/07 11:28:00 INFO mapred.JobClient: File Input Format Counters
14/08/07 11:28:00 INFO mapred.JobClient: Bytes Read=0
14/08/07 11:28:00 INFO mapred.JobClient: FileSystemCounters
14/08/07 11:28:00 INFO mapred.JobClient: FILE_BYTES_READ=61962918
14/08/07 11:28:00 INFO mapred.JobClient: FILE_BYTES_WRITTEN=45667235
14/08/07 11:28:00 INFO mapred.JobClient: Map-Reduce Framework
14/08/07 11:28:00 INFO mapred.JobClient: Map input records=100
14/08/07 11:28:00 INFO mapred.JobClient: Physical memory (bytes) snapshot=0
14/08/07 11:28:00 INFO mapred.JobClient: Spilled Records=0
14/08/07 11:28:00 INFO mapred.JobClient: Total committed heap usage (bytes)=132120576
14/08/07 11:28:00 INFO mapred.JobClient: CPU time spent (ms)=0
14/08/07 11:28:00 INFO mapred.JobClient: Virtual memory (bytes) snapshot=0
14/08/07 11:28:00 INFO mapred.JobClient: SPLIT_RAW_BYTES=90
14/08/07 11:28:00 INFO mapred.JobClient: Map output records=100
14/08/07 11:28:00 INFO common.HadoopUtil: Deleting file:/home/martin/Programmieren/mahout/data/cut/nsl-forest
14/08/07 11:28:00 INFO mapreduce.BuildForest: Build Time: 0h 2m 41s 702
14/08/07 11:28:00 INFO mapreduce.BuildForest: Forest num Nodes: 130056
14/08/07 11:28:00 INFO mapreduce.BuildForest: Forest mean num Nodes: 1300
14/08/07 11:28:00 INFO mapreduce.BuildForest: Forest mean max Depth: 19
14/08/07 11:28:00 INFO mapreduce.BuildForest: Storing the forest in: nsl-forest/forest.seq