Mahout random forest classifier example: ArrayIndexOutOfBoundsException

Time: 2014-03-24 14:13:54

Tags: java apache hadoop mahout random-forest

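I get a java.lang.ArrayIndexOutOfBoundsException: 100 error when trying to run the random forest example. The 100 corresponds to the number of trees. The map phase completes to 100%, while reduce stays at 0%. I am using hadoop-1.2.1 and mahout-distribution-0.7. I also tried mahout-distribution-0.9 and got the same error.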

Has anyone had any luck running this example?

1 answer:

Answer 0: (score: 1)

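Found the problem. If Hadoop runs with mapred.job.tracker = local, PartialBuilder cannot obtain the number of map tasks from mapred.map.tasks. As a result, it miscalculates the number of trees each map task should build.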
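To make the arithmetic concrete, here is a minimal sketch of how a wrong map-task count inflates the total number of trees. It is not Mahout's actual source: the class `TreePartitionSketch` and the methods `treesForPartition` and `totalTreesBuilt` are hypothetical names, and the even-split rule is an assumption chosen for illustration. The idea is that each mapper takes its share of the requested trees based on the map count it believes is configured; if that count is 1 while several mappers actually run, every mapper builds the full forest.

```java
// Illustrative sketch only; names and the splitting rule are assumptions,
// not code taken from Mahout.
class TreePartitionSketch {

    // Trees one partition builds when numTrees is split evenly across
    // numMaps mappers; partition 0 also takes the remainder.
    static int treesForPartition(int numMaps, int numTrees, int partition) {
        int perMap = numTrees / numMaps;
        if (partition == 0) {
            perMap += numTrees - perMap * numMaps;
        }
        return perMap;
    }

    // Total trees produced when actualMaps mappers run but each one
    // computes its share using assumedMaps.
    static int totalTreesBuilt(int assumedMaps, int actualMaps, int numTrees) {
        int total = 0;
        for (int p = 0; p < actualMaps; p++) {
            total += treesForPartition(assumedMaps, numTrees, p);
        }
        return total;
    }

    public static void main(String[] args) {
        int numTrees = 100;
        // Map count known correctly: the shares add up to the request.
        System.out.println(totalTreesBuilt(4, 4, numTrees)); // prints 100
        // Map count wrongly read as 1: each of the 4 mappers builds all 100.
        System.out.println(totalTreesBuilt(1, 4, numTrees)); // prints 400
    }
}
```

Under these assumptions, collecting the output into a structure sized for 100 trees would fail as soon as tree number 101 arrives, that is, at index 100, which is consistent with the reported exception.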

Solution: do not use the "-p" parameter when running a Random Forest job on local Hadoop.

Details:

windiana@host:~/mahout/data/> hadoop jar $MAHOUT_HOME/examples/target/mahout-examples-0.9-job.jar org.apache.mahout.classifier.df.mapreduce.BuildForest -Dmapred.max.split.size=1874231 -d testdata/KDDTrain+.arff -ds testdata/KDDTrain+.info -sl 5 -t 100 -o nsl-forest
Warning: $HADOOP_HOME is deprecated.

14/08/07 11:25:18 INFO mapreduce.BuildForest: InMem Mapred implementation
14/08/07 11:25:18 INFO mapreduce.BuildForest: Building the forest...
14/08/07 11:25:18 INFO util.NativeCodeLoader: Loaded the native-hadoop library
14/08/07 11:25:19 INFO filecache.TrackerDistributedCacheManager: Creating KDDTrain+.info in /tmp/hadoop-martin/mapred/local/archive/-1415030653984777464_-1414908735_797966215/filetestdata-work-5026960219142699303 with rwxr-xr-x
14/08/07 11:25:19 INFO filecache.TrackerDistributedCacheManager: Cached testdata/KDDTrain+.info as /tmp/hadoop-martin/mapred/local/archive/-1415030653984777464_-1414908735_797966215/filetestdata/KDDTrain+.info
14/08/07 11:25:19 INFO filecache.TrackerDistributedCacheManager: Cached testdata/KDDTrain+.info as /tmp/hadoop-martin/mapred/local/archive/-1415030653984777464_-1414908735_797966215/filetestdata/KDDTrain+.info
14/08/07 11:25:19 INFO filecache.TrackerDistributedCacheManager: Creating KDDTrain+.arff in /tmp/hadoop-martin/mapred/local/archive/3941906571438652588_-1415143228_797959215/filetestdata-work-5750487161401524172 with rwxr-xr-x
14/08/07 11:25:19 INFO filecache.TrackerDistributedCacheManager: Cached testdata/KDDTrain+.arff as /tmp/hadoop-martin/mapred/local/archive/3941906571438652588_-1415143228_797959215/filetestdata/KDDTrain+.arff
14/08/07 11:25:19 INFO filecache.TrackerDistributedCacheManager: Cached testdata/KDDTrain+.arff as /tmp/hadoop-martin/mapred/local/archive/3941906571438652588_-1415143228_797959215/filetestdata/KDDTrain+.arff
14/08/07 11:25:19 INFO mapred.JobClient: Running job: job_local966281240_0001
14/08/07 11:25:19 INFO mapred.LocalJobRunner: Waiting for map tasks
14/08/07 11:25:19 INFO mapred.LocalJobRunner: Starting task: attempt_local966281240_0001_m_000000_0
14/08/07 11:25:19 INFO util.ProcessTree: setsid exited with exit code 0
14/08/07 11:25:19 INFO mapred.Task:  Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@2df8fdda
14/08/07 11:25:19 INFO mapred.MapTask: Processing split: [firstId:0, nbTrees:100, seed:null]
14/08/07 11:25:19 INFO inmem.InMemMapper: Loading the data...
14/08/07 11:25:20 INFO mapred.JobClient:  map 0% reduce 0%
14/08/07 11:25:21 INFO inmem.InMemMapper: Data loaded : 125973 instances
14/08/07 11:25:25 INFO mapred.LocalJobRunner: 
14/08/07 11:25:26 INFO mapred.JobClient:  map 1% reduce 0%

...

14/08/07 11:27:59 INFO mapred.JobClient:  map 98% reduce 0%
14/08/07 11:28:00 INFO mapred.Task: Task:attempt_local966281240_0001_m_000000_0 is done. And is in the process of commiting
14/08/07 11:28:00 INFO mapred.LocalJobRunner: 
14/08/07 11:28:00 INFO mapred.Task: Task attempt_local966281240_0001_m_000000_0 is allowed to commit now
14/08/07 11:28:00 INFO output.FileOutputCommitter: Saved output of task 'attempt_local966281240_0001_m_000000_0' to file:/home/martin/Programmieren/mahout/data/cut/nsl-forest
14/08/07 11:28:00 INFO mapred.LocalJobRunner: 
14/08/07 11:28:00 INFO mapred.Task: Task 'attempt_local966281240_0001_m_000000_0' done.
14/08/07 11:28:00 INFO mapred.LocalJobRunner: Finishing task: attempt_local966281240_0001_m_000000_0
14/08/07 11:28:00 INFO mapred.LocalJobRunner: Map task executor complete.
14/08/07 11:28:00 INFO mapred.JobClient:  map 99% reduce 0%
14/08/07 11:28:00 INFO mapred.JobClient: Job complete: job_local966281240_0001
14/08/07 11:28:00 INFO mapred.JobClient: Counters: 12
14/08/07 11:28:00 INFO mapred.JobClient:   File Output Format Counters 
14/08/07 11:28:00 INFO mapred.JobClient:     Bytes Written=2353226
14/08/07 11:28:00 INFO mapred.JobClient:   File Input Format Counters 
14/08/07 11:28:00 INFO mapred.JobClient:     Bytes Read=0
14/08/07 11:28:00 INFO mapred.JobClient:   FileSystemCounters
14/08/07 11:28:00 INFO mapred.JobClient:     FILE_BYTES_READ=61962918
14/08/07 11:28:00 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=45667235
14/08/07 11:28:00 INFO mapred.JobClient:   Map-Reduce Framework
14/08/07 11:28:00 INFO mapred.JobClient:     Map input records=100
14/08/07 11:28:00 INFO mapred.JobClient:     Physical memory (bytes) snapshot=0
14/08/07 11:28:00 INFO mapred.JobClient:     Spilled Records=0
14/08/07 11:28:00 INFO mapred.JobClient:     Total committed heap usage (bytes)=132120576
14/08/07 11:28:00 INFO mapred.JobClient:     CPU time spent (ms)=0
14/08/07 11:28:00 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=0
14/08/07 11:28:00 INFO mapred.JobClient:     SPLIT_RAW_BYTES=90
14/08/07 11:28:00 INFO mapred.JobClient:     Map output records=100
14/08/07 11:28:00 INFO common.HadoopUtil: Deleting file:/home/martin/Programmieren/mahout/data/cut/nsl-forest
14/08/07 11:28:00 INFO mapreduce.BuildForest: Build Time: 0h 2m 41s 702
14/08/07 11:28:00 INFO mapreduce.BuildForest: Forest num Nodes: 130056
14/08/07 11:28:00 INFO mapreduce.BuildForest: Forest mean num Nodes: 1300
14/08/07 11:28:00 INFO mapreduce.BuildForest: Forest mean max Depth: 19
14/08/07 11:28:00 INFO mapreduce.BuildForest: Storing the forest in: nsl-forest/forest.seq