Is it possible to run the Mahout k-means algorithm in parallel (on multiple cores) using Hadoop? If so, how?
Mahout runs on Hadoop, but it only uses one CPU:
mahout org.apache.mahout.clustering.syntheticcontrol.kmeans.Job --input testdata --output end1200_50 --numClusters 1200 --t1 1000 --t2 500 --maxIter 50
Running on hadoop, using /usr/local/hadoop/bin/hadoop and HADOOP_CONF_DIR=/usr/local/hadoop/etc/hadoop
MAHOUT-JOB: /usr/local/mahout/mahout-examples-0.10.1-job.jar
[...]
My files are in HDFS:
hadoop fs -ls /user/root/testdata
Found 12 items
-rw-r--r-- 1 root supergroup 373560731 2015-06-26 07:51 /user/root/testdata/16773m.mat.txt
-rw-r--r-- 1 root supergroup 373819865 2015-06-26 07:51 /user/root/testdata/16786m.mat.txt
[...]
My mapred-site.xml:
<configuration>
<property>
<name>mapred.job.tracker</name>
<value>localhost:54311</value>
</property>
<property>
<name>mapred.tasktracker.map.tasks.maximum</name>
<value>14</value>
</property>
<property>
<name>mapred.tasktracker.map.tasks.maximum</name>
<value>4</value>
</property>
<property>
<name>mapred.child.java.opts</name>
<value>-Xmx7000M</value>
</property>
</configuration>