Mahout: KMeans clustering

Time: 2013-10-24 07:59:42

Tags: java mahout k-means

I'm new to Mahout, and I have this code:

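import java.io.File;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;
import org.apache.mahout.clustering.classify.WeightedVectorWritable;
import org.apache.mahout.clustering.kmeans.KMeansDriver;
import org.apache.mahout.clustering.kmeans.Kluster;
import org.apache.mahout.common.HadoopUtil;
import org.apache.mahout.common.distance.EuclideanDistanceMeasure;
import org.apache.mahout.math.RandomAccessSparseVector;
import org.apache.mahout.math.Vector;

// Note: the package of ClusterHelper is not shown in the post, so its import is omitted.
public class mahout {

    // Nine 2-D sample points that form two obvious groups.
    public static final double[][] points = {
        {1, 1}, {2, 1}, {1, 2}, {2, 2}, {3, 3},
        {8, 8}, {9, 8}, {8, 9}, {9, 9}
    };

    // Converts the raw coordinate arrays into Mahout vectors.
    public static List<Vector> getPoints(double[][] raw) {
        List<Vector> points = new ArrayList<Vector>();
        for (int i = 0; i < raw.length; i++) {
            double[] fr = raw[i];
            Vector vec = new RandomAccessSparseVector(fr.length);
            vec.assign(fr);
            points.add(vec);
        }
        return points;
    }

    public static void main(String args[]) throws Exception {
        int k = 2;
        List<Vector> vectors = getPoints(points);

        // Create the input directories if they do not exist yet.
        File testData = new File("testdata");
        if (!testData.exists()) {
            testData.mkdir();
        }
        testData = new File("testdata/points");
        if (!testData.exists()) {
            testData.mkdir();
        }

        // Write the input points to a sequence file.
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        ClusterHelper.writePointsToFile(vectors, conf, new Path("testdata/points/file1"));

        // Seed the k initial clusters with the first k points.
        Path path = new Path("testdata/clusters/part-00000");
        SequenceFile.Writer writer = new SequenceFile.Writer(fs, conf,
            path, Text.class, Kluster.class);
        for (int i = 0; i < k; i++) {
            Vector vec = vectors.get(i);
            Kluster cluster = new Kluster(vec, i, new EuclideanDistanceMeasure());
            writer.append(new Text(cluster.getIdentifier()), cluster);
        }
        writer.close();

        // Remove any output left over from a previous run.
        Path output = new Path("output");
        HadoopUtil.delete(conf, output);

        // Run k-means; the last argument (false) requests MapReduce rather than sequential execution.
        KMeansDriver.run(conf, new Path("testdata/points"), new Path("testdata/clusters"),
            output, new EuclideanDistanceMeasure(), 0.001, 10,
            true, 0.0, false);

        // Read back the cluster assignment of each point.
        SequenceFile.Reader reader = new SequenceFile.Reader(fs,
            new Path("output/" + Kluster.CLUSTERED_POINTS_DIR
                     + "/part-m-00000"), conf);
        IntWritable key = new IntWritable();
        WeightedVectorWritable value = new WeightedVectorWritable();
        while (reader.next(key, value)) {
            System.out.println(value.toString() + " belongs to cluster "
                               + key.toString());
        }
        reader.close();
    }
}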

But when I run the code, I get these errors:

 24-ott-2013 9.50.25 org.apache.hadoop.util.NativeCodeLoader <clinit>
AVVERTENZA: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
24-ott-2013 9.50.25 org.slf4j.impl.JCLLoggerAdapter info
INFO: Deleting output
24-ott-2013 9.50.25 org.slf4j.impl.JCLLoggerAdapter info
INFO: Input: testdata/points Clusters In: testdata/clusters Out: output Distance: org.apache.mahout.common.distance.EuclideanDistanceMeasure
24-ott-2013 9.50.25 org.slf4j.impl.JCLLoggerAdapter info
INFO: convergence: 0.0010 max Iterations: 10
24-ott-2013 9.50.25 org.apache.hadoop.security.UserGroupInformation doAs
GRAVE: PriviledgedActionException as:hp cause:java.io.IOException: Failed to set permissions of path: \tmp\hadoop-hp\mapred\staging\hp1776229724\.staging to 0700
Exception in thread "main" java.io.IOException: Failed to set permissions of path: \tmp\hadoop-hp\mapred\staging\hp1776229724\.staging to 0700
    at org.apache.hadoop.fs.FileUtil.checkReturnValue(FileUtil.java:689)
    at org.apache.hadoop.fs.FileUtil.setPermission(FileUtil.java:662)
    at org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:509)
    at org.apache.hadoop.fs.RawLocalFileSystem.mkdirs(RawLocalFileSystem.java:344)
    at org.apache.hadoop.fs.FilterFileSystem.mkdirs(FilterFileSystem.java:189)
    at org.apache.hadoop.mapreduce.JobSubmissionFiles.getStagingDir(JobSubmissionFiles.java:116)
    at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:918)
    at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:912)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Unknown Source)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1149)
    at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:912)
    at org.apache.hadoop.mapreduce.Job.submit(Job.java:500)
    at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:530)
    at org.apache.mahout.clustering.iterator.ClusterIterator.iterateMR(ClusterIterator.java:182)
    at org.apache.mahout.clustering.kmeans.KMeansDriver.buildClusters(KMeansDriver.java:223)
    at org.apache.mahout.clustering.kmeans.KMeansDriver.run(KMeansDriver.java:143)
    at mahout.main(mahout.java:69)

What is the problem, and how can I fix it?

2 answers:

Answer 0: (score: 0)

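This is a known problem when running Hadoop on Windows.

There are JIRA issues covering this specific problem:

https://issues.apache.org/jira/browse/HADOOP-7682

https://issues.apache.org/jira/browse/HADOOP-8089

The only ways around it are to patch Hadoop with this patch:

https://github.com/congainc/patch-hadoop_7682-1.0.x-win

or to upgrade to Hadoop 2.2, which runs natively on Windows.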
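To illustrate why the failure depends on the platform: the FileUtil.setPermission and FileUtil.checkReturnValue frames in the stack trace suggest, as far as I understand Hadoop 1.x, that the 0700 mode is applied through the java.io.File permission setters and that a false return value is treated as an error. The sketch below is not Hadoop's code. PermissionProbe and canApplyOwnerOnly are made-up names, and it only repeats a similar sequence of calls on a scratch directory so you can see what your own JVM reports.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;

// Illustrative only: PermissionProbe and canApplyOwnerOnly are made-up names,
// not part of Hadoop or Mahout.
final class PermissionProbe {

    // Tries to apply an owner-only (0700-style) mode to dir with the
    // java.io.File setters and reports whether every call succeeded.
    static boolean canApplyOwnerOnly(File dir) {
        boolean ok = true;
        // For each permission: clear it for everybody, then grant it to the owner only.
        ok &= dir.setReadable(false, false);
        ok &= dir.setReadable(true, true);
        ok &= dir.setWritable(false, false);
        ok &= dir.setWritable(true, true);
        ok &= dir.setExecutable(false, false);
        ok &= dir.setExecutable(true, true);
        return ok;
    }

    public static void main(String[] args) throws IOException {
        File scratch = Files.createTempDirectory("perm-probe").toFile();
        try {
            System.out.println("os.name = " + System.getProperty("os.name"));
            System.out.println("owner-only permissions applied: "
                + canApplyOwnerOnly(scratch));
        } finally {
            scratch.delete();
        }
    }
}
```

The java.io.File documentation notes that these setters fail when the underlying file system does not implement the permission in question. If the probe prints false on your machine, that would be consistent with the error in the question, although it does not prove that Hadoop fails for the same reason.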

Answer 1: (score: -1)

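It looks like the problem is

Failed to set permissions of path: \tmp\hadoop-hp\mapred\staging\hp1776229724\.staging to 0700

Check whether the user running the code has sufficient permissions on the directory mentioned in the stack trace.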
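One way to run that check from Java is a small sketch with plain java.io.File calls. AccessReport and describe are hypothetical names, and the report only shows what the JVM believes the current user may do with a path. It says nothing about whether Hadoop can change the permissions of that path.

```java
import java.io.File;

// Illustrative only: AccessReport and describe are made-up names.
final class AccessReport {

    // Summarizes what the current JVM user may do with the given path.
    static String describe(File path) {
        return "exists=" + path.exists()
             + " read=" + path.canRead()
             + " write=" + path.canWrite()
             + " execute=" + path.canExecute();
    }

    public static void main(String[] args) {
        // Defaults to the JVM temp directory; pass the staging directory from
        // your own stack trace as the first argument to check that instead.
        File target = new File(args.length > 0
            ? args[0]
            : System.getProperty("java.io.tmpdir"));
        System.out.println(target.getAbsolutePath() + ": " + describe(target));
    }
}
```

A path that does not exist reports false for every field, so create the directory first if you want to test access to it.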

Also, the log line

Unable to load native-hadoop library for your platform...

really makes me worry that nothing is going to run well ^^