Mahout vector creation using MapReduce

Date: 2013-04-23 10:49:14

Tags: hadoop mapreduce vectorization mahout

I implemented a MapReduce job that reads text documents in the following format:
 - id,val1,val2..valn
 - 0,1,2,3,4
 - 1,5,6,7,8
 - 2,9,10,11,12
 - 3,3,8,5,2
 - 4,4,89,84,1

I use NamedVector to associate each vector with its id. This is what gets returned:

 - 0:{0:1.0,1:2.0,2:3.0,3:4}
 - 1:{0:5.0,1:6.0,2:7.0,3:8}
 - 2:{0:9.0,1:10.0,2:11.0,3:12}
 - 3:{0:3.0,1:8.0,2:5.0,3:2}
 - 4:{0:4.0,1:89.0,2:84.0,3:1}
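
To make the mapping from an input line to an id-tagged vector concrete, here is a minimal plain-Java sketch. It does not use Mahout: the class `VectorLineSketch`, its nested `Row` type, and the `parse` and `format` helpers are hypothetical stand-ins for illustration, and the printed form only approximates what a NamedVector prints.

```java
import java.util.StringJoiner;

// Hypothetical stand-in for illustration; not part of Mahout or Hadoop.
final class VectorLineSketch {

    /** Holds the id and the numeric values parsed from one input line. */
    static final class Row {
        final String id;
        final double[] values;

        Row(String id, double[] values) {
            this.id = id;
            this.values = values;
        }
    }

    /** Splits "id,val1,..,valn" into the id and an array of doubles. */
    static Row parse(String line) {
        String[] fields = line.trim().split(",");
        if (fields.length < 2) {
            throw new IllegalArgumentException("expected id,val1..valn: " + line);
        }
        double[] values = new double[fields.length - 1];
        for (int i = 1; i < fields.length; i++) {
            values[i - 1] = Double.parseDouble(fields[i].trim());
        }
        return new Row(fields[0].trim(), values);
    }

    /** Renders a row as id:{index:value,...}. */
    static String format(Row row) {
        StringJoiner joiner = new StringJoiner(",", row.id + ":{", "}");
        for (int i = 0; i < row.values.length; i++) {
            joiner.add(i + ":" + row.values[i]);
        }
        return joiner.toString();
    }

    public static void main(String[] args) {
        System.out.println(format(parse("0,1,2,3,4")));
    }
}
```

In a real job the `Row` would presumably be replaced by a NamedVector wrapping a DenseVector, as in the reducer shown in this question.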

This is the code I use for the reduce step:

public class Reduce extends MapReduceBase implements
    Reducer<LongWritable, Text, VectorWritable, Text> { 

    public void reduce(LongWritable key, Iterator<Text> values,
            OutputCollector<VectorWritable, Text> output, Reporter reporter)
            throws IOException {        

        CSVParser parsert = new CSVParser();
        String[] line = parsert.parseLine(values.next().toString());

        DenseVector vector = new DenseVector(line.length);
        for (int i = 0; i < line.length; i++) {
            String strValue = line[i];
            vector.setQuick(i, Double.parseDouble(strValue));
        }

        System.out.print("\n vec " + key + "\n");
        System.out.print(vector);

        output.collect(new VectorWritable(new NamedVector(vector, key.toString())), new Text(""));
    }
}

After that I tried to run kmeans, but I got an error:

mahout kmeans -i /user/dalisama/output/clusters-1/part-r-00000 -o output -c clusters -dm org.apache.mahout.common.distance.CosineDistanceMeasure -x 5 -ow -cd 1 -k 2

Am I missing something obvious? Here is the console output:

    dalisama@ubuntu:~$ mahout kmeans -i /user/dalisama/testdata -o output -c clusters -dm org.apache.mahout.common.distance.CosineDistanceMeasure -x 5 -ow -cd 1 -k 2

Running on hadoop, using /home/dalisama/hadoop-1.1.2/bin//hadoop and HADOOP_CONF_DIR=
MAHOUT-JOB: /home/dalisama/mahout/examples/target/mahout-examples-0.7-job.jar
13/04/23 14:25:33 INFO common.AbstractJob: Command line arguments: {--clusters=[clusters], --convergenceDelta=[1], --distanceMeasure=[org.apache.mahout.common.distance.CosineDistanceMeasure], --endPhase=[2147483647], --input=[/user/dalisama/testdata], --maxIter=[5], --method=[mapreduce], --numClusters=[2], --output=[output], --overwrite=null, --startPhase=[0], --tempDir=[temp]}
13/04/23 14:25:34 INFO util.NativeCodeLoader: Loaded the native-hadoop library
13/04/23 14:25:34 INFO zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
13/04/23 14:25:34 INFO compress.CodecPool: Got brand-new compressor
Exception in thread "main" java.lang.IllegalStateException: hdfs://localhost:9000/user/dalisama/testdata
    at org.apache.mahout.common.iterator.sequencefile.SequenceFileIterable.iterator(SequenceFileIterable.java:63)
    at org.apache.mahout.clustering.kmeans.RandomSeedGenerator.buildRandom(RandomSeedGenerator.java:89)
    at org.apache.mahout.clustering.kmeans.KMeansDriver.run(KMeansDriver.java:95)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at org.apache.mahout.clustering.kmeans.KMeansDriver.main(KMeansDriver.java:49)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:601)
    at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
    at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
    at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:195)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:601)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
Caused by: java.io.IOException: hdfs://localhost:9000/user/dalisama/testdata not a SequenceFile
    at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1517)
    at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1490)
    at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1479)
    at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1474)
    at org.apache.mahout.common.iterator.sequencefile.SequenceFileIterator.<init>(SequenceFileIterator.java:58)
    at org.apache.mahout.common.iterator.sequencefile.SequenceFileIterable.iterator(SequenceFileIterable.java:61)
    ... 16 more
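
The final cause in the trace, "not a SequenceFile", suggests that the input path holds something other than Hadoop SequenceFile data, for example plain text. A SequenceFile begins with the three bytes `SEQ` followed by a version byte, so one possible check is to look at the first bytes of a file. The sketch below assumes the file has first been copied to the local filesystem (for example with `hadoop fs -get`); the class `SeqHeaderCheck` and its methods are hypothetical helpers, not part of Hadoop or Mahout.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper for illustration; not part of Hadoop or Mahout.
final class SeqHeaderCheck {

    /** Returns true if the bytes start with the SequenceFile magic "SEQ". */
    static boolean hasSequenceFileMagic(byte[] header) {
        return header.length >= 3
                && header[0] == 'S'
                && header[1] == 'E'
                && header[2] == 'Q';
    }

    /** Reads up to three bytes from a local file and checks the magic. */
    static boolean looksLikeSequenceFile(Path localFile) throws IOException {
        byte[] header = new byte[3];
        int total = 0;
        try (InputStream in = Files.newInputStream(localFile)) {
            while (total < header.length) {
                int n = in.read(header, total, header.length - total);
                if (n < 0) {
                    break; // file is shorter than the magic
                }
                total += n;
            }
        }
        return total == header.length && hasSequenceFileMagic(header);
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: SeqHeaderCheck <local-file>");
            return;
        }
        Path file = Paths.get(args[0]);
        System.out.println(file + " looks like a SequenceFile: "
                + looksLikeSequenceFile(file));
    }
}
```

If the check fails on the job output, a likely cause is that the job wrote text rather than a SequenceFile. That is only an assumption here, since the question does not show the job configuration.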

0 answers:

No answers yet