I implemented a MapReduce job that reads a text file whose lines have the following format:
- id,val1,val2..valn
- 0,1,2,3,4
- 1,5,6,7,8
- 2,9,10,11,12
- 3,3,8,5,2
- 4,4,89,84,1
I use NamedVector to associate each vector with its id. This is what the job returns:
- 0:{0:1.0,1:2.0,2:3.0,3:4}
- 1:{0:5.0,1:6.0,2:7.0,3:8}
- 2:{0:9.0,1:10.0,2:11.0,3:12}
- 3:{0:3.0,1:8.0,2:5.0,3:2}
- 4:{0:4.0,1:89.0,2:84.0,3:1}
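To make the mapping from an input row to an (id, values) pair concrete, here is a minimal sketch in plain Java. It does not use Mahout at all; the class `CsvRowSketch` and its `parse` method are hypothetical names introduced only for illustration, and it assumes the first field of each row is the id.

```java
import java.util.Arrays;

// Hypothetical helper, not part of the original job: splits one input row
// of the form id,val1,val2..valn into its id and its numeric values.
class CsvRowSketch {
    final String id;
    final double[] values;

    CsvRowSketch(String id, double[] values) {
        this.id = id;
        this.values = values;
    }

    // Assumption: the first field is the id, every remaining field is a number.
    static CsvRowSketch parse(String line) {
        String[] fields = line.trim().split(",");
        double[] values = new double[fields.length - 1];
        for (int i = 1; i < fields.length; i++) {
            values[i - 1] = Double.parseDouble(fields[i].trim());
        }
        return new CsvRowSketch(fields[0].trim(), values);
    }

    public static void main(String[] args) {
        CsvRowSketch row = CsvRowSketch.parse("0,1,2,3,4");
        // prints "0 -> [1.0, 2.0, 3.0, 4.0]"
        System.out.println(row.id + " -> " + Arrays.toString(row.values));
    }
}
```

In the real job the id would become the name of the NamedVector and the values would fill the underlying vector.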
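This is the code I use for the reduce step:

import java.io.IOException;
import java.util.Iterator;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;
import org.apache.mahout.math.DenseVector;
import org.apache.mahout.math.NamedVector;
import org.apache.mahout.math.VectorWritable;
import au.com.bytecode.opencsv.CSVParser; // assumption: CSVParser comes from opencsv

public class Reduce extends MapReduceBase implements
        Reducer<LongWritable, Text, VectorWritable, Text> {
    public void reduce(LongWritable key, Iterator<Text> values,
            OutputCollector<VectorWritable, Text> output, Reporter reporter)
            throws IOException {
        // Parse the first value for this key as one CSV row.
        CSVParser parsert = new CSVParser();
        String[] line = parsert.parseLine(values.next().toString());
        // Copy every field of the row into a dense vector.
        DenseVector vector = new DenseVector(line.length);
        for (int i = 0; i < line.length; i++) {
            String strValue = line[i];
            vector.setQuick(i, Double.parseDouble(strValue));
        }
        System.out.print("\n vec " + key + "\n");
        System.out.print(vector);
        // Emit the vector, named after its id, as the output key.
        output.collect(new VectorWritable(new NamedVector(vector, key.toString())), new Text(""));
    }
}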
After that I tried to run k-means on the result, but I get an error:
mahout kmeans -i /user/dalisama/output/clusters-1/part-r-00000 -o output -c clusters -dm org.apache.mahout.common.distance.CosineDistanceMeasure -x 5 -ow -cd 1 -k 2
I suspect I am missing something obvious. Here is the console output:
dalisama@ubuntu:~$ mahout kmeans -i /user/dalisama/testdata -o output -c clusters -dm org.apache.mahout.common.distance.CosineDistanceMeasure -x 5 -ow -cd 1 -k 2
Running on hadoop, using /home/dalisama/hadoop-1.1.2/bin//hadoop and HADOOP_CONF_DIR=
MAHOUT-JOB: /home/dalisama/mahout/examples/target/mahout-examples-0.7-job.jar
13/04/23 14:25:33 INFO common.AbstractJob: Command line arguments: {--clusters=[clusters], --convergenceDelta=[1], --distanceMeasure=[org.apache.mahout.common.distance.CosineDistanceMeasure], --endPhase=[2147483647], --input=[/user/dalisama/testdata], --maxIter=[5], --method=[mapreduce], --numClusters=[2], --output=[output], --overwrite=null, --startPhase=[0], --tempDir=[temp]}
13/04/23 14:25:34 INFO util.NativeCodeLoader: Loaded the native-hadoop library
13/04/23 14:25:34 INFO zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
13/04/23 14:25:34 INFO compress.CodecPool: Got brand-new compressor
Exception in thread "main" java.lang.IllegalStateException: hdfs://localhost:9000/user/dalisama/testdata
at org.apache.mahout.common.iterator.sequencefile.SequenceFileIterable.iterator(SequenceFileIterable.java:63)
at org.apache.mahout.clustering.kmeans.RandomSeedGenerator.buildRandom(RandomSeedGenerator.java:89)
at org.apache.mahout.clustering.kmeans.KMeansDriver.run(KMeansDriver.java:95)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.mahout.clustering.kmeans.KMeansDriver.main(KMeansDriver.java:49)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:601)
at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:195)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:601)
at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
Caused by: java.io.IOException: hdfs://localhost:9000/user/dalisama/testdata not a SequenceFile
at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1517)
at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1490)
at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1479)
at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1474)
at org.apache.mahout.common.iterator.sequencefile.SequenceFileIterator.<init>(SequenceFileIterator.java:58)
at org.apache.mahout.common.iterator.sequencefile.SequenceFileIterable.iterator(SequenceFileIterable.java:61)
... 16 more
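The "not a SequenceFile" message in the trace above refers to the file header: a Hadoop SequenceFile starts with the three bytes `SEQ` followed by a version byte, while a plain text file starts with its first line of text. As a rough way to see which kind of file a path holds, the following sketch checks those first three bytes of any input stream. It uses only the JDK; the class and method names are hypothetical, and it assumes you feed it a stream over a local copy of the file (for example one fetched with `hadoop fs -get`).

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical diagnostic helper, not part of Hadoop or Mahout.
class SequenceFileMagicSketch {

    // Returns true if the stream begins with the bytes 'S', 'E', 'Q'.
    static boolean looksLikeSequenceFile(InputStream in) throws IOException {
        byte[] magic = new byte[3];
        int read = 0;
        while (read < magic.length) {
            int n = in.read(magic, read, magic.length - read);
            if (n < 0) {
                return false; // stream ended before three bytes were available
            }
            read += n;
        }
        return magic[0] == 'S' && magic[1] == 'E' && magic[2] == 'Q';
    }

    public static void main(String[] args) throws IOException {
        // In-memory stand-ins for file contents, used only for the demo.
        byte[] sequenceLike = {'S', 'E', 'Q', 6};
        byte[] textLike = "0,1,2,3,4\n".getBytes("UTF-8");
        System.out.println(looksLikeSequenceFile(new ByteArrayInputStream(sequenceLike))); // prints "true"
        System.out.println(looksLikeSequenceFile(new ByteArrayInputStream(textLike)));     // prints "false"
    }
}
```

This only inspects the header. It says nothing about whether the key and value classes stored in the file are the ones the k-means driver expects.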