Converting existing vectors to Mahout vectors

Time: 2012-08-21 21:35:08

Tags: hadoop mahout lda sequencefile

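I am trying to convert term-frequency values into Mahout's vector representation so that I can run LDA on the resulting vectors. I am following the Mahout wiki, which has a code snippet showing how to convert existing vectors into Mahout vectors: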

https://cwiki.apache.org/MAHOUT/creating-vectors-from-text.html
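For context on the data shape: in the loop further down, each input line becomes its own `RandomAccessSparseVector` of cardinality 1 with the weight stored at index 0, so the term itself is dropped. LDA generally works on one vector per document, indexed by a term id from a dictionary. The Java below is a minimal sketch of that conversion under stated assumptions (not from the question): each non-blank line holds a term and a numeric weight separated by whitespace, and the whole file is one document. A `TreeMap<Integer, Double>` stands in for the Mahout vector so the sketch runs with the JDK alone; the class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

/**
 * Minimal sketch: turn "term weight" lines into ONE sparse term-frequency vector.
 * Assumptions: whitespace-separated "term weight" per line, all lines = one document.
 * TreeMap<Integer, Double> is a stand-in for a Mahout RandomAccessSparseVector.
 */
public class TermVectorSketch {

    /** Returns the term's index, assigning the next free index (0, 1, 2, ...) to new terms. */
    public static int indexOf(Map<String, Integer> dictionary, String term) {
        Integer id = dictionary.get(term);
        if (id == null) {
            id = dictionary.size();
            dictionary.put(term, id);
        }
        return id;
    }

    /** Builds index -> weight; weights of a repeated term are summed. */
    public static TreeMap<Integer, Double> toSparseVector(Iterable<String> lines,
                                                         Map<String, Integer> dictionary) {
        TreeMap<Integer, Double> vector = new TreeMap<>();
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue; // skip blank lines
            }
            String[] data = trimmed.split("\\s+");
            if (data.length < 2) {
                throw new IllegalArgumentException("expected 'term weight': " + line);
            }
            int index = indexOf(dictionary, data[0]);
            vector.merge(index, Double.parseDouble(data[1]), Double::sum);
        }
        return vector;
    }

    public static void main(String[] args) {
        Map<String, Integer> dictionary = new LinkedHashMap<>();
        TreeMap<Integer, Double> vector =
                toSparseVector(Arrays.asList("hadoop 3", "mahout 2", "hadoop 1"), dictionary);
        System.out.println(dictionary); // {hadoop=0, mahout=1}
        System.out.println(vector);     // {0=4.0, 1=2.0}
    }
}
```

With Mahout on the classpath, the same idea would mean creating one `RandomAccessSparseVector` per document with cardinality `dictionary.size()` (or a fixed vocabulary size) and calling `setQuick(index, weight)` for each entry. The dictionary also has to be kept, so that topic terms reported by LDA can be mapped back to words.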

Here is my code. I get a NullPointerException at the line where I create the VectorWriter. The Apache cwiki suggests using:

    VectorWriter vectorWriter = SequenceFile.createWriter(filesystem, configuration, outfile, LongWritable.class, SparseVector.class);

However, I don't see SequenceFile.createWriter in org.apache.hadoop.io.SequenceFile.

Here is the complete code:

        fs = FileSystem.get(conf);
        // I'm using SequenceFile.Writer because SequenceFile.createWriter is not available.
        VectorWriter vectorWriter = (VectorWriter) new SequenceFile.Writer(fs, conf, path, LongWritable.class, RandomAccessSparseVector.class);

        ArrayList<Vector> weights = new ArrayList<Vector>();
        BufferedReader buffer = new BufferedReader(new FileReader("/home/hadoop/LDATest/LDAData/test"));
        String line = null;

        while((line = buffer.readLine()) != null)
        {    
            String[] data = line.split(" "); // split the term,weight data
            Vector weightVector = new RandomAccessSparseVector(1,1);
            weightVector.setQuick(0, Double.parseDouble(data[1])); // add the weight
            weights.add(weightVector);
        }


        vectorWriter.write(new VectorIterable(weights));

Here is the error:

    Exception in thread "main" java.lang.NullPointerException
        at org.apache.hadoop.io.serializer.SerializationFactory.getSerializer(SerializationFactory.java:73)
        at org.apache.hadoop.io.SequenceFile$Writer.init(SequenceFile.java:910)
        at org.apache.hadoop.io.SequenceFile$Writer.<init>(SequenceFile.java:843)
        at org.apache.hadoop.io.SequenceFile$Writer.<init>(SequenceFile.java:831)
        at org.apache.hadoop.io.SequenceFile$Writer.<init>(SequenceFile.java:823)
        at kbsi.ideal.LDATest.iterableTest(LDATest.java:161)
        at kbsi.ideal.LDATest.main(LDATest.java:194)
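The trace ends inside `SerializationFactory.getSerializer`, which suggests (a likely reading, not a verified diagnosis) that Hadoop found no serialization for one of the classes passed to the `SequenceFile.Writer` constructor. In Hadoop 1.x-era configurations, `io.serializations` defaults to `WritableSerialization`, which only accepts classes implementing `org.apache.hadoop.io.Writable`. `RandomAccessSparseVector` implements Mahout's `Vector` interface but not `Writable`, so the lookup comes back null and is then dereferenced. Mahout's usual wrapper is `org.apache.mahout.math.VectorWritable`: passing `VectorWritable.class` as the value class and appending `new VectorWritable(vector)` values (which is what Mahout's `SequenceFileVectorWriter` does when it wraps a `SequenceFile.Writer`) should avoid this particular exception. Hadoop 1.x `SequenceFile` does expose a static `createWriter(FileSystem, Configuration, Path, Class, Class)` factory, and it returns a `SequenceFile.Writer`, not a `VectorWriter`. For the same reason, the cast `(VectorWriter) new SequenceFile.Writer(...)` would throw a `ClassCastException` once the constructor succeeds. The Java below is not Hadoop code. It is a simplified, hypothetical model of that "lookup returns null" pattern, and every name in it (`SerializerLookupSketch`, `SelfWriting`, `RawVector`, `VectorWrapper`) is made up for illustration.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Predicate;

/**
 * Simplified, hypothetical model of a class-based serializer lookup (NOT Hadoop's code).
 * Each registered serialization says which classes it accepts; a lookup that finds
 * none returns null, and a caller that uses the result without checking gets an NPE.
 */
public class SerializerLookupSketch {

    /** Hypothetical stand-in for a Writable-style "I can serialize myself" contract. */
    interface SelfWriting {
        String write();
    }

    /** Hypothetical stand-in for a plain vector class that does not implement SelfWriting. */
    static final class RawVector {
        final double[] values;
        RawVector(double... values) { this.values = values; }
    }

    /** Hypothetical wrapper, in the spirit of Mahout's VectorWritable. */
    static final class VectorWrapper implements SelfWriting {
        private final RawVector vector;
        VectorWrapper(RawVector vector) { this.vector = vector; }
        @Override
        public String write() { return Arrays.toString(vector.values); }
    }

    // serialization name -> predicate deciding which classes it can handle
    private final Map<String, Predicate<Class<?>>> registered = new LinkedHashMap<>();

    public SerializerLookupSketch() {
        // Only one serialization is registered, and it accepts SelfWriting types only.
        registered.put("self-writing", SelfWriting.class::isAssignableFrom);
    }

    /** Returns the name of the first serialization accepting c, or null if none does. */
    public String find(Class<?> c) {
        for (Map.Entry<String, Predicate<Class<?>>> e : registered.entrySet()) {
            if (e.getValue().test(c)) {
                return e.getKey();
            }
        }
        return null;
    }

    public static void main(String[] args) {
        SerializerLookupSketch lookup = new SerializerLookupSketch();
        System.out.println(lookup.find(VectorWrapper.class)); // self-writing
        System.out.println(lookup.find(RawVector.class));     // null
        try {
            // Using the missing result without a null check fails like the trace above.
            System.out.println(lookup.find(RawVector.class).length());
        } catch (NullPointerException e) {
            System.out.println("NullPointerException");
        }
    }
}
```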

I would really appreciate your help with this. Thanks.

0 answers