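I am trying to convert term-frequency values into Mahout's vector representation so that I can run LDA on the resulting vectors. I am following the Mahout wiki, where a code snippet shows how to convert existing vectors into Mahout vectors.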
https://cwiki.apache.org/MAHOUT/creating-vectors-from-text.html
Below is my code. I get a NullPointerException at the point where I create the VectorWriter. The Apache cwiki suggests using:
VectorWriter vectorWriter = SequenceFile.createWriter(filesystem, configuration, outfile, LongWritable.class, SparseVector.class);
However, I do not see SequenceFile.createWriter in org.apache.hadoop.io.SequenceFile.
Here is the complete snippet.
fs = FileSystem.get(conf);
// I'm using SequenceFile.Writer because SequenceFile.createWriter is not available.
VectorWriter vectorWriter = (VectorWriter) new SequenceFile.Writer(fs, conf, path, LongWritable.class, RandomAccessSparseVector.class);
ArrayList<Vector> weights = new ArrayList<Vector>();
BufferedReader buffer = new BufferedReader(new FileReader("/home/hadoop/LDATest/LDAData/test"));
String line = null;
while((line = buffer.readLine()) != null)
{
String[] data = line.split(" "); // split the term,weight data
Vector weightVector = new RandomAccessSparseVector(1,1);
weightVector.setQuick(0, Double.parseDouble(data[1])); // add the weight
weights.add(weightVector);
}
vectorWriter.write(new VectorIterable(weights));
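A side note on the parsing loop above: as written, every input line becomes its own one-element vector, with the weight always stored at index 0. If the intent is one sparse vector per document, the parsing step can be sketched on its own, independent of Mahout and Hadoop. The following is only a minimal sketch under that assumption. `TermWeightSketch`, its `dictionary` field, and the sample lines are hypothetical names and data, and a plain `Map<Integer, Double>` stands in for a Mahout sparse vector.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper for illustration; not part of Mahout or Hadoop.
class TermWeightSketch {

    // Assigns each distinct term an index in first-seen order.
    final Map<String, Integer> dictionary = new LinkedHashMap<>();

    // Parses "term weight" lines into one sparse vector (index -> weight).
    Map<Integer, Double> parse(List<String> lines) {
        Map<Integer, Double> vector = new TreeMap<>();
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue; // skip blank lines
            }
            String[] data = trimmed.split("\\s+"); // term, then weight
            if (data.length < 2) {
                throw new IllegalArgumentException("Expected 'term weight', got: " + line);
            }
            Integer index = dictionary.get(data[0]);
            if (index == null) {
                index = dictionary.size(); // next free index
                dictionary.put(data[0], index);
            }
            // Sum the weights if the same term appears more than once.
            vector.merge(index, Double.parseDouble(data[1]), Double::sum);
        }
        return vector;
    }

    public static void main(String[] args) {
        TermWeightSketch sketch = new TermWeightSketch();
        // Made-up sample input in the "term weight" format.
        List<String> lines = Arrays.asList("hadoop 3.0", "mahout 1.5", "hadoop 2.0");
        System.out.println(sketch.parse(lines)); // prints {0=5.0, 1=1.5}
        System.out.println(sketch.dictionary);   // prints {hadoop=0, mahout=1}
    }
}
```

Each entry of the resulting map could then be copied into a single Mahout vector with setQuick(index, weight), presumably using a vector whose cardinality is at least the dictionary size rather than 1. This does not address the NullPointerException itself.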
Here is the error:
Exception in thread "main" java.lang.NullPointerException
    at org.apache.hadoop.io.serializer.SerializationFactory.getSerializer(SerializationFactory.java:73)
    at org.apache.hadoop.io.SequenceFile$Writer.init(SequenceFile.java:910)
    at org.apache.hadoop.io.SequenceFile$Writer.<init>(SequenceFile.java:843)
    at org.apache.hadoop.io.SequenceFile$Writer.<init>(SequenceFile.java:831)
    at org.apache.hadoop.io.SequenceFile$Writer.<init>(SequenceFile.java:823)
    at kbsi.ideal.LDATest.iterableTest(LDATest.java:161)
    at kbsi.ideal.LDATest.main(LDATest.java:194)
I would really appreciate any help with this. Thanks.