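I've spent the past few hours trying to get CVB0Driver working, and after a lot of trial and error I've run into the following error, which I can't figure out. (Using mahout-integration 0.7)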
java.lang.Error: Unresolved compilation problem:
at org.apache.mahout.math.function.Functions.mult(Functions.java:770)
at org.apache.mahout.clustering.lda.cvb.TopicModel.<init>(TopicModel.java:139)
at org.apache.mahout.clustering.lda.cvb.TopicModel.<init>(TopicModel.java:113)
at org.apache.mahout.clustering.lda.cvb.TopicModel.<init>(TopicModel.java:108)
at org.apache.mahout.clustering.lda.cvb.TopicModel.<init>(TopicModel.java:92)
at org.apache.mahout.clustering.lda.cvb.CachingCVB0Mapper.setup(CachingCVB0Mapper.java:103)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
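Below is the code I'm using. Since I haven't gotten it working yet, I'm not sure I'm even on the right track, so please feel free to point out any mistakes you see.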
String [] args = {"-c","UTF-8","-i",input,"-o",output};
//create the seq file from the directory of text documents
ToolRunner.run(new SequenceFilesFromDirectory(),args);
//tokenize the documents
DocumentProcessor.tokenizeDocuments(new Path(inputDir), analyzer.getClass().asSubclass(Analyzer.class), tokenizedPath, conf);
//create tf vectors
DictionaryVectorizer.createTermFrequencyVectors(tokenizedPath,new Path(outputDir), DictionaryVectorizer.DOCUMENT_VECTOR_OUTPUT_FOLDER, conf, minSupport, maxNGramSize, minLLRValue, -1.0f, true, reduceTasks, chunkSize, sequentialAccessOutput, true);
//calculate the document frequencies
Pair<Long[], List<Path>> dfData = TFIDFConverter.calculateDF( new Path(outputDir, DictionaryVectorizer.DOCUMENT_VECTOR_OUTPUT_FOLDER), new Path(outputDir), conf, chunkSize);
//create tfidf vectors
TFIDFConverter.processTfIdf( new Path(outputDir , DictionaryVectorizer.DOCUMENT_VECTOR_OUTPUT_FOLDER), new Path(outputDir), conf, dfData, minDf, maxDFPercent, norm, true, sequentialAccessOutput, true, reduceTasks);
args = new String[]{"-i","tfidf-vectors/part-r-00000","-o","cvb"};
//create the matrix for cvb
RowIdJob.main(args);
CVB0Driver.run(conf, new Path("cvb/matrix"), mto, numTopics, numTerms, alpha, eta, maxIterations, iterationBlockSize, convergenceDelta, dictionaryPath, dto, msto, randomSeed, testFraction, numTrainThreads, numUpdateThreads, maxItersPerDoc, numReduceTasks, backfillPerplexity);
Any help is much appreciated.
Answer:
OK, it looks like this was a conflict between my Maven and Eclipse projects.
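I had recently imported the mahout-integration 0.7 source into Eclipse and somehow built it badly (there was a problem with mahout-math), and my other projects may have started referencing those badly built jars. I'm not very familiar with Maven, so I can't say for sure whether that was the cause or whether Eclipse was just acting up.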
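One way to check for this kind of conflict (a sketch, not part of the original answer): an "Unresolved compilation problem" Error is typically thrown by class files that Eclipse's own compiler emitted from source that failed to compile, so the real question is which copy of `org.apache.mahout.math.function.Functions` the JVM is picking up. The hypothetical `WhichJar` helper below uses only the standard `ClassLoader.getResources` and `ProtectionDomain.getCodeSource` APIs to list every classpath entry containing a given class and the location it was actually loaded from; the class name, method names and output format are my own assumptions.

```java
import java.io.IOException;
import java.net.URL;
import java.security.CodeSource;
import java.util.Collections;
import java.util.List;

public class WhichJar {

    // Hypothetical helper: lists every classpath entry (jar or directory)
    // that contains the .class file for the given class name.
    static List<URL> allCopies(String className) throws IOException {
        String resource = className.replace('.', '/') + ".class";
        ClassLoader loader = WhichJar.class.getClassLoader();
        return Collections.list(loader.getResources(resource));
    }

    // Returns the location the class was actually loaded from, or null for
    // classes without a code source (e.g. JDK classes from the bootstrap loader).
    static URL loadedFrom(String className) throws ClassNotFoundException {
        // initialize = false: load the class without running its static initializers
        Class<?> cls = Class.forName(className, false, WhichJar.class.getClassLoader());
        CodeSource source = cls.getProtectionDomain().getCodeSource();
        return source == null ? null : source.getLocation();
    }

    public static void main(String[] args) throws Exception {
        // Default to the class from the stack trace; pass another name as args[0].
        String className = args.length > 0 ? args[0]
                : "org.apache.mahout.math.function.Functions";

        List<URL> copies = allCopies(className);
        System.out.println(copies.size() + " copies of " + className + " on the classpath:");
        for (URL url : copies) {
            System.out.println("  " + url);
        }
        if (copies.size() > 1) {
            System.out.println("More than one copy found; the first one on the classpath usually wins.");
        }
        if (!copies.isEmpty()) {
            System.out.println("Loaded from: " + loadedFrom(className));
        }
    }
}
```

Run it with the same classpath as the failing job (for example, from the same Eclipse launch configuration). If the Mahout class shows up in more than one location, or is loaded from an Eclipse project output folder instead of the Maven mahout-math jar, that stale copy is a likely source of the error.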
After removing that project from Eclipse, everything ran fine.
This question helped me solve the problem: java-unresolved-compilation-problem