How to force an Apache Mahout application to read directly from HDFS

Date: 2013-02-01 10:50:23

Tags: hdfs mahout

I have implemented an Apache Mahout application (attached below) that performs some basic computations. To do so, it needs to load a dataset from the local machine. The application is packaged as a jar file and then executed on a Hadoop pseudo-distributed cluster. The terminal command is:

$ hadoop jar /home/eualin/ApacheMahout/tdunning-MiA-5b8956f/target/mia-0.1-jar-with-dependencies.jar mia.recommender.ch03.IREvaluatorBooleanPrefIntro2 "/home/eualin/Desktop/links-final"

Now, my question is how to do the same thing, but this time reading the dataset from HDFS (assuming, of course, that the dataset is already stored in HDFS, e.g. under /user/eualin/output/links-final). What should change in that case? This may help: hdfs://localhost:50010/user/eualin/output/links-final

package mia.recommender.ch03;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.mahout.cf.taste.common.TasteException;
import org.apache.mahout.cf.taste.eval.DataModelBuilder;
import org.apache.mahout.cf.taste.eval.IRStatistics;
import org.apache.mahout.cf.taste.eval.RecommenderBuilder;
import org.apache.mahout.cf.taste.eval.RecommenderIRStatsEvaluator;
import org.apache.mahout.cf.taste.impl.common.FastByIDMap;
import org.apache.mahout.cf.taste.impl.eval.GenericRecommenderIRStatsEvaluator;
import org.apache.mahout.cf.taste.impl.model.GenericBooleanPrefDataModel;
import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
import org.apache.mahout.cf.taste.impl.neighborhood.NearestNUserNeighborhood;
import org.apache.mahout.cf.taste.impl.recommender.GenericBooleanPrefUserBasedRecommender;
import org.apache.mahout.cf.taste.impl.similarity.LogLikelihoodSimilarity;
import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.model.PreferenceArray;
import org.apache.mahout.cf.taste.neighborhood.UserNeighborhood;
import org.apache.mahout.cf.taste.recommender.Recommender;
import org.apache.mahout.cf.taste.similarity.UserSimilarity;

import java.io.File;

public class IREvaluatorBooleanPrefIntro2 {
    private IREvaluatorBooleanPrefIntro2() {
    }
    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.out.println("give file's HDFS path");
            System.exit(1);
        }
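        // Note: FileDataModel reads through java.io.File, i.e. from the local
        // filesystem only, which is why an HDFS path cannot be passed here directly.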
        DataModel model = new GenericBooleanPrefDataModel(
                GenericBooleanPrefDataModel.toDataMap(
                        new GenericBooleanPrefDataModel(new FileDataModel(new File(args[0])))));
        RecommenderIRStatsEvaluator evaluator =
                new GenericRecommenderIRStatsEvaluator();
        RecommenderBuilder recommenderBuilder = new RecommenderBuilder() {
            @Override
            public Recommender buildRecommender(DataModel model) throws TasteException {
                UserSimilarity similarity = new LogLikelihoodSimilarity(model);
                UserNeighborhood neighborhood =
                        new NearestNUserNeighborhood(10, similarity, model);
                return new GenericBooleanPrefUserBasedRecommender(model, neighborhood, similarity);
            }
        };
        DataModelBuilder modelBuilder = new DataModelBuilder() {
            @Override
            public DataModel buildDataModel(FastByIDMap<PreferenceArray> trainingData) {
                return new GenericBooleanPrefDataModel(
                        GenericBooleanPrefDataModel.toDataMap(trainingData));
            }
        };
        IRStatistics stats = evaluator.evaluate(
                recommenderBuilder, modelBuilder, model, null, 10,
                GenericRecommenderIRStatsEvaluator.CHOOSE_THRESHOLD,
                1.0);
        System.out.println(stats.getPrecision());
        System.out.println(stats.getRecall());
    }
}

1 Answer:

Answer 0 (score: 0)

You can't, directly, since non-distributed code knows nothing about HDFS. Instead, copy the file to a local location in setup(), and then read it from that local file.
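
A minimal sketch of that approach, assuming only the standard Hadoop FileSystem API (the class name, helper method, and temp-file location below are illustrative, not from the original answer): copy the dataset out of HDFS onto the local disk, then hand the local copy to FileDataModel exactly as the existing code does.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.mahout.cf.taste.impl.model.GenericBooleanPrefDataModel;
import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
import org.apache.mahout.cf.taste.model.DataModel;

import java.io.File;

public class HdfsBackedModelLoader {

    // Copies the HDFS file to a local temp file, then builds the same
    // boolean-preference DataModel that the original code builds.
    public static DataModel load(String hdfsPath) throws Exception {
        Configuration conf = new Configuration();
        // Resolves against fs.defaultFS, or against the scheme/authority
        // if hdfsPath is a full URI such as hdfs://localhost:9000/...
        FileSystem fs = FileSystem.get(new Path(hdfsPath).toUri(), conf);
        File local = File.createTempFile("links-final", ".tmp");
        local.deleteOnExit();
        // delSrc = false: leave the copy in HDFS intact
        fs.copyToLocalFile(false, new Path(hdfsPath), new Path(local.getAbsolutePath()));
        return new GenericBooleanPrefDataModel(
                GenericBooleanPrefDataModel.toDataMap(
                        new GenericBooleanPrefDataModel(new FileDataModel(local))));
    }
}

In main(), the model construction would then become DataModel model = HdfsBackedModelLoader.load(args[0]);, with args[0] set to /user/eualin/output/links-final (or the full hdfs://... URI).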