How to submit a Mahout recommendation job using the HDInsight .NET SDK

Asked: 2014-03-04 12:14:55

Tags: hadoop mahout hdinsight

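I'm new to HDInsight. I want to learn and practice machine learning, and HDInsight is exactly what I need, but there doesn't seem to be a direct API for Mahout. Since a Mahout recommendation essentially runs as a series of MapReduce jobs, I followed some of the MapReduce examples in the Windows Azure documentation and wrote the following code: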

// Define the MapReduce job
MapReduceJobCreateParameters mrJobDefinition = new MapReduceJobCreateParameters()
{
    JarFile = "wasb:///example/jars/mahout-core-0.9-job.jar",
    ClassName = "org.apache.mahout.cf.taste.hadoop.item.RecommenderJob",
};

mrJobDefinition.Arguments.Add(" -s SIMILARITY_COOCCURRENCE");
mrJobDefinition.Arguments.Add(" --input=/reply");
mrJobDefinition.Arguments.Add(" --output=/recommend/");
mrJobDefinition.Arguments.Add(" --usersFile=/data/users.txt");

I have uploaded "mahout-core-0.9-job.jar" to /example/jars in the specified Azure blob storage container.

However, I get the following error message:

  

14/04/03 12:04:28 ERROR security.UserGroupInformation: PriviledgedActionException as:johnny cause:java.io.IOException: Exception reading file:/c:/apps/temp/hdfs/mapred/local/taskTracker/johnny/jobcache/job_201404031203_0001/jobToken=
java.security.PrivilegedActionException: java.io.IOException: Exception reading file:/c:/apps/temp/hdfs/mapred/local/taskTracker/johnny/jobcache/job_201404031203_0001/jobToken=
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1233)
    at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:951)
    at org.apache.hadoop.mapreduce.Job.submit(Job.java:550)
    at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:580)
    at org.apache.mahout.cf.taste.hadoop.preparation.PreparePreferenceMatrixJob.run(PreparePreferenceMatrixJob.java:77)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at org.apache.mahout.cf.taste.hadoop.item.RecommenderJob.run(RecommenderJob.java:164)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at org.apache.mahout.cf.taste.hadoop.item.RecommenderJob.main(RecommenderJob.java:322)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:601)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:160)
Caused by: java.io.IOException: Exception reading file:/c:/apps/temp/hdfs/mapred/local/taskTracker/johnny/jobcache/job_201404031203_0001/jobToken=
    at org.apache.hadoop.security.Credentials.readTokenStorageFile(Credentials.java:136)
    at org.apache.hadoop.mapred.JobClient.readTokensFromFiles(JobClient.java:2149)
    at org.apache.hadoop.mapred.JobClient.populateTokenCache(JobClient.java:2185)
    at org.apache.hadoop.mapred.JobClient.access$300(JobClient.java:179)
    at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:964)
    at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:951)
    ... 16 more
Caused by: java.io.FileNotFoundException: File file:/c:/apps/temp/hdfs/mapred/local/taskTracker/johnny/jobcache/job_201404031203_0001/jobToken= does not exist.
    at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:427)
    at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:254)
    at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.<init>(ChecksumFileSystem.java:125)
    at org.apache.hadoop.fs.ChecksumFileSystem.open(ChecksumFileSystem.java:283)
    at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:436)
    at org.apache.hadoop.security.Credentials.readTokenStorageFile(Credentials.java:130)
    ... 21 more
Exception in thread "main" java.io.IOException: Exception reading file:/c:/apps/temp/hdfs/mapred/local/taskTracker/johnny/jobcache/job_201404031203_0001/jobToken=
    at org.apache.hadoop.security.Credentials.readTokenStorageFile(Credentials.java:136)
    at org.apache.hadoop.mapred.JobClient.readTokensFromFiles(JobClient.java:2149)
    at org.apache.hadoop.mapred.JobClient.populateTokenCache(JobClient.java:2185)
    at org.apache.hadoop.mapred.JobClient.access$300(JobClient.java:179)
    at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:964)
    at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:951)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1233)
    at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:951)
    at org.apache.hadoop.mapreduce.Job.submit(Job.java:550)
    at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:580)
    at org.apache.mahout.cf.taste.hadoop.preparation.PreparePreferenceMatrixJob.run(PreparePreferenceMatrixJob.java:77)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at org.apache.mahout.cf.taste.hadoop.item.RecommenderJob.run(RecommenderJob.java:164)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at org.apache.mahout.cf.taste.hadoop.item.RecommenderJob.main(RecommenderJob.java:322)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:601)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:160)
Caused by: java.io.FileNotFoundException: File file:/c:/apps/temp/hdfs/mapred/local/taskTracker/johnny/jobcache/job_201404031203_0001/jobToken= does not exist.
    at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:427)
    at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:254)
    at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.<init>(ChecksumFileSystem.java:125)
    at org.apache.hadoop.fs.ChecksumFileSystem.open(ChecksumFileSystem.java:283)
    at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:436)
    at org.apache.hadoop.security.Credentials.readTokenStorageFile(Credentials.java:130)
    ... 21 more
Forcibly closing the watcher/keep-alive thread pool
Templeton: job failed with exit code 1

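After searching online, it seems that some changes need to be made to mapred-site.xml or other Hadoop configuration files. However, I know nothing about Apache Hadoop, and I'm not familiar with Linux or Java either.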

Any help or pointers would be greatly appreciated.

1 Answer:

Answer 0 (score: 0)

With the latest .NET SDK for Hadoop (http://hadoopsdk.codeplex.com/), I was able to submit the Mahout job successfully using the same code. The problem appears to have been fixed in the SDK.