Submitting a job with the Hadoop HDInsight .NET SDK API

Date: 2013-10-23 05:41:40

Tags: hadoop hdinsight

I am using the HDInsight .NET Hadoop API to submit a MapReduce job from an ASP.NET application.

using Microsoft.Hadoop.MapReduce;

var hadoop = Hadoop.Connect();

var result = hadoop.MapReduceJob.ExecuteJob();

// Also tried this, but got the same exception

// var result = hadoop.MapReduceJob.ExecuteJob(config);

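The ExecuteJob() call fails and throws an exception at runtime. Has anyone managed to run this call successfully? Is it possible to customize the Map() function by adding more input parameters or objects (beyond those Microsoft provides in the MapperBase class)? Can the logic in the Mapper and Reducer methods access a cache or a database?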

1 answer:

Answer 0 (score: 1)

An example of submitting a MapReduce job with the HDInsight .NET SDK is posted here:

http://www.windowsazure.com/en-us/manage/services/hdinsight/submit-hadoop-jobs-programmatically/#mapreduce-sdk

// Define the MapReduce job
MapReduceJobCreateParameters mrJobDefinition = new MapReduceJobCreateParameters()
{
    JarFile = "wasb:///example/jars/hadoop-examples.jar",
    ClassName = "wordcount"
};

mrJobDefinition.Arguments.Add("wasb:///example/data/gutenberg/davinci.txt");
mrJobDefinition.Arguments.Add("wasb:///example/data/WordCountOutput");

// Get the certificate from the certificate store, identified by its friendly name.
// certFriendlyName, subscriptionID, and clusterName are assumed to be string variables defined elsewhere.
X509Store store = new X509Store();
store.Open(OpenFlags.ReadOnly);
X509Certificate2 cert = store.Certificates.Cast<X509Certificate2>().First(item => item.FriendlyName == certFriendlyName);
JobSubmissionCertificateCredential creds = new JobSubmissionCertificateCredential(new Guid(subscriptionID), cert, clusterName);

// Create a hadoop client to connect to HDInsight
var jobClient = JobSubmissionClientFactory.Connect(creds);

// Run the MapReduce job
JobCreationResults mrJobResults = jobClient.CreateMapReduceJob(mrJobDefinition);

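// Wait for the job to complete.
// WaitForJobCompletion is not defined in this snippet; it is presumably a helper method from the linked sample.
WaitForJobCompletion(mrJobResults, jobClient);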
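The snippet above calls WaitForJobCompletion without showing its body. As a rough idea of what such a helper might do, here is a minimal polling sketch that does not depend on the SDK. The class name JobPolling, the method WaitForTerminalStatus, and the status strings "Completed" and "Failed" are all hypothetical, chosen only for illustration; the real helper in the linked sample may look different.

```csharp
using System;
using System.Threading;

public static class JobPolling
{
    // Calls getStatus repeatedly until it reports a terminal state,
    // sleeping pollInterval between calls. Throws TimeoutException
    // if the deadline passes while the job is still running.
    // Returns the last observed (terminal) status.
    public static string WaitForTerminalStatus(
        Func<string> getStatus,
        TimeSpan pollInterval,
        TimeSpan timeout)
    {
        DateTime deadline = DateTime.UtcNow + timeout;
        string status = getStatus();

        while (!IsTerminal(status))
        {
            if (DateTime.UtcNow >= deadline)
            {
                throw new TimeoutException(
                    "Job did not finish in time; last status: " + status);
            }

            Thread.Sleep(pollInterval);
            status = getStatus();
        }

        return status;
    }

    // Hypothetical terminal states (an assumption, not taken from the SDK);
    // adjust them to whatever your job client actually reports.
    private static bool IsTerminal(string status)
    {
        return status == "Completed" || status == "Failed";
    }
}
```

With the code from the answer, the delegate could wrap a status lookup on jobClient, for example something along the lines of () => jobClient.GetJob(mrJobResults.JobId).StatusCode.ToString(). That assumes your SDK version exposes GetJob, JobId, and StatusCode under those names, so check it against the linked article. Adding a timeout avoids blocking an ASP.NET request thread indefinitely if the job never reaches a terminal state.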