I am trying to use Azure Data Factory to create an on-demand HDInsight Spark cluster with HDI version 3.5. Data Factory refuses to create it, with the error message:
HdiVersion: '3.5' is not supported
If an on-demand HDInsight Spark cluster cannot currently be created this way, what is a sensible alternative? It seems strange to me that Microsoft has not added on-demand HDInsight Spark clusters to Azure Data Factory.
Answer 0 (score: 1)
Here is a complete solution that uses ADF to schedule a custom .NET activity written in C#, which in turn uses an ARM template and SSH.NET to execute a command that runs an R script.
So ADF is used to schedule the .NET activity, the Batch service runs the code in the DLL, and a JSON template file for the HDInsight cluster is stored in blob storage, where it can be configured as needed.
The full description is in the article "Automating Azure: Creating an On-Demand HDInsight Cluster", but here is the C# code, which is the essence of the automation (everything else is just administrative work to set up the pieces):
using System;
using System.Collections.Generic;
using Microsoft.Azure.Management.DataFactories.Models;
using Microsoft.Azure.Management.DataFactories.Runtime;
using Microsoft.Azure.Management.ResourceManager.Fluent;
using Microsoft.Azure.Management.ResourceManager.Fluent.Core;
using Renci.SshNet;

namespace VM
{
    public class StartVM : IDotNetActivity
    {
        private IActivityLogger _logger;

        public IDictionary<string, string> Execute(
            IEnumerable<LinkedService> linkedServices,
            IEnumerable<Dataset> datasets,
            Activity activity,
            IActivityLogger logger)
        {
            _logger = logger;
            _logger.Write("Starting execution...");

            var credentials = SdkContext.AzureCredentialsFactory.FromServicePrincipal(
                ""   // enter clientId here, this is the ApplicationID
                , "" // this is the Application secret key
                , "" // this is the tenant id
                , AzureEnvironment.AzureGlobalCloud);

            var azure = Microsoft.Azure.Management.Fluent.Azure
                .Configure()
                .WithLogLevel(HttpLoggingDelegatingHandler.Level.Basic)
                .Authenticate(credentials)
                .WithDefaultSubscription();

            var groupName = "myResourceGroup";
            var location = Region.EuropeNorth;

            // create the resource group
            var resourceGroup = azure.ResourceGroups.Define(groupName)
                .WithRegion(location)
                .Create();

            // deploy the HDInsight cluster from the ARM template and parameters stored in blob storage
            var templatePath = "https://myblob.blob.core.windows.net/blobcontainer/myHDI_template.JSON";
            var paramPath = "https://myblob.blob.core.windows.net/blobcontainer/myHDI_parameters.JSON";
            var deployment = azure.Deployments.Define("myDeployment")
                .WithExistingResourceGroup(groupName)
                .WithTemplateLink(templatePath, "0.9.0.0") // make sure the version matches the file
                .WithParametersLink(paramPath, "1.0.0.0")  // make sure the version matches the file
                .WithMode(Microsoft.Azure.Management.ResourceManager.Fluent.Models.DeploymentMode.Incremental)
                .Create();

            _logger.Write("The cluster is ready...");

            executeSSHCommand();

            _logger.Write("The SSH command was executed...");
            _logger.Write("Deleting the cluster...");

            // delete the resource group, which tears the cluster down again
            azure.ResourceGroups.DeleteByName(groupName);

            return new Dictionary<string, string>();
        }

        private void executeSSHCommand()
        {
            ConnectionInfo ConnNfo = new ConnectionInfo("myhdi-ssh.azurehdinsight.net", "sshuser",
                new AuthenticationMethod[]{
                    // Password based authentication
                    new PasswordAuthenticationMethod("sshuser", "Addso@1234523123"),
                }
            );

            // Execute a shell command on the cluster: copy the R script from blob storage to the
            // local file system, run it with R CMD BATCH, and copy the output back to blob storage
            using (var sshclient = new SshClient(ConnNfo))
            {
                sshclient.Connect();
                using (var cmd = sshclient.CreateCommand(
                    "hdfs dfs -copyToLocal \"wasbs:///rscript/test.R\";env -i R CMD BATCH --no-save --no-restore \"test.R\"; hdfs dfs -copyFromLocal -f \"test-output.txt\" \"wasbs:///rscript/test-output.txt\" "))
                {
                    cmd.Execute();
                }
                sshclient.Disconnect();
            }
        }
    }
}
Good luck!
Feodor
Answer 1 (score: 0)
I'm afraid on-demand Spark clusters are not currently supported, but they are definitely on the roadmap. Please stay tuned.
As a workaround for now, you could try using an ADF custom activity (CustomActivity) to create and delete the Spark cluster with your own code.
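Such a custom activity would follow the same pattern as the code in the first answer, just pointing the ARM template at a Spark cluster and replacing the SSH/R step with your Spark job submission. Below is a minimal, untested sketch of that shape; the class name, resource group name, template URLs and credentials are placeholders, not values from the original post:

using System.Collections.Generic;
using Microsoft.Azure.Management.DataFactories.Models;
using Microsoft.Azure.Management.DataFactories.Runtime;
using Microsoft.Azure.Management.ResourceManager.Fluent;
using Microsoft.Azure.Management.ResourceManager.Fluent.Core;

public class OnDemandSparkActivity : IDotNetActivity
{
    public IDictionary<string, string> Execute(
        IEnumerable<LinkedService> linkedServices,
        IEnumerable<Dataset> datasets,
        Activity activity,
        IActivityLogger logger)
    {
        // Authenticate with a service principal (placeholder credentials)
        var credentials = SdkContext.AzureCredentialsFactory.FromServicePrincipal(
            "<clientId>", "<clientSecret>", "<tenantId>", AzureEnvironment.AzureGlobalCloud);
        var azure = Microsoft.Azure.Management.Fluent.Azure
            .Configure()
            .Authenticate(credentials)
            .WithDefaultSubscription();

        // 1. Create a resource group and deploy a Spark cluster from an ARM template
        var groupName = "spark-on-demand-rg";
        azure.ResourceGroups.Define(groupName)
            .WithRegion(Region.EuropeNorth)
            .Create();
        azure.Deployments.Define("sparkDeployment")
            .WithExistingResourceGroup(groupName)
            .WithTemplateLink("<url-to-spark-cluster-template.json>", "0.9.0.0")
            .WithParametersLink("<url-to-parameters.json>", "1.0.0.0")
            .WithMode(Microsoft.Azure.Management.ResourceManager.Fluent.Models.DeploymentMode.Incremental)
            .Create();

        // 2. Submit the Spark job here (e.g. over SSH as in the first answer, or via the cluster's REST endpoint)

        // 3. Tear the cluster down again by deleting the resource group
        azure.ResourceGroups.DeleteByName(groupName);
        return new Dictionary<string, string>();
    }
}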
Answer 2 (score: 0)
Azure currently does not support creating on-demand HDInsight clusters for the Spark activity. Since you asked for a workaround, this is how I did it:
I know it is a lot of work for a simple task, but it works for now.