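I have a class that performs some extractions and transforms the payloads into datasets that are written to different JSON files.
This process works correctly. However, I have to run it manually every month. I submit the Spark application from IntelliJ (the Scala singleton object that contains the transformations).
So I am trying to automate this process. However, I cannot find documentation or a tutorial that tells me which service is the best fit for this goal.
The process should:
I have searched, but the links I found (looking for "create HDInsight Spark cluster on demand") are the following:
Other options I have looked into:
Thanks!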
Answer 0 (score: 0)
Here is the process you are looking for.
Creating an HDInsight cluster with PowerShell should be straightforward. Here is sample code:
### Create a Spark 2.3 cluster in Azure HDInsight
# Default cluster size (# of worker nodes), version, and type
$clusterSizeInNodes = "1"
$clusterVersion = "3.6"
$clusterType = "Spark"
# Create the resource group
$resourceGroupName = Read-Host -Prompt "Enter the resource group name"
$location = Read-Host -Prompt "Enter the Azure region to create resources in, such as 'Central US'"
$defaultStorageAccountName = Read-Host -Prompt "Enter the default storage account name"
New-AzResourceGroup -Name $resourceGroupName -Location $location
# Create an Azure storage account and container
# Note: Storage account kind BlobStorage can only be used as secondary storage for HDInsight clusters.
New-AzStorageAccount `
-ResourceGroupName $resourceGroupName `
-Name $defaultStorageAccountName `
-Location $location `
-SkuName Standard_LRS `
-Kind StorageV2 `
-EnableHttpsTrafficOnly 1
$defaultStorageAccountKey = (Get-AzStorageAccountKey `
-ResourceGroupName $resourceGroupName `
-Name $defaultStorageAccountName)[0].Value
$defaultStorageContext = New-AzStorageContext `
-StorageAccountName $defaultStorageAccountName `
-StorageAccountKey $defaultStorageAccountKey
# Create a Spark 2.3 cluster
$clusterName = Read-Host -Prompt "Enter the name of the HDInsight cluster"
# Cluster login is used to secure HTTPS services hosted on the cluster
$httpCredential = Get-Credential -Message "Enter Cluster login credentials" -UserName "admin"
# SSH user is used to remotely connect to the cluster using SSH clients
$sshCredentials = Get-Credential -Message "Enter SSH user credentials" -UserName "sshuser"
# Set the storage container name to the cluster name
$defaultBlobContainerName = $clusterName
# Create a blob container. This holds the default data store for the cluster.
New-AzStorageContainer `
-Name $clusterName `
-Context $defaultStorageContext
$sparkConfig = New-Object "System.Collections.Generic.Dictionary``2[System.String,System.String]"
$sparkConfig.Add("spark", "2.3")
# Create the HDInsight cluster
New-AzHDInsightCluster `
-ResourceGroupName $resourceGroupName `
-ClusterName $clusterName `
-Location $location `
-ClusterSizeInNodes $clusterSizeInNodes `
-ClusterType $clusterType `
-OSType "Linux" `
-Version $clusterVersion `
-ComponentVersion $sparkConfig `
-HttpCredential $httpCredential `
-DefaultStorageAccountName "$defaultStorageAccountName.blob.core.windows.net" `
-DefaultStorageAccountKey $defaultStorageAccountKey `
-DefaultStorageContainer $clusterName `
-SshCredential $sshCredentials
Get-AzHDInsightCluster `
-ResourceGroupName $resourceGroupName `
-ClusterName $clusterName
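The script above asks for the cluster login and SSH credentials with `Get-Credential`, which needs someone at the keyboard. For an unattended monthly run, one option is to build the credential objects from values supplied by the scheduler. The following is only a minimal sketch: the function name `New-CredentialFromEnvironment` and the environment variable names are hypothetical, and it assumes the secrets reach the script as environment variables.

```powershell
# Build a PSCredential without prompting, so the script can run unattended.
# Assumption: the user name and password are provided as environment variables
# by whatever runs the script. The variable names are hypothetical.
function New-CredentialFromEnvironment {
    param(
        [Parameter(Mandatory)] [string] $UserVariable,
        [Parameter(Mandatory)] [string] $PasswordVariable
    )

    $user  = [Environment]::GetEnvironmentVariable($UserVariable)
    $plain = [Environment]::GetEnvironmentVariable($PasswordVariable)

    # Fail early instead of creating a cluster with empty credentials.
    if ([string]::IsNullOrEmpty($user) -or [string]::IsNullOrEmpty($plain)) {
        throw "Environment variables $UserVariable and $PasswordVariable must both be set."
    }

    $secure = ConvertTo-SecureString -String $plain -AsPlainText -Force
    return New-Object -TypeName System.Management.Automation.PSCredential -ArgumentList $user, $secure
}

# Usage (in place of the Get-Credential prompts):
# $httpCredential = New-CredentialFromEnvironment -UserVariable 'HDI_HTTP_USER' -PasswordVariable 'HDI_HTTP_PASSWORD'
# $sshCredentials = New-CredentialFromEnvironment -UserVariable 'HDI_SSH_USER'  -PasswordVariable 'HDI_SSH_PASSWORD'
```

The `Read-Host` prompts for the resource group, region, storage account and cluster name would need the same treatment, for example as script parameters.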
You can refer to this link to submit the application job to the Spark cluster remotely:
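One common way to submit a job remotely is the Apache Livy REST endpoint that HDInsight Spark clusters expose at `https://<cluster>.azurehdinsight.net/livy`. The sketch below is not taken from the original answer. It assumes the application jar has already been uploaded to the cluster's default storage, and the function names, the jar path and the class name are hypothetical placeholders.

```powershell
# Build the JSON body for a Livy batch submission.
# JarPath and ClassName are placeholders for your own application.
function New-LivyBatchBody {
    param(
        [Parameter(Mandatory)] [string] $JarPath,
        [Parameter(Mandatory)] [string] $ClassName,
        [string[]] $Arguments = @()
    )
    $body = [ordered]@{ file = $JarPath; className = $ClassName }
    if ($Arguments.Count -gt 0) { $body['args'] = $Arguments }
    return ($body | ConvertTo-Json -Compress)
}

# POST the batch to the cluster's Livy endpoint using the cluster login.
function Submit-LivyBatch {
    param(
        [Parameter(Mandatory)] [string] $ClusterName,
        [Parameter(Mandatory)] [pscredential] $Credential,
        [Parameter(Mandatory)] [string] $Body
    )
    $uri = "https://$ClusterName.azurehdinsight.net/livy/batches"
    Invoke-RestMethod -Uri $uri -Method Post -Credential $Credential `
        -Headers @{ 'X-Requested-By' = $Credential.UserName } `
        -ContentType 'application/json' -Body $Body
}

# Poll the batch until Livy reports a terminal state, then return that state.
function Wait-LivyBatch {
    param(
        [Parameter(Mandatory)] [string] $ClusterName,
        [Parameter(Mandatory)] [pscredential] $Credential,
        [Parameter(Mandatory)] [int] $BatchId,
        [int] $PollSeconds = 30
    )
    $uri = "https://$ClusterName.azurehdinsight.net/livy/batches/$BatchId"
    do {
        Start-Sleep -Seconds $PollSeconds
        $state = (Invoke-RestMethod -Uri $uri -Credential $Credential).state
    } while ($state -notin 'success', 'dead', 'killed', 'error')
    return $state
}

# Usage (hypothetical jar and class):
# $body  = New-LivyBatchBody -JarPath 'wasbs:///example/jars/my-app.jar' -ClassName 'com.example.MyJob'
# $batch = Submit-LivyBatch -ClusterName $clusterName -Credential $httpCredential -Body $body
# $final = Wait-LivyBatch -ClusterName $clusterName -Credential $httpCredential -BatchId $batch.id
```

Waiting for a terminal state matters here because the cluster should only be deleted once the job has finished.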
Cleaning up the cluster can also be done with PowerShell. Here is sample code for that:
# Removes the specified HDInsight cluster from the current subscription.
Remove-AzHDInsightCluster `
-ResourceGroupName $resourceGroupName `
-ClusterName $clusterName
# Removes the specified storage container.
Remove-AzStorageContainer `
-Name $clusterName `
-Context $defaultStorageContext
# Removes a Storage account from Azure.
Remove-AzStorageAccount `
-ResourceGroupName $resourceGroupName `
-Name $defaultStorageAccountName
# Removes a resource group.
Remove-AzResourceGroup `
-Name $resourceGroupName
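In an automated run, the cleanup should happen even if the Spark job fails, otherwise the cluster keeps running and keeps costing money. A minimal sketch of that idea with `try`/`finally` follows. The helper name `Invoke-WithCleanup` is hypothetical, and the two script blocks stand in for the job submission and the removal commands shown above.

```powershell
# Run a job script block and always run the cleanup script block afterwards,
# whether the job succeeded or threw an error.
function Invoke-WithCleanup {
    param(
        [Parameter(Mandatory)] [scriptblock] $Job,
        [Parameter(Mandatory)] [scriptblock] $Cleanup
    )
    try {
        & $Job
    }
    finally {
        & $Cleanup
    }
}

# Usage (assumes $resourceGroupName was created only for this run):
# Invoke-WithCleanup `
#     -Job     { <# submit the Spark job and wait for it here #> } `
#     -Cleanup { Remove-AzResourceGroup -Name $resourceGroupName -Force }
```

Deleting a resource group deletes everything inside it, so when the group exists only for this run, removing the group alone also removes the cluster and the storage account.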
Additional reference:
https://docs.microsoft.com/en-us/azure/hdinsight/spark/apache-spark-jupyter-spark-sql-use-powershell
Hope this helps.