Question

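I have set up HDInsight. I have also created some files using PySpark, which supports Python 3.

I plan to call this Python notebook through a REST API, and Livy Server seems to be the way forward.

The problem is that exposing the Python notebook through Livy Server does not work.

Is it possible to call a Python notebook externally through the Livy API?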

Answer 1

If I understand your question correctly, you have:

A Spark cluster running in HDInsight
A Python notebook (Jupyter, I assume) on your local machine or a virtual machine

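If so, you can set up sparkmagic on your local machine and edit its configuration file to connect to the HDInsight Spark cluster. The Microsoft article "Install Jupyter notebook on your computer and connect to Apache Spark on HDInsight" walks through the steps.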
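As a rough illustration of what that configuration might contain, here is a minimal sketch that builds the relevant section as a Python dict and prints it as JSON. The key names (kernel_python_credentials, username, password, url, auth) and the "Basic_Access" value follow sparkmagic's example_config.json as I recall it; older releases used a base64_password key instead, so check the file shipped with your installed version. The cluster name and credentials are hypothetical placeholders, and build_sparkmagic_config is a helper invented for this example.

```python
import json


def build_sparkmagic_config(cluster_name, username, password):
    """Return a sparkmagic-style config dict for the PySpark kernel.

    Assumption: the key names mirror sparkmagic's example_config.json;
    verify them against the version you have installed.
    """
    return {
        "kernel_python_credentials": {
            "username": username,
            "password": password,
            # HDInsight exposes Livy under /livy on the cluster's public endpoint.
            "url": f"https://{cluster_name}.azurehdinsight.net/livy",
            "auth": "Basic_Access",
        }
    }


if __name__ == "__main__":
    # Hypothetical values; replace with your own cluster name and login.
    config = build_sparkmagic_config("mycluster", "admin", "<password>")
    print(json.dumps(config, indent=2))
```

sparkmagic normally reads its settings from ~/.sparkmagic/config.json, so the printed JSON would be merged into that file rather than used on its own.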

sparkmagic is a Livy client: it interacts with a remote Spark cluster through Livy.
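If you would rather call Livy directly instead of going through sparkmagic, the request looks roughly like the sketch below. It only builds the HTTP request and does not send it. The /livy/batches path, basic authentication, and the X-Requested-By header follow the HDInsight Livy documentation as I understand it. The cluster name, credentials, and script path are hypothetical. Note that a Livy batch takes a script or jar rather than a notebook file, so this assumes the notebook code has been exported to a .py file in the cluster's storage.

```python
import base64
import json
import urllib.request


def build_livy_batch_request(cluster_name, user, password, script_path):
    """Build (but do not send) a POST request that submits a PySpark script as a Livy batch."""
    url = f"https://{cluster_name}.azurehdinsight.net/livy/batches"
    # Livy's batch API takes the script location in the "file" field.
    body = json.dumps({"file": script_path}).encode("utf-8")
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Basic {token}",
        # Assumption: HDInsight's Livy endpoint expects this header on POST requests.
        "X-Requested-By": user,
    }
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


if __name__ == "__main__":
    # Hypothetical values for illustration only.
    req = build_livy_batch_request("mycluster", "admin", "<password>", "wasbs:///example/job.py")
    print(req.get_method(), req.full_url)
    # To actually submit the job: urllib.request.urlopen(req)
```

The JSON response to a real submission should include a batch id, which can then be polled with a GET on /livy/batches/<id> to follow the job's state.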

Answer 2

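I'm not sure about notebooks, but the HDInsight SDK for Python provides classes and methods for managing HDInsight clusters. It covers operations to create, delete, update, list, and resize clusters, run script actions, monitor clusters, get cluster properties, and more.

The pip package for it: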

pip install azure-mgmt-hdinsight

You first need to authenticate the SDK with your Azure subscription.

Log in:

from azure.mgmt.hdinsight import HDInsightManagementClient
from azure.common.credentials import ServicePrincipalCredentials
from azure.mgmt.hdinsight.models import *

# Tenant ID for your Azure Subscription
TENANT_ID = ''
# Your Service Principal App Client ID
CLIENT_ID = ''
# Your Service Principal Client Secret
CLIENT_SECRET = ''
# Your Azure Subscription ID
SUBSCRIPTION_ID = ''

credentials = ServicePrincipalCredentials(
    client_id = CLIENT_ID,
    secret = CLIENT_SECRET,
    tenant = TENANT_ID
)

client = HDInsightManagementClient(credentials, SUBSCRIPTION_ID)
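Rather than pasting secrets into the script, the four values above can be read from the environment. The following is a minimal sketch, not part of the SDK: load_azure_settings is a hypothetical helper, and the variable names (AZURE_TENANT_ID and so on) are a common convention that I am assuming here, not something the login snippet requires.

```python
import os


def load_azure_settings(env=os.environ):
    """Collect the four values the login snippet needs from environment variables.

    Assumption: the variable names below are a convention chosen for this sketch.
    """
    names = {
        "tenant": "AZURE_TENANT_ID",
        "client_id": "AZURE_CLIENT_ID",
        "secret": "AZURE_CLIENT_SECRET",
        "subscription_id": "AZURE_SUBSCRIPTION_ID",
    }
    # Fail early with a clear message instead of authenticating with empty strings.
    missing = [var for var in names.values() if not env.get(var)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {key: env[var] for key, var in names.items()}
```

The returned tenant, client_id, and secret values map onto the ServicePrincipalCredentials arguments of the same names, and subscription_id is the second argument to HDInsightManagementClient.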

HDInsight provides a configuration method called script actions, which invokes custom scripts to customize the cluster.

# Valid roles are "headnode", "workernode", "zookeepernode", and "edgenode".
script_action1 = RuntimeScriptAction(name="<Script Name>", uri="<URL To Script>", roles=[<List of Roles>])

# Add more RuntimeScriptAction objects to the list to execute multiple scripts.
client.clusters.execute_script_actions("<Resource Group Name>", "<Cluster Name>", <persist_on_success (bool)>, script_actions=[script_action1])

To list all persisted script actions for a given cluster:

scripts_paged = client.script_actions.list_persisted_scripts(resource_group_name, cluster_name)
while True:
  try:
    for script in scripts_paged.advance_page():
      print(script)
  except StopIteration:
    break

See if that helps.

How do I call a Python Jupyter Notebook hosted on Azure HDInsight through a REST API?
