Question

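I am currently working on a project to build a big data architecture on Azure. To understand how Azure works, I created a Data Factory and a Blob storage account, and set up a pipeline that runs the Hadoop word count job on an on-demand HDInsight cluster.

Here is the pipeline's JSON file: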

{
    "name": "MRSamplePipeline5",
    "properties": {
        "description": "Sample Pipeline to Run the Word Count Program",
        "activities": [
            {
                "type": "HDInsightMapReduce",
                "typeProperties": {
                    "className": "wordcount",
                    "jarFilePath": "executables/hadoop-example.jar",
                    "jarLinkedService": "AzureStorageLinkedService",
                    "arguments": [
                        "/davinci.txt",
                        "/WordCountOutput1"
                    ]
                },
                "outputs": [
                    {
                        "name": "MROutput4"
                    }
                ],
                "policy": {
                    "timeout": "01:00:00",
                    "concurrency": 1,
                    "retry": 3
                },
                "scheduler": {
                    "frequency": "Minute",
                    "interval": 15
                },
                "name": "MRActivity",
                "linkedServiceName": "HDInsightOnDemandLinkedService"
            }
        ],
        "start": "2017-07-24T00:00:00Z",
        "end": "2017-07-24T00:00:00Z",
        "isPaused": false,
        "hubName": "testazuredatafact_hub",
        "pipelineMode": "OneTime",
        "expirationTime": "3.00:00:00"
    }
}

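It does work, although the output is a file named "WordCountOutput1/part-r-00000".

My question is: how can I specify that the input file (davinci.txt) and the output (WordCountOutput1) live in a different container of my Blob storage (for example "exampledata")?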

Answer 1

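Hadoop file paths can be specified with full URI syntax, including the scheme and authority, to point to different kinds of file systems (e.g. HDFS vs. Azure vs. S3) and, in this specific case, to different Azure Storage containers. The relevant scheme for Azure Storage access is "wasb". The authority contains the container and the account. For example, consider the following hadoop fs -ls commands.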

# WASB backed by container "test" in Azure Storage account "cnauroth"
hadoop fs -ls wasb://test@cnauroth.blob.core.windows.net/users/cnauroth

# WASB backed by container "qa" in Azure Storage account "cnauroth"
hadoop fs -ls wasb://qa@cnauroth.blob.core.windows.net/users/cnauroth

# WASB backed by container "production" in Azure Storage account "cnauroth-live"
hadoop fs -ls wasb://production@cnauroth-live.blob.core.windows.net/users/cnauroth

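Each of these commands, executed from the same client host, lists a different Azure Storage account/container.

The same URI syntax can be used when passing arguments to a job submission.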
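
Applied to the pipeline in the question, the typeProperties block might look roughly like the sketch below. This is only an illustration under assumptions: <storageaccount> is a placeholder for your own storage account name (it does not appear in the question), "exampledata" is the container named in the question, and the on-demand HDInsight cluster is assumed to have credentials for that storage account. Everything else is copied from the original pipeline.

```json
{
    "className": "wordcount",
    "jarFilePath": "executables/hadoop-example.jar",
    "jarLinkedService": "AzureStorageLinkedService",
    "arguments": [
        "wasb://exampledata@<storageaccount>.blob.core.windows.net/davinci.txt",
        "wasb://exampledata@<storageaccount>.blob.core.windows.net/WordCountOutput1"
    ]
}
```

The first argument is the input file and the second is the output directory, in the same order as in the original pipeline. Only the paths change, from container-relative paths to full wasb URIs.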
