Unable to execute U-SQL script from ADF: ScriptPath should not be null

Date: 2017-12-27 16:38:02

Tags: azure azure-data-factory u-sql custom-activity

We are trying to run a U-SQL script from ADFv2, but an error is blocking execution. Given that the MS documentation on this issue has not been helpful, I am asking here again for help resolving it.

  1. The initial problem was how to deploy U-SQL scripts to ADLA. We could not find anything useful, so in the end we simply copy-pasted the script to ADLA and to Azure Blob Storage, as both .usql and .txt files. (This may also be part of the problem.)

  2. We created the ARM template based on the MS documentation, but it fails with the error: ScriptPath should not be null. This is strange, because the value is already specified on both the linked service and the activity. (A PowerShell sketch of the script upload and template deployment is included after the activity JSON below.)

  3. Here are the LS and the activity we created:

    LinkedService:

    {
        "type": "linkedservices",
        "name": "LinkedServiceofAzureBlobStorageforscriptPath",
        "dependsOn": ["[parameters('dataFactoryName')]"],
        "apiVersion": "[variables('apiVersion')]",
        "properties": {
            "type": "AzureStorage",
            "typeProperties": {
                "connectionString": {
                    "type": "SecureString",
                    "value": "DefaultEndpointsProtocol=https;AccountName=<account>;AccountKey=<key>;EndpointSuffix=core.windows.net"
                },
                "scriptPath": "container\\script.txt"
                //"scriptPath": "https://storage.blob.core.windows.net/container/script.txt"
                //"scriptPath": "wasb://container@storage/script.txt"
            }
        }
    }
    

    Activity:

    {
                "type": "DataLakeAnalyticsU-SQL",
                "typeProperties": {
                    //"script": "master.dbo.sp_test()",
                    "scriptPath": "container\\script.txt"//"scriptPath": "https://storage.blob.core.windows.net/container/script.txt"//"wasb://container@storage/script.txt",
                    "scriptLinkedService": {
                        "referenceName": "LinkedServiceofAzureBlobStorageforscriptPath",
                        "type": "LinkedServiceReference"
                    },
                    "degreeOfParallelism": 3,
                    "priority": 100
                },
                "name": "CopyFromAzureBlobToAzureSQL",
                "description": "Copy data frm Azure blob to Azure SQL",
                "linkedServiceName": {
                    "referenceName": "AzureDataLakeAnalyticsLinkedService",
                    "type": "LinkedServiceReference"
                }
            }
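
    For context, here is a minimal sketch of how a script can be pushed to the blob container and how such a template can be deployed with Azure PowerShell; the account, key, container, resource group and file names below are placeholders:

    # Placeholder names throughout; requires the Azure.Storage and AzureRM.Resources modules.
    $ctx = New-AzureStorageContext -StorageAccountName "<account>" -StorageAccountKey "<key>"

    # Upload the U-SQL script to the container/blob that scriptPath points at.
    Set-AzureStorageBlobContent -File ".\script.txt" -Container "container" -Blob "script.txt" `
        -Context $ctx -Force

    # Deploy the ARM template containing the linked services, the pipeline and the U-SQL activity.
    New-AzureRmResourceGroupDeployment -ResourceGroupName "<resource-group>" `
        -TemplateFile ".\azuredeploy.json" -TemplateParameterFile ".\azuredeploy.parameters.json"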
    

    We also tried this approach, but still without success.

    Here is the dummy script we are testing with:

    @a =
        SELECT *
        FROM(
            VALUES
            (
                "Contoso",
                1500.0
            ),
            (
                "Woodgrove",
                2700.0
            ),
            (
                "Contoso",
                1500.0
            ),
            (
                "Woodgrove",
                2700.0
            ),
            (
                "Contoso",
                1500.0
            ),
            (
                "Woodgrove",
                2700.0
            ),
            (
                "Contoso",
                1500.0
            ),
            (
                "Woodgrove",
                2700.0
            ),
            (
                "Contoso",
                1500.0
            ),
            (
                "Woodgrove",
                2700.0
            ),
            (
                "Contoso",
                1500.0
            ),
            (
                "Woodgrove",
                2700.0
            )
            ) AS D(customer, amount);
    OUTPUT @a
    TO "/data"+DateTime.Now.ToString("yyyyMMddHHmmss")+".csv"
    USING Outputters.Csv();
    

    But if you could point to some more complex examples, where the script contains some actual code, that would be great.

    Thanks!

    Update 26.01.2018

    After consulting MS about deploying U-SQL, we put together a combination of PowerShell commands:

    • We run a script that uploads the .dlls from the bin of the U-SQL assemblies to a folder on the Data Lake store;
    • it then iterates over that directory and registers each assembly individually;
    • for the U-SQL scripts, we create them as stored procedures on Data Lake Analytics and upload a simple U-SQL script that executes those procedures with the required parameters (a rough sketch of this flow follows the list).
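
    Roughly, that flow looks like the sketch below, using Azure PowerShell with the AzureRM.DataLakeStore and AzureRM.DataLakeAnalytics modules; account names, folders and file names are placeholders:

    $adlsAccount = "<adls-store-account>"       # Data Lake Store account
    $adlaAccount = "<adla-analytics-account>"   # Data Lake Analytics account

    # 1. Upload every assembly .dll from the local bin folder to a folder on the Data Lake store.
    Get-ChildItem ".\bin\*.dll" | ForEach-Object {
        Import-AzureRmDataLakeStoreItem -AccountName $adlsAccount `
            -Path $_.FullName -Destination "/assemblies/$($_.Name)" -Force
    }

    # 2. Register each uploaded .dll as a U-SQL assembly by submitting a small job per file.
    Get-ChildItem ".\bin\*.dll" | ForEach-Object {
        $name = [IO.Path]::GetFileNameWithoutExtension($_.Name)
        $usql = "USE DATABASE master; DROP ASSEMBLY IF EXISTS [$name]; CREATE ASSEMBLY [$name] FROM `"/assemblies/$($_.Name)`";"
        Submit-AzureRmDataLakeAnalyticsJob -Account $adlaAccount `
            -Name "Register $name" -Script $usql -DegreeOfParallelism 1
    }

    # 3. Deploy the procedure scripts; the ADF activity's scriptPath then points at the thin
    #    wrapper script that calls those procedures with the required parameters.
    Get-ChildItem ".\procedures\*.usql" | ForEach-Object {
        Submit-AzureRmDataLakeAnalyticsJob -Account $adlaAccount `
            -Name "Deploy $($_.Name)" -ScriptPath $_.FullName -DegreeOfParallelism 1
    }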

1 Answer:

Answer 0 (score: 1)

You don't need the script path in the linked service.

The blob linked service should be:

{
    "name": "Blob Name",
    "properties": {
        "type": "AzureStorage",
        "typeProperties": {
            "connectionString": {
                "type": "SecureString",
                "value": "DefaultEndpointsProtocol=https;AccountName=etc"
            }
        },
        "connectVia": {
            "referenceName": "Your IR Ref",
            "type": "IntegrationRuntimeReference"
        }
    }
}

Then, in the activity, reference the script with the container and file name as shown below, along with the reference name of the linked service.

    "activities": [
        {
            "name": "U-SQL1",
            "type": "DataLakeAnalyticsU-SQL",
            "policy": {
                "timeout": "7.00:00:00",
                "retry": 0,
                "retryIntervalInSeconds": 20
            },
            "typeProperties": {
                "scriptPath": "u-sql1/Test",
                "degreeOfParallelism": {
                    "value": "5",
                    "type": "Expression"
                },
                "priority": 1,
                "compilationMode": "Full",
                "scriptLinkedService": {
                    "referenceName": "Your Blob Ref",
                    "type": "LinkedServiceReference"
                }
            },
            "linkedServiceName": {
                "referenceName": "Your ADLa Ref",
                "type": "LinkedServiceReference"
            }
        },
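
If it still complains about ScriptPath after deployment, a quick check is to trigger the pipeline from PowerShell and look at the activity run error; a minimal sketch, assuming placeholder resource group, factory and pipeline names (AzureRM.DataFactoryV2 module):

    # Trigger the pipeline and capture the run id.
    $runId = Invoke-AzureRmDataFactoryV2Pipeline -ResourceGroupName "<rg>" `
        -DataFactoryName "<factory>" -PipelineName "<pipeline>"

    # Inspect the U-SQL activity run and its error details.
    Get-AzureRmDataFactoryV2ActivityRun -ResourceGroupName "<rg>" -DataFactoryName "<factory>" `
        -PipelineRunId $runId -RunStartedAfter (Get-Date).AddDays(-1) -RunStartedBefore (Get-Date).AddDays(1) |
        Select-Object ActivityName, Status, Error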

For info, I ignored the MS docs and created this JSON using the new dev UI, as I have private preview access. The above has been tested and you are welcome to use it from my blog post:

https://mrpaulandrew.com/2017/12/20/controlling-u-sql-job-aus-with-azure-data-factory-v2-pipeline-parameters/

Hope this helps.