Azure Data Factory pipeline ForEach activity does not run sequentially

Time: 2019-07-30 08:46:39

Tags: foreach pipeline azure-data-factory sequential

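I have an Azure Data Factory pipeline that needs to pick up all CSV files from a Blob storage container and store them in an Azure Data Lake container. Before the files are stored in the data lake, I need to apply some data manipulation to their contents.

I need this process to run sequentially rather than in parallel, so I use ForEach Activity -> Settings -> Sequential.

However, it does not run sequentially; it behaves like a parallel process.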

Pipeline main activity panel

Pipeline foreach activity panel

Below is the pipeline code:


{
    "name":"PN_obfuscate_and_move",
    "properties":{
        "description":"move PN blob csv to adlgen2(obfuscated)",
        "activities":[
            {
                "name":"GetBlobFileName",
                "type":"GetMetadata",
                "dependsOn":[

                ],
                "policy":{
                    "timeout":"7.00:00:00",
                    "retry":0,
                    "retryIntervalInSeconds":30,
                    "secureOutput":false,
                    "secureInput":false
                },
                "userProperties":[

                ],
                "typeProperties":{
                    "dataset":{
                        "referenceName":"PN_Getblobfilename_Dataset",
                        "type":"DatasetReference"
                    },
                    "fieldList":[
                        "childItems"
                    ],
                    "storeSettings":{
                        "type":"AzureBlobStorageReadSetting",
                        "recursive":true
                    },
                    "formatSettings":{
                        "type":"DelimitedTextReadSetting"
                    }
                }
            },
            {
                "name":"ForEachBlobFile",
                "type":"ForEach",
                "dependsOn":[
                    {
                        "activity":"GetBlobFileName",
                        "dependencyConditions":[
                            "Succeeded"
                        ]
                    }
                ],
                "userProperties":[

                ],
                "typeProperties":{
                    "items":{
                        "value":"@activity('GetBlobFileName').output.childItems",
                        "type":"Expression"
                    },
                    "isSequential":true,
                    "activities":[
                        {
                            "name":"Blob_to_SQLServer",
                            "description":"Copy PN blob files to sql server table",
                            "type":"Copy",
                            "dependsOn":[

                            ],
                            "policy":{
                                "timeout":"7.00:00:00",
                                "retry":0,
                                "retryIntervalInSeconds":30,
                                "secureOutput":false,
                                "secureInput":false
                            },
                            "userProperties":[
                                {
                                    "name":"Source",
                                    "value":"PNemailattachment//"
                                },
                                {
                                    "name":"Destination",
                                    "value":"[dbo].[PN]"
                                }
                            ],
                            "typeProperties":{
                                "source":{
                                    "type":"DelimitedTextSource",
                                    "storeSettings":{
                                        "type":"AzureBlobStorageReadSetting",
                                        "recursive":false,
                                        "wildcardFileName":"*.*",
                                        "enablePartitionDiscovery":false
                                    },
                                    "formatSettings":{
                                        "type":"DelimitedTextReadSetting"
                                    }
                                },
                                "sink":{
                                    "type":"AzureSqlSink"
                                },
                                "enableStaging":false
                            },
                            "inputs":[
                                {
                                    "referenceName":"PNBlob",
                                    "type":"DatasetReference"
                                }
                            ],
                            "outputs":[
                                {
                                    "referenceName":"PN_SQLServer",
                                    "type":"DatasetReference"
                                }
                            ]
                        },
                        {
                            "name":"Obfuscate_PN_SQLData",
                            "description":"mask specific columns",
                            "type":"SqlServerStoredProcedure",
                            "dependsOn":[
                                {
                                    "activity":"Blob_to_SQLServer",
                                    "dependencyConditions":[
                                        "Succeeded"
                                    ]
                                }
                            ],
                            "policy":{
                                "timeout":"7.00:00:00",
                                "retry":0,
                                "retryIntervalInSeconds":30,
                                "secureOutput":false,
                                "secureInput":false
                            },
                            "userProperties":[

                            ],
                            "typeProperties":{
                                "storedProcedureName":"[dbo].[Obfuscate_PN_Data]"
                            },
                            "linkedServiceName":{
                                "referenceName":"PN_SQLServer",
                                "type":"LinkedServiceReference"
                            }
                        },
                        {
                            "name":"SQLServer_to_ADLSGen2",
                            "description":"move PN obfuscated data to azure data lake gen2",
                            "type":"Copy",
                            "dependsOn":[
                                {
                                    "activity":"Obfuscate_PN_SQLData",
                                    "dependencyConditions":[
                                        "Succeeded"
                                    ]
                                }
                            ],
                            "policy":{
                                "timeout":"7.00:00:00",
                                "retry":0,
                                "retryIntervalInSeconds":30,
                                "secureOutput":false,
                                "secureInput":false
                            },
                            "userProperties":[

                            ],
                            "typeProperties":{
                                "source":{
                                    "type":"AzureSqlSource"
                                },
                                "sink":{
                                    "type":"DelimitedTextSink",
                                    "storeSettings":{
                                        "type":"AzureBlobFSWriteSetting"
                                    },
                                    "formatSettings":{
                                        "type":"DelimitedTextWriteSetting",
                                        "quoteAllText":true,
                                        "fileExtension":".csv"
                                    }
                                },
                                "enableStaging":false
                            },
                            "inputs":[
                                {
                                    "referenceName":"PN_SQLServer",
                                    "type":"DatasetReference"
                                }
                            ],
                            "outputs":[
                                {
                                    "referenceName":"PNADLSGen2",
                                    "type":"DatasetReference"
                                }
                            ]
                        },
                        {
                            "name":"Delete_PN_SQLData",
                            "description":"delete all data from table",
                            "type":"SqlServerStoredProcedure",
                            "dependsOn":[
                                {
                                    "activity":"SQLServer_to_ADLSGen2",
                                    "dependencyConditions":[
                                        "Succeeded"
                                    ]
                                }
                            ],
                            "policy":{
                                "timeout":"7.00:00:00",
                                "retry":0,
                                "retryIntervalInSeconds":30,
                                "secureOutput":false,
                                "secureInput":false
                            },
                            "userProperties":[

                            ],
                            "typeProperties":{
                                "storedProcedureName":"[dbo].[Delete_PN_Data]"
                            },
                            "linkedServiceName":{
                                "referenceName":"PN_SQLServer",
                                "type":"LinkedServiceReference"
                            }
                        }
                    ]
                }
            }
        ],
        "folder":{
            "name":"PN"
        },
        "annotations":[

        ]
    },
    "type":"Microsoft.DataFactory/factories/pipelines"
}

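1 Answer:

Answer 0: (score: 1)

By default, the ForEach activity in Azure Data Factory (ADF) runs up to 20 iterations in parallel, and you can raise that to a maximum of 50. To force it to run sequentially, i.e. one iteration after another, either tick the "Sequential" checkbox in the Settings section of the ForEach UI (see below) or set the isSequential property of the ForEach activity to true in the JSON, e.g.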

Data Factory UI

{
    "name": "<MyForEachPipeline>",
    "properties": {
        "activities": [
            {
                "name": "<MyForEachActivity>",
                "type": "ForEach",
                "typeProperties": {
                    "isSequential": true,
                    "items": {
...

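I would caution you about using this setting. Running sequentially, i.e. one iteration after another, slows things down. Is there another way to design your workflow so that it takes advantage of parallel execution, which is a really powerful feature of Azure Data Factory? That way, your job takes only as long as the longest task rather than the sum of all the tasks.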
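For comparison, here is a minimal sketch of what a parallel ForEach could look like. The items expression is taken from the question; the batchCount value of 10 is only an illustrative assumption (the property caps the number of concurrent iterations, up to a maximum of 50), and the inner activities are left out.

```json
{
    "name": "<MyForEachActivity>",
    "description": "Illustrative sketch only: the batchCount value is an assumption and the inner activities are omitted",
    "type": "ForEach",
    "typeProperties": {
        "isSequential": false,
        "batchCount": 10,
        "items": {
            "value": "@activity('GetBlobFileName').output.childItems",
            "type": "Expression"
        },
        "activities": []
    }
}
```

Note that in the pipeline from the question every iteration loads into, and then clears, the same [dbo].[PN] table, so parallel iterations would likely interfere with each other unless the workflow is redesigned first.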

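Say I have a job with 10 tasks that each take 1 second. If I run this job serially it will take 10 seconds, but if I run it in parallel it will take 1 second.

SSIS never really had this: you could manually create multiple paths or use third-party components, but it was not built in. This really is a great feature of ADF and you should try to make use of it. Of course, there are times when you really do need to run in series, which is why this option is available.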
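The arithmetic in this example can be written out as follows. This is an idealised sketch: it assumes every task takes the same time $t$ and ignores activity start-up overhead. The symbols $n$ (number of tasks) and $b$ (ForEach batch count) are introduced here purely for illustration.

```latex
T_{\text{serial}} = \sum_{i=1}^{n} t_i = 10 \times 1\,\text{s} = 10\,\text{s},
\qquad
T_{\text{parallel}} \approx \left\lceil \frac{n}{b} \right\rceil t
= \left\lceil \frac{10}{20} \right\rceil \times 1\,\text{s} = 1\,\text{s}
```

With the default batch count of 20, all 10 tasks fit into a single batch, which is why the parallel run takes roughly as long as one task.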