Azure Data Factory - multiple activities in pipeline execution order

Asked: 2016-03-13 12:14:03

Tags: azure pipeline azure-data-factory

I have 2 blob files to copy to an Azure SQL table. My pipeline has two copy activities.


As I understand it, the second activity starts once the first one completes. So how do you execute this pipeline without going to the dataset slices and running them manually? Also, how can pipelineMode be set to OneTime instead of Scheduled?

2 Answers:

Answer 0 (score: 2):

For the activities to run synchronously (in order), the output of the first activity needs to be an input of the second activity.

{
"name": "NutrientDataBlobToAzureSqlPipeline",
"properties": {
    "description": "Copy nutrient data from Azure BLOB to Azure SQL",
    "activities": [
        {
            "type": "Copy",
            "typeProperties": {
                "source": {
                    "type": "BlobSource"
                },
                "sink": {
                    "type": "SqlSink",
                    "writeBatchSize": 10000,
                    "writeBatchTimeout": "60.00:00:00"
                }
            },
            "inputs": [
                {
                    "name": "FoodGroupDescriptionsAzureBlob"
                }
            ],
            "outputs": [
                {
                    "name": "FoodGroupDescriptionsSQLAzureFirst"
                }
            ],
            "policy": {
                "timeout": "01:00:00",
                "concurrency": 1,
                "executionPriorityOrder": "NewestFirst"
            },
            "scheduler": {
                "frequency": "Minute",
                "interval": 15
            },
            "name": "FoodGroupDescriptions",
            "description": "#1 Bulk Import FoodGroupDescriptions"
        },
        {
            "type": "Copy",
            "typeProperties": {
                "source": {
                    "type": "BlobSource"
                },
                "sink": {
                    "type": "SqlSink",
                    "writeBatchSize": 10000,
                    "writeBatchTimeout": "60.00:00:00"
                }
            },
            "inputs": [
                {
                    "name": "FoodGroupDescriptionsSQLAzureFirst"
                },
                {
                    "name": "FoodDescriptionsAzureBlob"
                }
            ],
            "outputs": [
                {
                    "name": "FoodDescriptionsSQLAzureSecond"
                }
            ],
            "policy": {
                "timeout": "01:00:00",
                "concurrency": 1,
                "executionPriorityOrder": "NewestFirst"
            },
            "scheduler": {
                "frequency": "Minute",
                "interval": 15
            },
            "name": "FoodDescriptions",
            "description": "#2 Bulk Import FoodDescriptions"
        }
    ],
    "start": "2015-07-14T00:00:00Z",
    "end": "2015-07-15T00:00:00Z",
    "isPaused": false,
    "hubName": "gymappdatafactory_hub",
    "pipelineMode": "Scheduled"
    }
}

Notice that the output of the first activity, "FoodGroupDescriptionsSQLAzureFirst", becomes an input of the second activity.

Answer 1 (score: 0):

If I understand correctly, you want to execute both activities without manually running the dataset slices.

You can achieve that simply by defining the input datasets as external.

As an example:

{
    "name": "FoodGroupDescriptionsAzureBlob",
    "properties": {
        "type": "AzureBlob",
        "linkedServiceName": "AzureBlobStore",
        "typeProperties": {
            "folderPath": "mycontainer/folder",
            "format": {
                "type": "TextFormat",
                "rowDelimiter": "\n",
                "columnDelimiter": "|"
            }
        },
        "external": true,
        "availability": {
            "frequency": "Day",
            "interval": 1
        }
    }
}

Note that the property external is marked as true. This moves the dataset into the Ready state automatically. Unfortunately, there is no way to mark the pipeline to run just once. After running the pipeline once, you can set the isPaused property to true to prevent further executions.
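Applied to the pipeline in answer 0, that amounts to flipping one property in its properties section (a fragment, not the full pipeline definition; the remaining properties stay as shown above):

```json
{
    "name": "NutrientDataBlobToAzureSqlPipeline",
    "properties": {
        "isPaused": true,
        "pipelineMode": "Scheduled"
    }
}
```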

Note: the external property can only be set to true on input datasets. All activities whose input datasets are marked as external will execute in parallel.
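For the pipeline in answer 0, this means the second blob input, FoodDescriptionsAzureBlob, needs the same treatment, while the intermediate dataset FoodGroupDescriptionsSQLAzureFirst must stay internal (it is produced by the first activity). A sketch of that second dataset, assuming the same linked service and text format as the example above (the folder path is illustrative):

```json
{
    "name": "FoodDescriptionsAzureBlob",
    "properties": {
        "type": "AzureBlob",
        "linkedServiceName": "AzureBlobStore",
        "typeProperties": {
            "folderPath": "mycontainer/folder",
            "format": {
                "type": "TextFormat",
                "rowDelimiter": "\n",
                "columnDelimiter": "|"
            }
        },
        "external": true,
        "availability": {
            "frequency": "Day",
            "interval": 1
        }
    }
}
```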