I have two blob files to copy into Azure SQL tables. My pipeline has two activities.
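As I understand it, the second activity starts once the first one completes. So how do you execute this pipeline without going to the dataset slices and running them manually? Also, how can pipelineMode be set to OneTime instead of Scheduled?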
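Answer 0 (score: 2)
To make the activities run sequentially (in order), the output of the first activity needs to be an input of the second activity.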
{
  "name": "NutrientDataBlobToAzureSqlPipeline",
  "properties": {
    "description": "Copy nutrient data from Azure BLOB to Azure SQL",
    "activities": [
      {
        "type": "Copy",
        "typeProperties": {
          "source": {
            "type": "BlobSource"
          },
          "sink": {
            "type": "SqlSink",
            "writeBatchSize": 10000,
            "writeBatchTimeout": "60.00:00:00"
          }
        },
        "inputs": [
          {
            "name": "FoodGroupDescriptionsAzureBlob"
          }
        ],
        "outputs": [
          {
            "name": "FoodGroupDescriptionsSQLAzureFirst"
          }
        ],
        "policy": {
          "timeout": "01:00:00",
          "concurrency": 1,
          "executionPriorityOrder": "NewestFirst"
        },
        "scheduler": {
          "frequency": "Minute",
          "interval": 15
        },
        "name": "FoodGroupDescriptions",
        "description": "#1 Bulk Import FoodGroupDescriptions"
      },
      {
        "type": "Copy",
        "typeProperties": {
          "source": {
            "type": "BlobSource"
          },
          "sink": {
            "type": "SqlSink",
            "writeBatchSize": 10000,
            "writeBatchTimeout": "60.00:00:00"
          }
        },
        "inputs": [
          {
            "name": "FoodDescriptionsAzureBlob"
          },
          {
            "name": "FoodGroupDescriptionsSQLAzureFirst"
          }
        ],
        "outputs": [
          {
            "name": "FoodDescriptionsSQLAzureSecond"
          }
        ],
        "policy": {
          "timeout": "01:00:00",
          "concurrency": 1,
          "executionPriorityOrder": "NewestFirst"
        },
        "scheduler": {
          "frequency": "Minute",
          "interval": 15
        },
        "name": "FoodDescriptions",
        "description": "#2 Bulk Import FoodDescriptions"
      }
    ],
    "start": "2015-07-14T00:00:00Z",
    "end": "2015-07-14T00:00:00Z",
    "isPaused": false,
    "hubName": "gymappdatafactory_hub",
    "pipelineMode": "Scheduled"
  }
}
Notice that the output of the first activity, "FoodGroupDescriptionsSQLAzureFirst", becomes an input of the second activity.
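The chaining only works if the intermediate dataset "FoodGroupDescriptionsSQLAzureFirst" is itself defined in the data factory. The answer does not show that definition, so the following is only a rough sketch of what it might look like. The linked service name "AzureSqlLinkedService" and the table name "FoodGroupDescriptions" are assumptions made for illustration and do not come from the original post. The availability is chosen to match the 15-minute scheduler of the activity that produces the dataset.

```json
{
  "name": "FoodGroupDescriptionsSQLAzureFirst",
  "properties": {
    "type": "AzureSqlTable",
    "linkedServiceName": "AzureSqlLinkedService",
    "typeProperties": {
      "tableName": "FoodGroupDescriptions"
    },
    "availability": {
      "frequency": "Minute",
      "interval": 15
    }
  }
}
```

Because this dataset is produced by an activity in the pipeline, it is not marked as external. Its slices become ready only when the first Copy activity finishes, which is what holds the second activity back.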
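Answer 1 (score: 0)
If I understand correctly, you want to execute both activities without manually running the dataset slices.
You just need to define the datasets as external.
For example: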
{
  "name": "FoodGroupDescriptionsAzureBlob",
  "properties": {
    "type": "AzureBlob",
    "linkedServiceName": "AzureBlobStore",
    "typeProperties": {
      "folderPath": "mycontainer/folder",
      "format": {
        "type": "TextFormat",
        "rowDelimiter": "\n",
        "columnDelimiter": "|"
      }
    },
    "external": true,
    "availability": {
      "frequency": "Day",
      "interval": 1
    }
  }
}
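Note that the external property is set to true. This moves the dataset to the Ready state automatically. Unfortunately, there is no way to mark a pipeline as run-once. After the pipeline has run, you can set the isPaused property to true to prevent further executions.
Note: the external property can be set to true only for input datasets. All activities whose input datasets are marked as external will execute in parallel.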
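As a rough illustration of the pause step, the fragment below shows only the property that changes in the pipeline definition from the first answer. It is not a complete pipeline definition on its own: every other property ("description", "activities", "start", "end" and so on) is assumed to stay exactly as it was when the definition is redeployed.

```json
{
  "name": "NutrientDataBlobToAzureSqlPipeline",
  "properties": {
    "isPaused": true
  }
}
```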