Azure Data Factory: pipeline start differs from job start

Time: 2018-01-03 09:20:47

Tags: azure-data-factory

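This problem is driving me crazy. I'm running Azure Data Factory V1 and I need to schedule a weekly copy job from 01/03/2009 through 01/31/2009, so I defined this schedule on the pipeline: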

    "start": "2009-01-03T00:00:00Z",
    "end": "2009-01-31T00:00:00Z",
    "isPaused": false,

When I monitor the pipeline, Data Factory schedules the slices on these dates:

12/29/2008
01/05/2009
01/12/2009
01/19/2009
01/26/2009

instead of the schedule I want:

01/03/2009
01/10/2009
01/17/2009
01/24/2009
01/31/2009

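Why doesn't the start date defined on the pipeline correspond to the scheduled dates shown in the monitor?

Many thanks!

Here is the pipeline JSON: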

{
"name": "CopyPipeline-blob2datalake",
"properties": {
    "description": "copy from blob storage to datalake directory structure",
    "activities": [
        {
            "type": "DataLakeAnalyticsU-SQL",
            "typeProperties": {
                "scriptPath": "script/dat230.usql",
                "scriptLinkedService": "AzureStorageLinkedService",
                "degreeOfParallelism": 5,
                "priority": 100,
                "parameters": {
                    "salesfile": "$$Text.Format('/DAT230/{0:yyyy}/{0:MM}/{0:dd}.txt', Date.StartOfDay (SliceStart))",
                    "lineitemsfile": "$$Text.Format('/dat230/dataloads/{0:yyyy}/{0:MM}/{0:dd}/factinventory/fact.csv', Date.StartOfDay (SliceStart))"
                }
            },
            "inputs": [
                {
                    "name": "InputDataset-dat230"
                }
            ],
            "outputs": [
                {
                    "name": "OutputDataset-dat230"
                }
            ],
            "policy": {
                "timeout": "01:00:00",
                "concurrency": 1,
                "retry": 1
            },
            "scheduler": {
                "frequency": "Day",
                "interval": 7
            },
            "name": "DataLakeAnalyticsUSqlActivityTemplate",
            "linkedServiceName": "AzureDataLakeAnalyticsLinkedService"
        }
    ],
    "start": "2009-01-03T00:00:00Z",
    "end": "2009-01-11T00:00:00Z",
    "isPaused": false,
    "hubName": "edxlearningdf_hub",
    "pipelineMode": "Scheduled"
}
}

Here are the datasets:

{
"name": "InputDataset-dat230",
"properties": {
    "structure": [
        {
            "name": "Date",
            "type": "Datetime"
        },
        {
            "name": "StoreID",
            "type": "Int64"
        },
        {
            "name": "StoreName",
            "type": "String"
        },
        {
            "name": "ProductID",
            "type": "Int64"
        },
        {
            "name": "ProductName",
            "type": "String"
        },
        {
            "name": "Color",
            "type": "String"
        },
        {
            "name": "Size",
            "type": "String"
        },
        {
            "name": "Manufacturer",
            "type": "String"
        },
        {
            "name": "OnHandQuantity",
            "type": "Int64"
        },
        {
            "name": "OnOrderQuantity",
            "type": "Int64"
        },
        {
            "name": "SafetyStockQuantity",
            "type": "Int64"
        },
        {
            "name": "UnitCost",
            "type": "Double"
        },
        {
            "name": "DaysInStock",
            "type": "Int64"
        },
        {
            "name": "MinDayInStock",
            "type": "Int64"
        },
        {
            "name": "MaxDayInStock",
            "type": "Int64"
        }
    ],
    "published": false,
    "type": "AzureBlob",
    "linkedServiceName": "Source-BlobStorage-dat230",
    "typeProperties": {
        "fileName": "*.txt.gz",
        "folderPath": "dat230/{year}/{month}/{day}/",
        "format": {
            "type": "TextFormat",
            "columnDelimiter": "\t",
            "firstRowAsHeader": true
        },
        "partitionedBy": [
            {
                "name": "year",
                "value": {
                    "type": "DateTime",
                    "date": "WindowStart",
                    "format": "yyyy"
                }
            },
            {
                "name": "month",
                "value": {
                    "type": "DateTime",
                    "date": "WindowStart",
                    "format": "MM"
                }
            },
            {
                "name": "day",
                "value": {
                    "type": "DateTime",
                    "date": "WindowStart",
                    "format": "dd"
                }
            }
        ],
        "compression": {
            "type": "GZip"
        }
    },
    "availability": {
        "frequency": "Day",
        "interval": 7
    },
    "external": true,
    "policy": {}
}
}

{
"name": "OutputDataset-dat230",
"properties": {
    "structure": [
        {
            "name": "Date",
            "type": "Datetime"
        },
        {
            "name": "StoreID",
            "type": "Int64"
        },
        {
            "name": "StoreName",
            "type": "String"
        },
        {
            "name": "ProductID",
            "type": "Int64"
        },
        {
            "name": "ProductName",
            "type": "String"
        },
        {
            "name": "Color",
            "type": "String"
        },
        {
            "name": "Size",
            "type": "String"
        },
        {
            "name": "Manufacturer",
            "type": "String"
        },
        {
            "name": "OnHandQuantity",
            "type": "Int64"
        },
        {
            "name": "OnOrderQuantity",
            "type": "Int64"
        },
        {
            "name": "SafetyStockQuantity",
            "type": "Int64"
        },
        {
            "name": "UnitCost",
            "type": "Double"
        },
        {
            "name": "DaysInStock",
            "type": "Int64"
        },
        {
            "name": "MinDayInStock",
            "type": "Int64"
        },
        {
            "name": "MaxDayInStock",
            "type": "Int64"
        }
    ],
    "published": false,
    "type": "AzureDataLakeStore",
    "linkedServiceName": "Destination-DataLakeStore-dat230",
    "typeProperties": {
        "fileName": "txt.gz",
        "folderPath": "dat230/dataloads/{year}/{month}/{day}/factinventory/",
        "format": {
            "type": "TextFormat",
            "columnDelimiter": "\t"
        },
        "partitionedBy": [
            {
                "name": "year",
                "value": {
                    "type": "DateTime",
                    "date": "WindowStart",
                    "format": "yyyy"
                }
            },
            {
                "name": "month",
                "value": {
                    "type": "DateTime",
                    "date": "WindowStart",
                    "format": "MM"
                }
            },
            {
                "name": "day",
                "value": {
                    "type": "DateTime",
                    "date": "WindowStart",
                    "format": "dd"
                }
            }
        ]
    },
    "availability": {
        "frequency": "Day",
        "interval": 7
    },
    "external": false,
    "policy": {}
}
}

2 Answers:

Answer 0 (score: 0)

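You need to look at the time slices of the datasets and their activity.

The pipeline schedule (badly named) only defines the start and end of the period within which activities can provision and run their time slices.

ADFv1 doesn't use a recurring schedule like SQL Server Agent does. Each execution has to be provisioned at an interval on the timeline (the schedule) you create.

For example, suppose your pipeline start and end span 1 year, but your dataset and activity have a frequency of Month and an interval of 1. You will then get only 12 executions.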
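To make that arithmetic concrete, here is a minimal sketch of the two settings involved. The dates and the Month/1 values are illustrative and not taken from the question, and the outer keys `pipelineProperties` and `datasetProperties` are hypothetical wrappers used only to show both fragments in one valid JSON document; they are not ADF property names.

```json
{
    "pipelineProperties": {
        "start": "2017-01-01T00:00:00Z",
        "end": "2018-01-01T00:00:00Z",
        "isPaused": false
    },
    "datasetProperties": {
        "availability": {
            "frequency": "Month",
            "interval": 1
        }
    }
}
```

With an active period of one year and one slice per month, the dataset has twelve slices, so the activity runs twelve times. The pipeline `start` and `end` only bound the window; the slice boundaries themselves come from `availability`.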

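Apologies, but the concept of time slices is a little difficult to explain if you're not already familiar with it. Maybe read this post: https://blogs.msdn.microsoft.com/ukdataplatform/2016/05/03/demystifying-activity-scheduling-with-azure-data-factory/

Hope this helps.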

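Answer 1 (score: 0)

Could you share the JSON for the datasets and the pipeline? It would be easier to help you with that.

In the meantime, check whether you are using "style": "StartOfInterval" in the activity's scheduler property, and also check whether an offset is being used.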
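For reference, a sketch of where those settings live. In ADF v1, `style` and `offset` are properties of a dataset's `availability` object, and the activity's `scheduler` object is expected to match the availability of its output dataset. The `anchorDateTime` property below is not mentioned in this answer; it is included as an assumption, since the ADF v1 documentation describes it as the absolute position in time used to compute slice boundaries. Whether it produces exactly the 01/03, 01/10, ... slices for this pipeline is untested.

```json
{
    "availability": {
        "frequency": "Day",
        "interval": 7,
        "style": "StartOfInterval",
        "anchorDateTime": "2009-01-03T00:00:00Z"
    }
}
```

If this approach is tried, the same values would presumably need to be applied to both datasets and to the activity's `scheduler` block so that they stay consistent with one another.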

Cheers!