Data flow issue when copying incremental data in Azure Data Factory

Date: 2019-12-23 15:35:29

Tags: azure azure-data-factory-2 dataflow

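I am following this article to copy incremental (delta) data from one SQL table to another table in the same database. My goal is to use the Copy activity in ADF to move all data from Blob storage into one SQL table (the staging table), and then copy the data from the staging table into another SQL table (the main table) based on the last modified date. Following the article, I created the watermark table, the target table and the source table, and I tried to use a data flow that picks up only the latest records, i.e. those not yet present in the main table.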
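
The watermark idea described above can be sketched in plain Python. This is a simplified, hypothetical in-memory illustration, not the article's pipeline and not ADF code: lists and dicts stand in for the staging, main and watermark tables, and the names `staging_table`, `main_table`, `watermark` and `incremental_copy` are made up for the example. It copies only rows whose LOAD_DATE is later than the stored watermark value and then advances the watermark.

```python
from datetime import datetime

# Hypothetical in-memory stand-ins for the SQL tables (not ADF objects).
staging_table = [
    {"transferId": "1", "LOAD_DATE": datetime(2019, 12, 20)},
    {"transferId": "2", "LOAD_DATE": datetime(2019, 12, 22)},
]
main_table = []
watermark = {"TableName": "Fact_Fish", "WatermarkValue": datetime(2019, 12, 21)}


def incremental_copy(staging, main, wm):
    """Copy rows newer than the watermark into `main`, then advance the watermark."""
    old_value = wm["WatermarkValue"]
    # Delta = rows whose LOAD_DATE is strictly later than the stored watermark.
    delta = [row for row in staging if row["LOAD_DATE"] > old_value]
    main.extend(delta)
    if delta:
        # New watermark = latest LOAD_DATE that was actually copied.
        wm["WatermarkValue"] = max(row["LOAD_DATE"] for row in delta)
    return len(delta)


copied = incremental_copy(staging_table, main_table, watermark)
print(copied, watermark["WatermarkValue"])  # prints: 1 2019-12-22 00:00:00
```

Calling `incremental_copy` a second time copies nothing, because the watermark has already moved to the newest LOAD_DATE; that is what makes the load incremental.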

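I don't get any errors, but my main table is empty. The data is copied from Blob into the staging table, but nothing is copied from the staging table into the main table. I'm not sure what I'm missing here. Could someone take a look and point me in the right direction?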

Here is the data flow JSON:

{
    "name": "FlowToGetTheLatestDate",
    "properties": {
        "type": "MappingDataFlow",
        "typeProperties": {
            "sources": [
                {
                    "dataset": {
                        "referenceName": "AzureSqlDatabaseDataSource",
                        "type": "DatasetReference"
                    },
                    "name": "SqlStagingData"
                },
                {
                    "dataset": {
                        "referenceName": "AzureSqlDatabaseDataSource",
                        "type": "DatasetReference"
                    },
                    "name": "watermarktable"
                }
            ],
            "sinks": [
                {
                    "dataset": {
                        "referenceName": "AzureSqlTable1",
                        "type": "DatasetReference"
                    },
                    "name": "Sinkorigdata"
                }
            ],
            "transformations": [
                {
                    "name": "DerivedColumn1"
                },
                {
                    "name": "JointoWatermark"
                },
                {
                    "name": "onlyLatestRecord"
                },
                {
                    "name": "SelectColumns"
                },
                {
                    "name": "DerivedColumn2"
                }
            ],
            "script": "\n\nsource(output(\n\t\ttransferId as string,\n\t\tfromPopulationId as string,\n\t\ttoPopulationId as string,\n\t\tcountFactor as string,\n\t\tbiomassFactor as string,\n\t\ttransferTime as string,\n\t\ttransferType as string,\n\t\tLOAD_DATE as string,\n\t\tFrom_Unit_ID as string,\n\t\tTo_Unit_ID as string,\n\t\tClient_ID as string,\n\t\tDateKey as string,\n\t\tFact_FishTransfer_ID as string,\n\t\tTableName as string\n\t),\n\tallowSchemaDrift: true,\n\tvalidateSchema: false,\n\tisolationLevel: 'READ_UNCOMMITTED',\n\tquery: 'SELECT\\n  transferId\\n, fromPopulationId\\n, toPopulationId\\n, countFactor\\n, biomassFactor\\n, transferTime\\n, transferType\\n, LOAD_DATE\\n, From_Unit_ID\\n, To_Unit_ID\\n, Client_ID\\n, DateKey\\n, Fact_FishTransfer_ID\\n, \\'Fact_Fish\\' AS TableName\\nFROM dbo.Fish_Transfer',\n\tformat: 'query') ~> SqlStagingData\nsource(output(\n\t\tTableName as string,\n\t\tWatermark as string,\n\t\tWatermarkValue as string\n\t),\n\tallowSchemaDrift: true,\n\tvalidateSchema: false,\n\tisolationLevel: 'READ_UNCOMMITTED',\n\tquery: 'SELECT\\n TableName\\n, Watermark\\n, WatermarkValue\\nFROM [dbo].[WatermarkTable]',\n\tformat: 'query') ~> watermarktable\nSqlStagingData derive(LOAD_DATE = toString(LOAD_DATE)) ~> DerivedColumn1\nDerivedColumn1, watermarktable join(SqlStagingData@TableName == watermarktable@TableName,\n\tjoinType:'left',\n\tbroadcast: 'none')~> JointoWatermark\nJointoWatermark filter(LOAD_DATE > WatermarkValue) ~> onlyLatestRecord\nonlyLatestRecord select(mapColumn(\n\t\ttransferId,\n\t\tfromPopulationId,\n\t\ttoPopulationId,\n\t\tcountFactor,\n\t\tbiomassFactor,\n\t\ttransferTime,\n\t\ttransferType,\n\t\tLOAD_DATE,\n\t\tFrom_Unit_ID,\n\t\tTo_Unit_ID,\n\t\tClient_ID,\n\t\tDateKey,\n\t\tFact_FishTransfer_ID\n\t),\n\tskipDuplicateMapInputs: true,\n\tskipDuplicateMapOutputs: true) ~> SelectColumns\nSelectColumns derive(LOAD_DATE = toDate(LOAD_DATE)) ~> DerivedColumn2\nDerivedColumn2 sink(input(\n\t\ttransferId as string,\n\t\tfromPopulationId as string,\n\t\ttoPopulationId as string,\n\t\tcountFactor as string,\n\t\tbiomassFactor as string,\n\t\ttransferTime as string,\n\t\ttransferType as string,\n\t\tLOAD_DATE as string,\n\t\tFrom_Unit_ID as string,\n\t\tTo_Unit_ID as string,\n\t\tClient_ID as string,\n\t\tDateKey as string,\n\t\tFact_FishTransfer_ID as string\n\t),\n\tallowSchemaDrift: true,\n\tvalidateSchema: false,\n\tdeletable:false,\n\tinsertable:true,\n\tupdateable:false,\n\tupsertable:false,\n\tformat: 'table') ~> Sinkorigdata"
        }
    }
}
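
To reason about how the sink can receive zero rows without any error, the following Python sketch mimics the JointoWatermark (left join on TableName) and onlyLatestRecord (filter LOAD_DATE > WatermarkValue) steps in memory. It is not ADF code. It assumes, as in Spark SQL (which Mapping Data Flows execute on), that a comparison involving NULL yields NULL and that a filter keeps only rows whose condition is true; it also compares the two columns as plain strings, since both are declared `as string` in the script. The helper names (`gt`, `left_join`, `only_latest`, `run`) and all row values are hypothetical.

```python
from typing import Optional

# Assumption: comparisons with NULL (None) yield NULL, and the filter keeps
# only rows whose condition evaluates to true, as in Spark SQL.


def gt(a: Optional[str], b: Optional[str]) -> Optional[bool]:
    """Three-valued '>' on strings: None (NULL) if either side is None."""
    if a is None or b is None:
        return None
    return a > b  # lexicographic comparison, because both columns are strings


def left_join(left, right, key, right_cols):
    """Left outer join on `key`; unmatched left rows get None for right_cols."""
    out = []
    for lrow in left:
        matches = [r for r in right if r[key] == lrow[key]]
        if not matches:
            out.append({**lrow, **{c: None for c in right_cols}})
        for rrow in matches:
            out.append({**lrow, **{c: rrow[c] for c in right_cols}})
    return out


def only_latest(rows):
    """Mirror of: filter(LOAD_DATE > WatermarkValue)."""
    return [r for r in rows if gt(r["LOAD_DATE"], r["WatermarkValue"]) is True]


def run(staging, watermarks):
    joined = left_join(staging, watermarks, "TableName", ["WatermarkValue"])
    return only_latest(joined)


staging = [{"transferId": "1", "TableName": "Fact_Fish", "LOAD_DATE": "2019-12-22 10:00:00"}]

# 1) TableName matches and both values use the same ISO format: the row passes.
ok = run(staging, [{"TableName": "Fact_Fish", "WatermarkValue": "2019-12-01 00:00:00"}])
# 2) TableName does not match: WatermarkValue is NULL, so the row is silently dropped.
no_match = run(staging, [{"TableName": "Fish_Transfer", "WatermarkValue": "2019-12-01 00:00:00"}])
# 3) LOAD_DATE stored as MM/dd/yyyy: "12/22/2019 ..." < "2019-12-01 ..." as strings, so dropped.
us_format = run([{**staging[0], "LOAD_DATE": "12/22/2019 10:00"}],
                [{"TableName": "Fact_Fish", "WatermarkValue": "2019-12-01 00:00:00"}])

print(len(ok), len(no_match), len(us_format))  # prints: 1 0 0
```

Cases 2 and 3 both end with an empty result and no error, which matches the symptom described in the question; checking the joined rows in data preview (does WatermarkValue come through, and in which format) is one way to tell them apart.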

And here is my Copy activity JSON:

{
    "name": "Copy data1",
    "type": "Copy",
    "dependsOn": [],
    "policy": {
        "timeout": "7.00:00:00",
        "retry": 0,
        "retryIntervalInSeconds": 30,
        "secureOutput": false,
        "secureInput": false
    },
    "userProperties": [],
    "typeProperties": {
        "source": {
            "type": "DelimitedTextSource",
            "storeSettings": {
                "type": "AzureBlobStorageReadSettings",
                "recursive": true,
                "wildcardFileName": "*"
            },
            "formatSettings": {
                "type": "DelimitedTextReadSettings"
            }
        },
        "sink": {
            "type": "AzureSqlSink"
        },
        "enableStaging": false
    },
    "inputs": [
        {
            "referenceName": "SourceDataset_wkn",
            "type": "DatasetReference"
        }
    ],
    "outputs": [
        {
            "referenceName": "AzureSqlDatabaseDataSource",
            "type": "DatasetReference"
        }
    ]
}

Thanks in advance.
