Azure Data Factory ForEach Copy activity does not iterate; it pulls in all files from the blob instead. Why?

Time: 2019-05-21 15:28:24

Tags: json azure foreach iterator azure-data-factory

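I have a pipeline in DF2 that has to look at a folder in blob storage and process each of its 145 files, one after another, into a database table. After each file is loaded into the table, a stored procedure should be triggered that checks every record and either inserts it into the master table or updates the existing record there.

Looking online, I feel like I have tried every suggested combination of the "Get MetaData", "For Each", "LookUp" and "Assign Variable" activities, but for some reason my Copy Data activity still picks up all the files at the same time and runs 145 times.

I recently found a blog online and followed its approach of using "Assign Variable", since that will be useful for multiple file locations, but it doesn't work for me. I need to read the CSV files as tables rather than as binary objects, so I think that is my problem.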

    {
        "name": "BulkLoadPipeline",
        "properties": {
            "activities": [
                {
                    "name": "GetFileNames",
                    "type": "GetMetadata",
                    "policy": {
                        "timeout": "7.00:00:00",
                        "retry": 0,
                        "retryIntervalInSeconds": 30,
                        "secureOutput": false,
                        "secureInput": false
                    },
                    "typeProperties": {
                        "dataset": {
                            "referenceName": "DelimitedText1",
                            "type": "DatasetReference",
                            "parameters": {
                                "fileName": "@item()"
                            }
                        },
                        "fieldList": [
                            "childItems"
                        ],
                        "storeSettings": {
                            "type": "AzureBlobStorageReadSetting"
                        },
                        "formatSettings": {
                            "type": "DelimitedTextReadSetting"
                        }
                    }
                },
                {
                    "name": "CopyDataRunDeltaCheck",
                    "type": "ForEach",
                    "dependsOn": [
                        {
                            "activity": "BuildList",
                            "dependencyConditions": [
                                "Succeeded"
                            ]
                        }
                    ],
                    "typeProperties": {
                        "items": {
                            "value": "@variables('fileList')",
                            "type": "Expression"
                        },
                        "isSequential": true,
                        "activities": [
                            {
                                "name": "WriteToTables",
                                "type": "Copy",
                                "policy": {
                                    "timeout": "7.00:00:00",
                                    "retry": 0,
                                    "retryIntervalInSeconds": 30,
                                    "secureOutput": false,
                                    "secureInput": false
                                },
                                "typeProperties": {
                                    "source": {
                                        "type": "DelimitedTextSource",
                                        "storeSettings": {
                                            "type": "AzureBlobStorageReadSetting",
                                            "wildcardFileName": "*.*"
                                        },
                                        "formatSettings": {
                                            "type": "DelimitedTextReadSetting"
                                        }
                                    },
                                    "sink": {
                                        "type": "AzureSqlSink"
                                    },
                                    "enableStaging": false,
                                    "translator": {
                                        "type": "TabularTranslator",
                                        "mappings": [
                                            {
                                                "source": {
                                                    "name": "myID",
                                                    "type": "String"
                                                },
                                                "sink": {
                                                    "name": "myID",
                                                    "type": "String"
                                                }
                                            },
                                            {
                                                "source": {
                                                    "name": "Col1",
                                                    "type": "String"
                                                },
                                                "sink": {
                                                    "name": "Col1",
                                                    "type": "String"
                                                }
                                            },
                                            {
                                                "source": {
                                                    "name": "Col2",
                                                    "type": "String"
                                                },
                                                "sink": {
                                                    "name": "Col2",
                                                    "type": "String"
                                                }
                                            },
                                            {
                                                "source": {
                                                    "name": "Col3",
                                                    "type": "String"
                                                },
                                                "sink": {
                                                    "name": "Col3",
                                                    "type": "String"
                                                }
                                            },
                                            {
                                                "source": {
                                                    "name": "Col4",
                                                    "type": "String"
                                                },
                                                "sink": {
                                                    "name": "Col4",
                                                    "type": "String"
                                                }
                                            },
                                            {
                                                "source": {
                                                    "name": "DW Date Created",
                                                    "type": "String"
                                                },
                                                "sink": {
                                                    "name": "DW_Date_Created",
                                                    "type": "String"
                                                }
                                            },
                                            {
                                                "source": {
                                                    "name": "DW Date Updated",
                                                    "type": "String"
                                                },
                                                "sink": {
                                                    "name": "DW_Date_Updated",
                                                    "type": "String"
                                                }
                                            }
                                        ]
                                    }
                                },
                                "inputs": [
                                    {
                                        "referenceName": "DelimitedText1",
                                        "type": "DatasetReference",
                                        "parameters": {
                                            "fileName": "@item()"
                                        }
                                    }
                                ],
                                "outputs": [
                                    {
                                        "referenceName": "myTable",
                                        "type": "DatasetReference"
                                    }
                                ]
                            },
                            {
                                "name": "CheckDeltas",
                                "type": "SqlServerStoredProcedure",
                                "dependsOn": [
                                    {
                                        "activity": "WriteToTables",
                                        "dependencyConditions": [
                                            "Succeeded"
                                        ]
                                    }
                                ],
                                "policy": {
                                    "timeout": "7.00:00:00",
                                    "retry": 0,
                                    "retryIntervalInSeconds": 30,
                                    "secureOutput": false,
                                    "secureInput": false
                                },
                                "typeProperties": {
                                    "storedProcedureName": "[TL].[uspMyCheck]"
                                },
                                "linkedServiceName": {
                                    "referenceName": "myService",
                                    "type": "LinkedServiceReference"
                                }
                            }
                        ]
                    }
                },
                {
                    "name": "BuildList",
                    "type": "ForEach",
                    "dependsOn": [
                        {
                            "activity": "GetFileNames",
                            "dependencyConditions": [
                                "Succeeded"
                            ]
                        }
                    ],
                    "typeProperties": {
                        "items": {
                            "value": "@activity('GetFileNames').output.childItems",
                            "type": "Expression"
                        },
                        "isSequential": true,
                        "activities": [
                            {
                                "name": "Create list from variables",
                                "type": "AppendVariable",
                                "typeProperties": {
                                    "variableName": "fileList",
                                    "value": "@item().name"
                                }
                            }
                        ]
                    }
                }
            ],
            "variables": {
                "fileList": {
                    "type": "Array"
                }
            }
        }
    }

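The "Details" screen of the pipeline output shows that the pipeline loops once for each item in the blob, but on every iteration Copy Data and the stored procedure run against all the files in the list at once, rather than against one file at a time.

I feel like I am close to the answer but am missing one vital part. Any help or suggestions would be greatly appreciated.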

1 answer:

Answer 0: (score: 0)

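Your payload is not correct.

  1. The GetMetadata activity should not use the same dataset as the Copy activity.
  2. The GetMetadata activity should reference a dataset that points at the folder containing all the files you want to process. Your dataset, however, has a "fileName" parameter.
  3. Use the output of the GetMetadata activity (its childItems) as the input of the ForEach activity.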
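
The three points above could be sketched roughly as follows. This is a minimal sketch under stated assumptions, not a tested pipeline: `BlobFolder` is a hypothetical dataset name, assumed to point at the folder only (no `fileName` parameter), while `DelimitedText1`, `myTable`, `myService` and `[TL].[uspMyCheck]` are taken from the question. The `policy` and `translator` blocks are left out for brevity, and the `BuildList` loop and the `fileList` variable are dropped because the ForEach reads `childItems` directly. The `wildcardFileName` of `*.*` is also removed from the copy source; as far as I can tell, a wildcard in the source settings takes precedence over the file name passed to the dataset, which would explain why every iteration copied all the files.

    {
        "name": "BulkLoadPipeline",
        "properties": {
            "activities": [
                {
                    "name": "GetFileNames",
                    "description": "Assumption: BlobFolder is a hypothetical folder-level dataset with no fileName parameter.",
                    "type": "GetMetadata",
                    "typeProperties": {
                        "dataset": {
                            "referenceName": "BlobFolder",
                            "type": "DatasetReference"
                        },
                        "fieldList": [
                            "childItems"
                        ]
                    }
                },
                {
                    "name": "CopyDataRunDeltaCheck",
                    "type": "ForEach",
                    "dependsOn": [
                        {
                            "activity": "GetFileNames",
                            "dependencyConditions": [
                                "Succeeded"
                            ]
                        }
                    ],
                    "typeProperties": {
                        "items": {
                            "value": "@activity('GetFileNames').output.childItems",
                            "type": "Expression"
                        },
                        "isSequential": true,
                        "activities": [
                            {
                                "name": "WriteToTables",
                                "description": "One file per iteration: no wildcardFileName, the file name comes from the current item.",
                                "type": "Copy",
                                "typeProperties": {
                                    "source": {
                                        "type": "DelimitedTextSource",
                                        "storeSettings": {
                                            "type": "AzureBlobStorageReadSetting"
                                        },
                                        "formatSettings": {
                                            "type": "DelimitedTextReadSetting"
                                        }
                                    },
                                    "sink": {
                                        "type": "AzureSqlSink"
                                    }
                                },
                                "inputs": [
                                    {
                                        "referenceName": "DelimitedText1",
                                        "type": "DatasetReference",
                                        "parameters": {
                                            "fileName": "@item().name"
                                        }
                                    }
                                ],
                                "outputs": [
                                    {
                                        "referenceName": "myTable",
                                        "type": "DatasetReference"
                                    }
                                ]
                            },
                            {
                                "name": "CheckDeltas",
                                "description": "Runs after each single-file copy succeeds.",
                                "type": "SqlServerStoredProcedure",
                                "dependsOn": [
                                    {
                                        "activity": "WriteToTables",
                                        "dependencyConditions": [
                                            "Succeeded"
                                        ]
                                    }
                                ],
                                "typeProperties": {
                                    "storedProcedureName": "[TL].[uspMyCheck]"
                                },
                                "linkedServiceName": {
                                    "referenceName": "myService",
                                    "type": "LinkedServiceReference"
                                }
                            }
                        ]
                    }
                }
            ]
        }
    }

Each element of `childItems` is an object with `name` and `type` fields, which is why the dataset parameter is `@item().name` here rather than `@item()`.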