Passing the source file name to the destination in an ADFv1 pipeline

Date: 2018-07-23 12:52:02

Tags: azure azure-storage etl azure-storage-blobs azure-data-factory

Scenario

I am developing an ETL process with Azure Data Factory v1 (unfortunately, I cannot use Azure Data Factory v2).

I want to read all .csv files from a given blob storage container and write the contents of each file to a table in SQL Azure.

Problem

The destination table contains all the columns from the csv files. It must also contain an additional column holding the name of the file the data came from.

This is where I am stuck: I cannot find a way to pass the file name from the source dataset (the .csv files from the blob storage source) to the destination dataset (the SQL Azure sink).

What I have already tried

I have already implemented a pipeline that reads the files from blob storage and saves them to a table in SQL Azure.

Here is an excerpt of the JSON that copies one file to SQL Azure:

{
    "name": "pipelineFileImport",
    "properties": {
        "activities": [
            {
                "type": "Copy",
                "typeProperties": {
                    "source": {
                        "type": "BlobSource",
                        "recursive": false
                    },
                    "sink": {
                        "type": "SqlSink",
                        "writeBatchSize": 0,
                        "writeBatchTimeout": "00:00:00"
                    },
                    "translator": {
                        "type": "TabularTranslator",
                        "columnMappings": "TypeOfRecord:TypeOfRecord,TPMType:TPMType,..."
                    }
                },
                "inputs": [
                    {
                        "name": "InputDataset-cn0"
                    }
                ],
                "outputs": [
                    {
                        "name": "OutputDataset-cn0"
                    }
                ],
                "policy": {
                    "timeout": "1.00:00:00",
                    "concurrency": 1,
                    "executionPriorityOrder": "NewestFirst",
                    "style": "StartOfInterval",
                    "retry": 3,
                    "longRetry": 0,
                    "longRetryInterval": "00:00:00"
                },
                "scheduler": {
                    "frequency": "Day",
                    "interval": 1
                },
                "name": "Activity-0-pipelineFileImport_csv->[staging]_[Files]"
            }
        ],
        "start": "2018-07-20T09:50:55.486Z",
        "end": "2018-07-20T09:50:55.486Z",
        "isPaused": false,
        "hubName": "test_hub",
        "pipelineMode": "OneTime",
        "expirationTime": "3.00:00:00",
        "datasets": [
            {
                "name": "InputDataset-cn0",
                "properties": {
                    "structure": [
                        {
                            "name": "TypeOfRecord",
                            "type": "String"
                        },
                        {
                            "name": "TPMType",
                            "type": "String"
                        },
                        ...
                    ],
                    "published": false,
                    "type": "AzureBlob",
                    "linkedServiceName": "Source-TestBlobStorage",
                    "typeProperties": {
                        "fileName": "testFile001.csv",
                        "folderPath": "fileinput",
                        "format": {
                            "type": "TextFormat",
                            "columnDelimiter": ";",
                            "firstRowAsHeader": true
                        }
                    },
                    "availability": {
                        "frequency": "Day",
                        "interval": 1
                    },
                    "external": true,
                    "policy": {}
                }
            },
            {
                "name": "OutputDataset-cn0",
                "properties": {
                    "structure": [
                        {
                            "name": "TypeOfRecord",
                            "type": "String"
                        },
                        {
                            "name": "TPMType",
                            "type": "String"
                        },...
                    ],
                    "published": false,
                    "type": "AzureSqlTable",
                    "linkedServiceName": "Destination-SQLAzure-cn0",
                    "typeProperties": {
                        "tableName": "[staging].[Files]"
                    },
                    "availability": {
                        "frequency": "Day",
                        "interval": 1
                    },
                    "external": false,
                    "policy": {}
                }
            }
        ]
    }
}

What I need

I need a way to pass the name of the source file to the destination dataset, so that it can be written to the SQL Azure database.

1 Answer:

Answer 0 (score: 1)

There is no native way to handle this. However, I think you could achieve it with a stored procedure.

Please refer to the stored procedure properties of the SqlSink: https://docs.microsoft.com/en-us/azure/data-factory/v1/data-factory-azure-sql-connector#copy-activity-properties
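As a rough illustration only: per the linked connector documentation, a v1 `SqlSink` supports `sqlWriterStoredProcedureName`, `sqlWriterTableType`, and `storedProcedureParameters`. The sketch below assumes hypothetical names (`spInsertFiles`, `FilesType`, the `@fileName` parameter) that are not in the original pipeline. Note the caveat that ADFv1 does not expose the source file name dynamically, so the parameter value would have to come from a naming convention, for example a `partitionedBy` pattern based on `SliceStart`, or from one pipeline per file:

```json
"sink": {
    "type": "SqlSink",
    "sqlWriterStoredProcedureName": "spInsertFiles",
    "sqlWriterTableType": "FilesType",
    "storedProcedureParameters": {
        "fileName": {
            "value": "$$Text.Format('testFile{0:yyyyMMdd}.csv', SliceStart)",
            "type": "String"
        }
    }
}
```

The matching procedure would receive the copied rows as a table-valued parameter of type `FilesType` (by convention, the parameter is named after the dataset's table) and append the file name while inserting. Only the two columns shown in the question are listed here:

```sql
-- Hypothetical table type matching the csv columns (abridged)
CREATE TYPE FilesType AS TABLE (
    TypeOfRecord NVARCHAR(255),
    TPMType NVARCHAR(255)
    -- remaining csv columns go here
);
GO

-- Hypothetical procedure: inserts the batch plus the file name
CREATE PROCEDURE spInsertFiles
    @Files FilesType READONLY,
    @fileName NVARCHAR(255)
AS
BEGIN
    INSERT INTO [staging].[Files] (TypeOfRecord, TPMType, FileName)
    SELECT TypeOfRecord, TPMType, @fileName
    FROM @Files;
END
```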