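I am developing an ETL process with Azure Data Factory v1 (unfortunately, I cannot use Azure Data Factory v2).
I want to read all .csv files from a given blob storage container and write the contents of each file to a table in SQL Azure.
The destination table contains all the columns of the csv file. It must also contain a new column holding the name of the file the data came from.
This is where I am stuck: I cannot find a way to pass the file name from the source dataset (the .csv file from the blob storage source) to the destination dataset (the Sql Azure sink).
I have already implemented a pipeline that reads a file from blob storage and saves it to a table in SQL Azure.
Here is an excerpt of the json that copies a single file to SQL Azure: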
{
  "name": "pipelineFileImport",
  "properties": {
    "activities": [
      {
        "type": "Copy",
        "typeProperties": {
          "source": {
            "type": "BlobSource",
            "recursive": false
          },
          "sink": {
            "type": "SqlSink",
            "writeBatchSize": 0,
            "writeBatchTimeout": "00:00:00"
          },
          "translator": {
            "type": "TabularTranslator",
            "columnMappings": "TypeOfRecord:TypeOfRecord,TPMType:TPMType,..."
          }
        },
        "inputs": [
          {
            "name": "InputDataset-cn0"
          }
        ],
        "outputs": [
          {
            "name": "OutputDataset-cn0"
          }
        ],
        "policy": {
          "timeout": "1.00:00:00",
          "concurrency": 1,
          "executionPriorityOrder": "NewestFirst",
          "style": "StartOfInterval",
          "retry": 3,
          "longRetry": 0,
          "longRetryInterval": "00:00:00"
        },
        "scheduler": {
          "frequency": "Day",
          "interval": 1
        },
        "name": "Activity-0-pipelineFileImport_csv->[staging]_[Files]"
      }
    ],
    "start": "2018-07-20T09:50:55.486Z",
    "end": "2018-07-20T09:50:55.486Z",
    "isPaused": false,
    "hubName": "test_hub",
    "pipelineMode": "OneTime",
    "expirationTime": "3.00:00:00",
    "datasets": [
      {
        "name": "InputDataset-cn0",
        "properties": {
          "structure": [
            {
              "name": "TypeOfRecord",
              "type": "String"
            },
            {
              "name": "TPMType",
              "type": "String"
            },
            ...
          ],
          "published": false,
          "type": "AzureBlob",
          "linkedServiceName": "Source-TestBlobStorage",
          "typeProperties": {
            "fileName": "testFile001.csv",
            "folderPath": "fileinput",
            "format": {
              "type": "TextFormat",
              "columnDelimiter": ";",
              "firstRowAsHeader": true
            }
          },
          "availability": {
            "frequency": "Day",
            "interval": 1
          },
          "external": true,
          "policy": {}
        }
      },
      {
        "name": "OutputDataset-cn0",
        "properties": {
          "structure": [
            {
              "name": "TypeOfRecord",
              "type": "String"
            },
            {
              "name": "TPMType",
              "type": "String"
            },
            ...
          ],
          "published": false,
          "type": "AzureSqlTable",
          "linkedServiceName": "Destination-SQLAzure-cn0",
          "typeProperties": {
            "tableName": "[staging].[Files]"
          },
          "availability": {
            "frequency": "Day",
            "interval": 1
          },
          "external": false,
          "policy": {}
        }
      }
    ]
  }
}
I need a way to pass the name of the source file to the destination dataset so that it can be written to the SQL Azure database.
Answer 0 (score: 1)
There is no native way to handle this. But I think you could achieve it with a stored procedure.
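As a rough sketch of the stored procedure idea, the `sink` of the copy activity could call a procedure instead of writing to the table directly, and pass the file name as an extra parameter. This assumes the `SqlSink` properties `sqlWriterStoredProcedureName`, `sqlWriterTableType` and `storedProcedureParameters` behave as described in the Data Factory v1 documentation. The procedure name `spInsertFilesWithName`, the table type `FilesTableType` and the parameter `FileName` are hypothetical and would have to be created in the database first. The value is hard-coded to match the `fileName` of the input dataset, so this does not read the name from the source dynamically:

```json
{
  "sink": {
    "type": "SqlSink",
    "sqlWriterStoredProcedureName": "spInsertFilesWithName",
    "sqlWriterTableType": "FilesTableType",
    "storedProcedureParameters": {
      "FileName": {
        "value": "testFile001.csv"
      }
    }
  }
}
```

The procedure would then insert the incoming rows into [staging].[Files] together with the `FileName` value. It also needs a table-valued parameter for the rows themselves; check the v1 documentation for how that parameter must be named.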