I want to use Azure Data Factory with an Azure Data Lake Analytics activity, but without success.
Here is my pipeline script:
{
    "name": "UsageStatistivsPipeline",
    "properties": {
        "description": "Standardize JSON data into CSV, with friendly column names & consistent output for all event types. Creates one output (standardized) file per day.",
        "activities": [{
            "name": "UsageStatisticsActivity",
            "type": "DataLakeAnalyticsU-SQL",
            "linkedServiceName": {
                "referenceName": "DataLakeAnalytics",
                "type": "LinkedServiceReference"
            },
            "typeProperties": {
                "scriptLinkedService": {
                    "referenceName": "BlobStorage",
                    "type": "LinkedServiceReference"
                },
                "scriptPath": "adla-scripts/usage-statistics-adla-script.json",
                "degreeOfParallelism": 30,
                "priority": 100,
                "parameters": {
                    "sourcefile": "wasb://nameofblob.blob.core.windows.net/$$Text.Format('{0:yyyy}/{0:MM}/{0:dd}/0_647de4764587459ea9e0ce6a73e9ace7_2.json', SliceStart)",
                    "destinationfile": "$$Text.Format('wasb://nameofblob.blob.core.windows.net/{0:yyyy}/{0:MM}/{0:dd}/DailyResult.csv', SliceStart)"
                }
            },
            "inputs": [{
                "type": "DatasetReference",
                "referenceName": "DirectionsData"
            }],
            "outputs": [{
                "type": "DatasetReference",
                "referenceName": "OutputData"
            }],
            "policy": {
                "timeout": "06:00:00",
                "concurrency": 10,
                "executionPriorityOrder": "NewestFirst"
            }
        }],
        "start": "2018-01-08T00:00:00Z",
        "end": "2017-01-09T00:00:00Z",
        "isPaused": false,
        "pipelineMode": "Scheduled"
    }
}
I have two parameter variables, sourcefile and destinationfile, which are dynamic (the paths are derived from the date). I then use this ADLA script for execution:
REFERENCE ASSEMBLY master.[Newtonsoft.Json];
REFERENCE ASSEMBLY master.[Microsoft.Analytics.Samples.Formats];

USING Microsoft.Analytics.Samples.Formats.Json;

@Data =
    EXTRACT jsonstring string
    FROM @sourcefile
    USING Extractors.Tsv(quoting:false);

@CreateJSONTuple =
    SELECT JsonFunctions.JsonTuple(jsonstring) AS EventData
    FROM @Data;

@records =
    SELECT JsonFunctions.JsonTuple(EventData["records"], "[*].*") AS record
    FROM @CreateJSONTuple;

@properties =
    SELECT JsonFunctions.JsonTuple(record["[0].properties"]) AS prop,
           record["[0].time"] AS time
    FROM @records;

@result =
    SELECT ...
    FROM @properties;

OUTPUT @result
TO @destinationfile
USING Outputters.Csv(outputHeader:false, quoting:true);
Edit:
It seems Text.Format is not evaluated and is passed into the script as a literal string. In Data Lake Analytics, the job details then show:
DECLARE @sourcefile string = "$$Text.Format('wasb://nameofblob.blob.core.windows.net/{0:yyyy}/{0:MM}/{0:dd}/0_647de4764587459ea9e0ce6a73e9ace7_2.json', SliceStart)";
Answer (score: 0):
In your code sample, the sourcefile parameter is defined differently from destinationfile. The latter looks correct, while the former does not: the entire string should be wrapped inside $$Text.Format():
"paramName" : "$$Text.Format('...{0:pattern}...', param)"
Alternatively, consider passing only the formatted date:
"sliceStart": "$$Text.Format('{0:yyyy-MM-dd}', SliceStart)"
and then doing the rest of the work in U-SQL:
DECLARE @sliceStartDate DateTime = DateTime.Parse(@sliceStart);
DECLARE @path string = String.Format("wasb://path/to/file/{0:yyyy}/{0:MM}/{0:dd}/file.csv", @sliceStartDate);
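As a quick sanity check of the yyyy/MM/dd path pattern, here is an illustrative Python equivalent of the two DECLARE statements above (U-SQL uses .NET DateTime.Parse and String.Format; the date value is a hypothetical slice):

```python
from datetime import datetime

# Parse the date string that ADF would pass in via the sliceStart parameter
slice_start = datetime.strptime("2018-01-08", "%Y-%m-%d")

# Build the same zero-padded yyyy/MM/dd path that String.Format produces in U-SQL
path = "wasb://path/to/file/{0:%Y}/{0:%m}/{0:%d}/file.csv".format(slice_start)
print(path)  # wasb://path/to/file/2018/01/08/file.csv
```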
Hope this helps.