How to fix a Data Lake Analytics script

Asked: 2018-01-24 07:37:40

Tags: azure-data-factory azure-data-lake

I want to use Azure Data Factory with an Azure Data Lake Analytics activity, but without success.

Here is my pipeline script:

{
"name": "UsageStatistivsPipeline",
"properties": {
    "description": "Standardize JSON data into CSV, with friendly column names & consistent output for all event types. Creates one output (standardized) file per day.",
    "activities": [{
            "name": "UsageStatisticsActivity",
            "type": "DataLakeAnalyticsU-SQL",
            "linkedServiceName": {
                "referenceName": "DataLakeAnalytics",
                "type": "LinkedServiceReference"
            },
            "typeProperties": {
                "scriptLinkedService": {
                    "referenceName": "BlobStorage",
                    "type": "LinkedServiceReference"
                },
                "scriptPath": "adla-scripts/usage-statistics-adla-script.json",
                "degreeOfParallelism": 30,
                "priority": 100,
                "parameters": {
                    "sourcefile": "wasb://nameofblob.blob.core.windows.net/$$Text.Format('{0:yyyy}/{0:MM}/{0:dd}/0_647de4764587459ea9e0ce6a73e9ace7_2.json', SliceStart)",
                    "destinationfile": "$$Text.Format('wasb://nameofblob.blob.core.windows.net/{0:yyyy}/{0:MM}/{0:dd}/DailyResult.csv', SliceStart)"
                }
            },
            "inputs": [{
                    "type": "DatasetReference",
                    "referenceName": "DirectionsData"
                }
            ],
            "outputs": [{
                    "type": "DatasetReference",
                    "referenceName": "OutputData"
                }
            ],
            "policy": {
                "timeout": "06:00:00",
                "concurrency": 10,
                "executionPriorityOrder": "NewestFirst"
            }
        }
    ],
    "start": "2018-01-08T00:00:00Z",
    "end": "2017-01-09T00:00:00Z",
    "isPaused": false,
    "pipelineMode": "Scheduled"
}}

I have two parameter variables, sourcefile and destinationfile, which are dynamic (their paths are derived from the date).

I then use this ADLA script for the execution:

REFERENCE ASSEMBLY master.[Newtonsoft.Json];
REFERENCE ASSEMBLY master.[Microsoft.Analytics.Samples.Formats]; 

USING Microsoft.Analytics.Samples.Formats.Json;

@Data = 
    EXTRACT 
        jsonstring string
    FROM @sourcefile
    USING Extractors.Tsv(quoting:false);


@CreateJSONTuple = 
    SELECT 
        JsonFunctions.JsonTuple(jsonstring) AS EventData 
    FROM 
        @Data;

@records = 
    SELECT
        JsonFunctions.JsonTuple(EventData["records"], "[*].*") AS record
    FROM 
        @CreateJSONTuple;

@properties =
    SELECT 
        JsonFunctions.JsonTuple(record["[0].properties"]) AS prop,
        record["[0].time"] AS time
    FROM 
        @records;

@result =
    SELECT 
        ...
    FROM @properties;


OUTPUT @result
TO @destinationfile
USING Outputters.Csv(outputHeader:false,quoting:true);

The job execution fails with the error: Error Detail

Edit

It seems that Text.Format is not evaluated and is passed into the script as a literal string... In Data Lake Analytics, the job details then show:

DECLARE @sourcefile string = "$$Text.Format('wasb://nameofblob.blob.core.windows.net/{0:yyyy}/{0:MM}/{0:dd}/0_647de4764587459ea9e0ce6a73e9ace7_2.json', SliceStart)";

1 Answer:

Answer 0: (score: 0)

In your code sample, the sourcefile parameter is defined differently from destinationfile. The latter appears correct, while the former does not.

The whole string should be wrapped in $$Text.Format():

"paramName" : "$$Text.Format('...{0:pattern}...', param)"

Alternatively, consider passing only the formatted date:

"sliceStart": "$$Text.Format('{0:yyyy-MM-dd}', SliceStart)"

and then do the rest in U-SQL:

DECLARE @sliceStartDate DateTime = DateTime.Parse(@sliceStart);

DECLARE @path string = String.Format("wasb://path/to/file/{0:yyyy}/{0:MM}/{0:dd}/file.csv", @sliceStartDate);
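Putting the two pieces together, the U-SQL script can build both paths itself from a single date parameter (a sketch; it assumes the pipeline passes a @sliceStart string parameter formatted as shown above, and reuses the account and file names from the question):

```
// @sliceStart is supplied by ADF as $$Text.Format('{0:yyyy-MM-dd}', SliceStart)
DECLARE @sliceStartDate DateTime = DateTime.Parse(@sliceStart);

// Build input and output paths from the parsed date
DECLARE @sourcefile string = String.Format("wasb://nameofblob.blob.core.windows.net/{0:yyyy}/{0:MM}/{0:dd}/0_647de4764587459ea9e0ce6a73e9ace7_2.json", @sliceStartDate);
DECLARE @destinationfile string = String.Format("wasb://nameofblob.blob.core.windows.net/{0:yyyy}/{0:MM}/{0:dd}/DailyResult.csv", @sliceStartDate);
```

This keeps the ADF-side expression simple and moves all path construction into one place in the script.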

Hope this helps.