I have created an Azure Data Factory to schedule a U-SQL script using the "DataLakeAnalyticsU-SQL" activity. See the code below:
InputDataset:
{
    "name": "InputDataLakeTable",
    "properties": {
        "published": false,
        "type": "AzureDataLakeStore",
        "linkedServiceName": "LinkedServiceSource",
        "typeProperties": {
            "fileName": "SearchLog.txt",
            "folderPath": "demo/",
            "format": {
                "type": "TextFormat",
                "rowDelimiter": "\n",
                "columnDelimiter": "|",
                "quoteChar": "\""
            }
        },
        "availability": {
            "frequency": "Hour",
            "interval": 1
        }
    }
}
OutputDataset:
{
    "name": "OutputDataLakeTable",
    "properties": {
        "published": false,
        "type": "AzureDataLakeStore",
        "linkedServiceName": "LinkedServiceDestination",
        "typeProperties": {
            "folderPath": "scripts/"
        },
        "availability": {
            "frequency": "Hour",
            "interval": 1
        }
    }
}
Pipeline:
{
    "name": "ComputeEventsByRegionPipeline",
    "properties": {
        "description": "This is a pipeline to compute events for en-gb locale and date less than 2012/02/19.",
        "activities": [
            {
                "type": "DataLakeAnalyticsU-SQL",
                "typeProperties": {
                    "scriptPath": "scripts\\SearchLogProcessing.txt",
                    "degreeOfParallelism": 3,
                    "priority": 100,
                    "parameters": {
                        "in": "/demo/SearchLog.txt",
                        "out": "/scripts/Result.txt"
                    }
                },
                "inputs": [
                    {
                        "name": "InputDataLakeTable"
                    }
                ],
                "outputs": [
                    {
                        "name": "OutputDataLakeTable"
                    }
                ],
                "policy": {
                    "timeout": "06:00:00",
                    "concurrency": 1,
                    "executionPriorityOrder": "NewestFirst",
                    "retry": 1
                },
                "scheduler": {
                    "frequency": "Hour",
                    "interval": 1
                },
                "name": "CopybyU-SQL",
                "linkedServiceName": "AzureDataLakeAnalyticsLinkedService"
            }
        ],
        "start": "2016-12-21T17:44:13.557Z",
        "end": "2016-12-22T17:44:13.557Z",
        "isPaused": false,
        "hubName": "denojaidbfactory_hub",
        "pipelineMode": "Scheduled"
    }
}
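For context, Data Factory turns the activity's "parameters" block into DECLARE statements that it prepends to the script, so SearchLogProcessing.txt can reference @in and @out directly. A hypothetical sketch of such a script is shown below; the column list and extractor settings are assumptions based on the pipeline description and the "|" column delimiter of the input dataset, not taken from the question:

```usql
// @in and @out are declared by Data Factory from the activity's "parameters" block.
@searchlog =
    EXTRACT UserId int,
            Start DateTime,
            Region string,
            Query string
    FROM @in
    USING Extractors.Text(delimiter: '|');

// Keep only en-gb events with a date before 2012/02/19, per the pipeline description.
@result =
    SELECT Start, Region, Query
    FROM @searchlog
    WHERE Region == "en-gb" AND Start <= DateTime.Parse("2012/02/19");

OUTPUT @result
TO @out
USING Outputters.Text(delimiter: '|');
```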
I have successfully created all the required linked services. However, after deploying the pipeline, no time slices are created for the input dataset. See the image below:
Meanwhile, the output dataset is waiting for time slices from the upstream input dataset, so its slices remain in the Pending Execution state and my Azure Data Factory pipeline does not run. See the image below. Any suggestions for resolving this issue?
Answer 0 (score: 2)
If no other activity in the factory is producing InputDataLakeTable, you need to add the property

"external": true

to the dataset's "properties" section. This tells Data Factory that the data is produced outside the factory, so it will validate the slices as available instead of waiting for an upstream activity to generate them.
https://docs.microsoft.com/en-us/azure/data-factory/data-factory-faq
https://docs.microsoft.com/en-us/azure/data-factory/data-factory-create-datasets