I have the following Azure Data Factory setup:
Linked service:
"name": "AzureStorageLinkedService",
"properties": {
"description": "",
"hubName": "***",
"type": "AzureStorage",
"typeProperties": {
"connectionString": "DefaultEndpointsProtocol=https;AccountName=***;AccountKey=**********;EndpointSuffix=core.windows.net"
}
}
}
Datasets:
Input:
{
"name": "AzureBlobDatasetTemplate",
"properties": {
"published": false,
"type": "AzureBlob",
"linkedServiceName": "AzureStorageLinkedService",
"typeProperties": {
"folderPath": "app-insights/************/PageViews/{Slice}/{Hour}",
"format": {
"type": "JsonFormat"
},
"partitionedBy": [
{
"name": "Slice",
"value": {
"type": "DateTime",
"date": "SliceStart",
"format": "yyyy-MM-dd"
}
},
{
"name": "Hour",
"value": {
"type": "DateTime",
"date": "SliceStart",
"format": "HH"
}
}
]
},
"availability": {
"frequency": "Minute",
"interval": 15
},
"external": true,
"policy": {}
}
}
Output:
{
"name": "AzureTableDatasetTemplate",
"properties": {
"published": false,
"type": "AzureTable",
"linkedServiceName": "AzureStorageLinkedService",
"typeProperties": {
"tableName": "HelloWorld"
},
"availability": {
"frequency": "Minute",
"interval": 15
}
}
}
Pipeline:
{
"name": "PipelineTemplate",
"properties": {
"description": "Application Insight",
"activities": [
{
"type": "Copy",
"typeProperties": {
"source": {
"type": "BlobSource"
},
"sink": {
"type": "AzureTableSink",
"writeBatchSize": 0,
"writeBatchTimeout": "00:00:00"
}
},
"inputs": [
{
"name": "AzureBlobDatasetTemplate"
}
],
"outputs": [
{
"name": "AzureTableDatasetTemplate"
}
],
"policy": {
"timeout": "1.00:00:00",
"concurrency": 1,
"retry": 3
},
"scheduler": {
"frequency": "Minute",
"interval": 15
},
"name": "CopyActivityTemplate"
}
],
"start": "2014-05-01T00:00:00Z",
"end": "2018-05-01T00:00:00Z",
"isPaused": false,
"hubName": "datafactorypocjspi_hub",
"pipelineMode": "Scheduled"
}
}
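For reference, the input dataset's folderPath uses {Slice} and {Hour} tokens that ADF fills in from SliceStart according to the partitionedBy definitions above. A minimal sketch of that expansion in Python (the slice start time is a made-up example; the path and format strings are taken from the dataset definition, with the masked segment left as-is):

from datetime import datetime

# Hypothetical slice start for illustration: the 10:15 slice on 1 March 2016.
slice_start = datetime(2016, 3, 1, 10, 15)

# The dataset's "yyyy-MM-dd" and "HH" format strings correspond to these strftime patterns.
slice_token = slice_start.strftime("%Y-%m-%d")  # "2016-03-01"
hour_token = slice_start.strftime("%H")         # "10"

# The masked segment of the path is kept exactly as it appears in the question.
folder_path = "app-insights/************/PageViews/{0}/{1}".format(slice_token, hour_token)
print(folder_path)  # app-insights/************/PageViews/2016-03-01/10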
The data in blob storage comes from Application Insights continuous export.
My goal is to get this whole setup working like this:
Answer 0 (score: 1):
I've hit this problem before when configuring time slices for a large scheduling window... I think you're running into it because of the 15-minute time slices over a 4-year window!
<强>数据集:强>
for file in os.listdir('C:\\Users\\####\\Documents\\Visual Studio 2015\\Projects\\Data\\'):
if fnmatch.fnmatch(file, '*.csv'):
scanReport = open(file)
scanReader = csv.reader(scanReport)
<强>活动:强>
"availability": {
"frequency": "Minute",
"interval": 15
Provisioning all of these slices is something ADF has to do at deployment time. The result is that what you're seeing is the activity failing to start validating the upstream dataset, because ADF is still busy creating all of those slices. In other words: please wait!
This isn't an ideal answer, but my suggestion is to shrink the scheduling window down to something much smaller while you test the copy process. Once that works, extend the window a month at a time so the internal provisioning process has a chance to catch up.
Bear in mind: it isn't just a 4-year window divided into 15-minute slices. It's also double that, because slices are created for each dataset - both the input and the output.
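For a rough sense of scale, here is a quick back-of-the-envelope calculation (a sketch, not part of the original answer) of how many slices ADF has to provision for the start/end window and 15-minute interval defined in the question:

from datetime import datetime

# Pipeline window and slice interval taken from the question's JSON.
start = datetime(2014, 5, 1)
end = datetime(2018, 5, 1)
slice_minutes = 15

minutes = (end - start).total_seconds() / 60
slices_per_dataset = int(minutes // slice_minutes)
print(slices_per_dataset)       # 140256 slices for a single dataset
print(slices_per_dataset * 2)   # 280512 slices across the input and output datasets

# A smaller test window (e.g. one month, as suggested above) is far cheaper to provision:
test_minutes = (datetime(2014, 6, 1) - datetime(2014, 5, 1)).total_seconds() / 60
print(int(test_minutes // slice_minutes) * 2)  # 5952 slices for May 2014, both datasets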
Hope this helps.