Question

I have a copy job that is supposed to copy 100 GB of Excel files between two Azure Data Lakes.

 "properties": {
        "activities": [
            {
                "name": "Copy Data1",
                "type": "Copy",
                "policy": {
                    "timeout": "7.00:00:00",
                    "retry": 0,
                    "retryIntervalInSeconds": 30,
                    "secureOutput": false,
                    "secureInput": false
                },
                "typeProperties": {
                    "source": {
                        "type": "AzureDataLakeStoreSource",
                        "recursive": true,
                        "maxConcurrentConnections": 256
                    },
                    "sink": {
                        "type": "AzureDataLakeStoreSink",
                        "maxConcurrentConnections": 256
                    },
                    "enableStaging": false,
                    "parallelCopies": 32,
                    "dataIntegrationUnits": 256
                },
                "inputs": [
                    {
                        "referenceName": "SourceLake",
                        "type": "DatasetReference"
                    }
                ],
                "outputs": [
                    {
                        "referenceName": "DestLake",
                        "type": "DatasetReference"
                    }
                ]
            }
        ],

My throughput is about 4 MB/s. From what I read here, it should be 56 MB/s. What should I do to reach that throughput?

Answer 1

You can follow the Copy activity Performance tuning guide to tune the performance of the Azure Data Factory service with the copy activity.

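Summary:

Take the following steps to tune the performance of the Azure Data Factory service with the copy activity.

Establish a baseline. During the development phase, test your pipeline by running the copy activity against a representative data sample. Collect the execution details and performance characteristics from copy activity monitoring.
Diagnose and optimize performance. If the performance you observe does not meet your expectations, identify the performance bottlenecks. Then optimize performance to remove or reduce the effect of those bottlenecks.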
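
As a rough illustration of the baseline step, the sketch below turns the bytes copied and the duration of a test run into a throughput figure and estimates how long the full copy would take. The function names are hypothetical helpers, not part of any Azure SDK, and the 1 GB = 1024 MB convention is an assumption. The 100 GB, 4 MB/s and 56 MB/s figures come from the question.

```python
def throughput_mb_per_s(bytes_copied: int, duration_s: float) -> float:
    """Observed throughput of one copy run, in MB/s (1 MB = 1024 * 1024 bytes)."""
    if duration_s <= 0:
        raise ValueError("duration_s must be positive")
    return bytes_copied / (1024 * 1024) / duration_s


def estimated_hours(total_gb: float, mb_per_s: float) -> float:
    """Estimated wall-clock hours to copy total_gb at a steady rate (1 GB = 1024 MB)."""
    if mb_per_s <= 0:
        raise ValueError("mb_per_s must be positive")
    return total_gb * 1024 / mb_per_s / 3600


if __name__ == "__main__":
    # Figures taken from the question: 100 GB total, 4 MB/s observed, 56 MB/s expected.
    for label, rate in (("observed", 4.0), ("expected", 56.0)):
        print(f"{label}: {rate} MB/s -> {estimated_hours(100, rate):.1f} h")
```

At the observed 4 MB/s the full 100 GB would take roughly 7.1 hours, against roughly 0.5 hours at 56 MB/s, which gives a concrete target to compare each tuning change against.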

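In some cases, when you run a copy activity in Azure Data Factory, a "Performance tuning tips" message appears at the top of the copy activity monitoring page. The message tells you which bottleneck was identified for that copy run, and it guides you on what to change to increase copy throughput.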

Your files total about 100 GB, but the test files used for file-based stores are multiple 10 GB files, so your performance may differ.

Hope this helps.

Speeding up a copy task in Azure Data Factory
