Data Factory writes UTF-8 with BOM to Data Lake

Time: 2019-09-30 08:48:48

Tags: azure azure-data-lake azure-data-factory-2

I have a pipeline in Azure Data Factory that makes a REST API call and saves the response as a JSON object to Azure Data Lake Gen2. The file is always encoded as UTF-8 with a BOM. This causes problems in my Logic App, which needs the file to be encoded as UTF-8 without a BOM.
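For context, the UTF-8 BOM is the three-byte sequence EF BB BF at the very start of the file. A quick way to confirm that the Copy activity really is writing it is to inspect the first bytes of a downloaded copy of the blob. A minimal sketch in plain Python (the local file name is a hypothetical stand-in for the downloaded JSON file, not part of the pipeline below):

    # Check whether a downloaded copy of the file starts with a UTF-8 BOM.
    # "2019-09-30.json" is a hypothetical local path used for illustration.
    UTF8_BOM = b"\xef\xbb\xbf"

    with open("2019-09-30.json", "rb") as f:
        first_bytes = f.read(3)

    print("has BOM:", first_bytes == UTF8_BOM)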

When selecting the encoding on the dataset in Data Factory, I can choose between UTF-8 (default) and UTF-8. Both options save the JSON file as UTF-8 with a BOM.

I have tried Base64ToString() to convert the UTF-8-with-BOM file to a string, but the JSON still fails to parse. I am guessing there is a hidden BOM character?
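If the BOM is indeed the culprit, the JSON itself should parse fine once the leading U+FEFF character is removed. As a sanity check outside the Logic App, this sketch (plain Python, with simulated file content rather than the real audit data) shows a strict parser rejecting the BOM and succeeding once it is stripped via the utf-8-sig codec:

    import json

    # Simulated file content: UTF-8 BOM followed by a small JSON array,
    # loosely mirroring the "arrayOfObjects" file pattern used by the sink.
    raw = b"\xef\xbb\xbf" + b'[{"RecordType": 1}]'

    try:
        json.loads(raw.decode("utf-8"))        # plain decode keeps the BOM as U+FEFF
    except json.JSONDecodeError as e:
        print("utf-8 decode fails:", e)        # "Unexpected UTF-8 BOM ..."

    data = json.loads(raw.decode("utf-8-sig"))  # utf-8-sig drops the BOM
    print("utf-8-sig decode succeeds:", data)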

Here is the code for the Data Lake dataset:

 {
    "name": "Datalake",
    "properties": {
        "linkedServiceName": {
            "referenceName": "Datalake",
            "type": "LinkedServiceReference"
        },
        "parameters": {
            "datasetfolder": {
                "type": "string",
                "defaultValue": "@pipeline().parameters.folder"
            }
        },
        "annotations": [],
        "type": "Json",
        "typeProperties": {
            "location": {
                "type": "AzureBlobFSLocation",
                "fileName": {
                    "value": "@concat(formatDateTime(utcnow()), '.json')",
                    "type": "Expression"
                },
                "folderPath": {
                    "value": "@concat(formatDateTime(utcNow(),'yyyy-MM-dd'))",
                    "type": "Expression"
                },
                "fileSystem": {
                    "value": "@dataset().datasetfolder",
                    "type": "Expression"
                }
            }
        }
    },
    "type": "Microsoft.DataFactory/factories/datasets"
}

Here is the code for the pipeline:

    "name": "2 - Microsoft365_AuditData_to_Datalake",
    "properties": {
        "activities": [
            {
                "name": "Copy Data1",
                "type": "Copy",
                "dependsOn": [],
                "policy": {
                    "timeout": "7.00:00:00",
                    "retry": 0,
                    "retryIntervalInSeconds": 30,
                    "secureOutput": false,
                    "secureInput": false
                },
                "userProperties": [],
                "typeProperties": {
                    "source": {
                        "type": "RestSource",
                        "httpRequestTimeout": "00:01:40",
                        "requestInterval": "00.00:00:00.010",
                        "requestMethod": "GET",
                        "additionalHeaders": {
                            "Content-Type": "application/json; charset=utf-8"
                        }
                    },
                    "sink": {
                        "type": "JsonSink",
                        "storeSettings": {
                            "type": "AzureBlobFSWriteSettings",
                            "copyBehavior": "FlattenHierarchy"
                        },
                        "formatSettings": {
                            "type": "JsonWriteSettings",
                            "quoteAllText": true,
                            "filePattern": "arrayOfObjects"
                        }
                    },
                    "enableStaging": false
                },
                "inputs": [
                    {
                        "referenceName": "REST_Microsoft0365_Audit",
                        "type": "DatasetReference"
                    }
                ],
                "outputs": [
                    {
                        "referenceName": "Datalake",
                        "type": "DatasetReference",
                        "parameters": {
                            "datasetfolder": "@pipeline().parameters.folder"
                        }
                    }
                ]
            }
        ],
        "parameters": {
            "folder": {
                "type": "string",
                "defaultValue": "auditgeneral"
            }
        },
        "annotations": []
    },
    "type": "Microsoft.DataFactory/factories/pipelines"
}

I expect the file to be encoded as UTF-8 without a BOM!

0 Answers