我在Azure数据工厂中有一个管道,该管道执行REST API调用,并将其保存到Azure数据湖Gen2上的json对象。它始终使用BOM将文件编码为UTF-8。这会导致我的逻辑应用程序出现问题,我需要将文件编码为没有BOM的UTF-8。
在Data Factory中选择数据集上的编码时,我可以在UTF-8(默认)和UTF-8之间进行选择。这两个选项都将json文件另存为带有BOM的UTF-8。
我已经尝试过Base64ToString()将带有BOM表文件的UTF-8转换为字符串,但这仍然无法解析json。我猜有隐藏的BOM表字符吗?
这是Data Lake数据集的代码:
{
"name": "Datalake",
"properties": {
"linkedServiceName": {
"referenceName": "Datalake",
"type": "LinkedServiceReference"
},
"parameters": {
"datasetfolder": {
"type": "string",
"defaultValue": "@pipeline().parameters.folder"
}
},
"annotations": [],
"type": "Json",
"typeProperties": {
"location": {
"type": "AzureBlobFSLocation",
"fileName": {
"value": "@concat(formatDateTime(utcnow()), '.json')",
"type": "Expression"
},
"folderPath": {
"value": "@concat(formatDateTime(utcNow(),'yyyy-MM-dd'))",
"type": "Expression"
},
"fileSystem": {
"value": "@dataset().datasetfolder",
"type": "Expression"
}
}
}
},
"type": "Microsoft.DataFactory/factories/datasets"
}
这是管道的代码:
"name": "2 - Microsoft365_AuditData_to_Datalake",
"properties": {
"activities": [
{
"name": "Copy Data1",
"type": "Copy",
"dependsOn": [],
"policy": {
"timeout": "7.00:00:00",
"retry": 0,
"retryIntervalInSeconds": 30,
"secureOutput": false,
"secureInput": false
},
"userProperties": [],
"typeProperties": {
"source": {
"type": "RestSource",
"httpRequestTimeout": "00:01:40",
"requestInterval": "00.00:00:00.010",
"requestMethod": "GET",
"additionalHeaders": {
"Content-Type": "application/json; charset=utf-8"
}
},
"sink": {
"type": "JsonSink",
"storeSettings": {
"type": "AzureBlobFSWriteSettings",
"copyBehavior": "FlattenHierarchy"
},
"formatSettings": {
"type": "JsonWriteSettings",
"quoteAllText": true,
"filePattern": "arrayOfObjects"
}
},
"enableStaging": false
},
"inputs": [
{
"referenceName": "REST_Microsoft0365_Audit",
"type": "DatasetReference"
}
],
"outputs": [
{
"referenceName": "Datalake",
"type": "DatasetReference",
"parameters": {
"datasetfolder": "@pipeline().parameters.folder"
}
}
]
}
],
"parameters": {
"folder": {
"type": "string",
"defaultValue": "auditgeneral"
}
},
"annotations": []
},
"type": "Microsoft.DataFactory/factories/pipelines"
}```
I expect the file to be encoded as UTF-8 Without BOM!