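I have a REST API that I need to call from Azure Data Factory and insert the returned data into a SQL table.
The JSON returned by the API has the following format: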
{
"serviceResponse": {
"supportOffice": "EUKO",
"totalPages": 5,
"pageNo": 1,
"recordsPerPage": 1000,
"projects": [
{ "projectID":1 ...} , { "projectID":2 ...} ,...
]
}
}
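The URL has the format http://server.com/api/Projects?pageNo=1
I have managed to set up a RestService that calls the API and returns the JSON, and a SQL sink that takes the JSON and passes it to a stored procedure that stores the data.
What I am struggling with is how to handle the pagination.
I have tried:
The pagination option: I don't think this will work, because it only allows an XPath that returns the full next URL. I can't see how it would allow the URL to be calculated from totalPages and pageNo. (Or I couldn't get it to work.)
Adding a Web call to the API before processing and then calculating the number of pages. It wasn't ideal, but it worked until I hit the 1 MB / 1 min limits, because some of the responses are large. So this is not going to work.
Checking whether the API could be changed, but that is not possible.
I'd like to know whether anyone has ideas on how to implement this, or has successfully used a similar API?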
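Answer 0 (score: 0)
The following instructions walk through creating a pipeline like the one shown below. Note that it uses Stored Procedure activities, Web activities, and a ForEach activity.
First, provision an Azure SQL DB, set up the AAD administrator, and grant the ADF MSI permissions in the database, as described here. Then create the following table and two stored procedures: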
CREATE TABLE [dbo].[People](
[id] [int] NULL,
[email] [varchar](255) NULL,
[first_name] [varchar](100) NULL,
[last_name] [varchar](100) NULL,
[avatar] [nvarchar](1000) NULL
)
GO
/*
sample call:
exec uspInsertPeople @json = '{"page":1,"per_page":3,"total":12,"total_pages":4,"data":[{"id":1,"email":"george.bluth@reqres.in","first_name":"George","last_name":"Bluth","avatar":"https://s3.amazonaws.com/uifaces/faces/twitter/calebogden/128.jpg"},{"id":2,"email":"janet.weaver@reqres.in","first_name":"Janet","last_name":"Weaver","avatar":"https://s3.amazonaws.com/uifaces/faces/twitter/josephstein/128.jpg"},{"id":3,"email":"emma.wong@reqres.in","first_name":"Emma","last_name":"Wong","avatar":"https://s3.amazonaws.com/uifaces/faces/twitter/olegpogodaev/128.jpg"}]}'
*/
create proc uspInsertPeople @json nvarchar(max)
as
begin
insert into People (id, email, first_name, last_name, avatar)
select d.*
from OPENJSON(@json)
WITH (
[data] nvarchar(max) '$.data' as JSON
)
CROSS APPLY OPENJSON([data], '$')
WITH (
id int '$.id',
email varchar(255) '$.email',
first_name varchar(100) '$.first_name',
last_name varchar(100) '$.last_name',
avatar nvarchar(1000) '$.avatar'
) d;
end
GO
create proc uspTruncatePeople
as
truncate table People
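As a rough illustration of what uspInsertPeople does with the payload, the Python sketch below mirrors the OPENJSON projection: it takes the top-level data array and keeps only the five columns of the People table. The names extract_people_rows and PEOPLE_COLUMNS are hypothetical and not part of the answer, and the sample payload is a trimmed version of the one in the comment above. Missing keys become None, which is analogous to OPENJSON ... WITH returning NULL for a missing property in its default lax mode.

```python
import json

# Columns of dbo.People, in the order used by the INSERT statement.
PEOPLE_COLUMNS = ("id", "email", "first_name", "last_name", "avatar")

def extract_people_rows(payload: str) -> list:
    """Return one tuple per element of the top-level "data" array."""
    document = json.loads(payload)
    rows = []
    # A missing or null "data" property yields no rows.
    for person in document.get("data") or []:
        rows.append(tuple(person.get(col) for col in PEOPLE_COLUMNS))
    return rows

# Trimmed sample payload; the first record has no "avatar" key.
sample = json.dumps({
    "page": 1, "per_page": 3, "total": 12, "total_pages": 4,
    "data": [
        {"id": 1, "email": "george.bluth@reqres.in",
         "first_name": "George", "last_name": "Bluth"},
        {"id": 2, "email": "janet.weaver@reqres.in",
         "first_name": "Janet", "last_name": "Weaver", "avatar": None},
    ],
})

for row in extract_people_rows(sample):
    print(row)
```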
Next, create a new pipeline in Azure Data Factory v2, rename it to ForEachPage, then switch to the Code view and paste in the following JSON:
{
"name": "ForEachPage",
"properties": {
"activities": [
{
"name": "GetTotalPages",
"type": "WebActivity",
"dependsOn": [
{
"activity": "Truncate SQL Table",
"dependencyConditions": [
"Succeeded"
]
}
],
"policy": {
"timeout": "7.00:00:00",
"retry": 0,
"retryIntervalInSeconds": 30,
"secureOutput": false,
"secureInput": false
},
"userProperties": [],
"typeProperties": {
"url": {
"value": "https://reqres.in/api/users?page=1",
"type": "Expression"
},
"method": "GET"
}
},
{
"name": "ForEachPage",
"type": "ForEach",
"dependsOn": [
{
"activity": "GetTotalPages",
"dependencyConditions": [
"Succeeded"
]
}
],
"userProperties": [],
"typeProperties": {
"items": {
"value": "@range(1,activity('GetTotalPages').output.total_pages)",
"type": "Expression"
},
"activities": [
{
"name": "GetPage",
"type": "WebActivity",
"dependsOn": [],
"policy": {
"timeout": "7.00:00:00",
"retry": 0,
"retryIntervalInSeconds": 30,
"secureOutput": false,
"secureInput": false
},
"userProperties": [],
"typeProperties": {
"url": {
"value": "@concat('https://reqres.in/api/users?page=',item())",
"type": "Expression"
},
"method": "GET"
}
},
{
"name": "uspInsertPeople Sproc",
"type": "SqlServerStoredProcedure",
"dependsOn": [
{
"activity": "GetPage",
"dependencyConditions": [
"Succeeded"
]
}
],
"policy": {
"timeout": "7.00:00:00",
"retry": 0,
"retryIntervalInSeconds": 30,
"secureOutput": false,
"secureInput": false
},
"userProperties": [],
"typeProperties": {
"storedProcedureName": "[dbo].[uspInsertPeople]",
"storedProcedureParameters": {
"json": {
"value": {
"value": "@string(activity('GetPage').output)",
"type": "Expression"
},
"type": "String"
}
}
},
"linkedServiceName": {
"referenceName": "lsAzureDB",
"type": "LinkedServiceReference"
}
}
]
}
},
{
"name": "Truncate SQL Table",
"type": "SqlServerStoredProcedure",
"dependsOn": [],
"policy": {
"timeout": "7.00:00:00",
"retry": 0,
"retryIntervalInSeconds": 30,
"secureOutput": false,
"secureInput": false
},
"userProperties": [],
"typeProperties": {
"storedProcedureName": "[dbo].[uspTruncatePeople]"
},
"linkedServiceName": {
"referenceName": "lsAzureDB",
"type": "LinkedServiceReference"
}
}
],
"annotations": []
}
}
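Create a linked service to the Azure SQL DB named lsAzureDB and set it to authenticate using MSI.
This pipeline calls a sample paged API (it works at the moment, but it is not an API I manage, so it may stop working at some point) to demonstrate how to loop, and how to take the results of a Web activity and insert them into a SQL table via a stored procedure call, with the JSON parsed inside the stored procedure. The loop runs in parallel, but you can of course change the settings of the ForEachPage activity to make it run sequentially.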
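The control flow of the pipeline can be sketched in Python as follows. This is only a simulation under stated assumptions: fetch_page is a hypothetical stand-in for the Web activities (it makes no HTTP call and returns canned data), adf_range mimics the ADF expression function range(startIndex, count), and the rows list stands in for the SQL table. Unlike the real ForEach activity, this loop runs sequentially.

```python
import json

BASE_URL = "https://reqres.in/api/users?page="  # URL used in the pipeline above

def adf_range(start_index: int, count: int) -> list:
    """Mimic ADF's range(startIndex, count): `count` integers from start_index."""
    return list(range(start_index, start_index + count))

def fetch_page(url: str) -> dict:
    """Hypothetical stand-in for a Web activity; returns canned data, no HTTP."""
    page = int(url.rsplit("=", 1)[1])
    return {
        "page": page,
        "total_pages": 4,
        "data": [{"id": page * 10 + i} for i in range(3)],
    }

def run_pipeline() -> list:
    rows = []                                            # "Truncate SQL Table"
    first = fetch_page(BASE_URL + "1")                   # "GetTotalPages"
    for page_no in adf_range(1, first["total_pages"]):   # "ForEachPage"
        output = fetch_page(BASE_URL + str(page_no))     # "GetPage"
        payload = json.dumps(output)                     # @string(activity('GetPage').output)
        rows.extend(json.loads(payload)["data"])         # "uspInsertPeople Sproc"
    return rows

rows = run_pipeline()
print(len(rows), "rows from pages", adf_range(1, 4))
```

For the API in the question, the same pattern would presumably read serviceResponse.totalPages from the first response and build each URL with ?pageNo=; that mapping is an assumption based on the JSON shown in the question, not something the answer tested.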
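Answer 1 (score: 0)
This approach does not work for several reasons, but the main problem is that the pipeline's Copy Data activity cannot index into deeply nested arrays.
I can use a wildcard on the first level of an array, but anything deeper requires an actual integer index value. That is fine as long as the nested array contains only one item; beyond that, we lose data.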
{
"source": {
"path": "$['myObject']['element'][*]['externalUID'][0]['provider']"
},
"sink": {
"name": "EXTERNALUID_PROVIDER"
}
},
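To make the data loss concrete, here is a small Python sketch contrasting the two behaviours. The sample document is invented for illustration (only the key names myObject, element, externalUID and provider come from the path above), and both function names are hypothetical. first_only emulates a wildcard on the outer array with a fixed [0] index on the inner array, while all_items shows what a wildcard at both levels would return if it were supported.

```python
# Invented sample shaped like the path
# $['myObject']['element'][*]['externalUID'][0]['provider'].
doc = {
    "myObject": {
        "element": [
            {"externalUID": [{"provider": "A"}, {"provider": "B"}]},
            {"externalUID": [{"provider": "C"}]},
        ]
    }
}

def first_only(document: dict) -> list:
    """Emulate [*] on the outer array and a fixed [0] on the inner array."""
    return [
        element["externalUID"][0]["provider"]
        for element in document["myObject"]["element"]
        if element["externalUID"]  # skip elements whose inner array is empty
    ]

def all_items(document: dict) -> list:
    """Emulate a wildcard on both array levels, flattening every inner item."""
    return [
        uid["provider"]
        for element in document["myObject"]["element"]
        for uid in element["externalUID"]
    ]

print(first_only(doc))  # provider "B" is dropped
print(all_items(doc))
```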