I am looking for a solution to load data from SQL DW DMVs in 2 different databases into a single table on one SQL DW.
I chose an ADF pipeline activity, which helps load the data every 15 minutes, but I ran into a problem: when I create two activities in one pipeline, they have two different sources (input datasets), but both load data into the same destination (output dataset). I also want to make sure there is a dependency between the activities so that they do not run at the same time: activity 2 should start only after activity 1 has completed / is not running.
My ADF code is below:
{
"name": "Execution_Requests_Hist",
"properties": {
"description": "Execution Requests history data",
"activities": [
{
"type": "Copy",
"typeProperties": {
"source": {
"type": "SqlDWSource",
"sqlReaderQuery": "select * from dm_pdw_exec_requests_hist_view"
},
"sink": {
"type": "SqlDWSink",
"writeBatchSize": 0,
"writeBatchTimeout": "00:00:00"
},
"translator": {
"type": "TabularTranslator",
"columnMappings": "request_id:request_id,session_id:session_id,status:status,submit_time:submit_time,start_time:start_time,end_compile_time:end_compile_time,total_elapsed_time:total_elapsed_time,end_time:end_time,label:label,error_id:error_id,command:command,resource_class:resource_class,database_id:database_id,login_name:login_name,app_name:app_name,client_id:client_id,DMV_Source:DMV_Source,source:source,type:type,create_time:create_time,details:details"
},
"enableSkipIncompatibleRow": true
},
"inputs": [
{
"name": "ID_Exec_Requests"
}
],
"outputs": [
{
"name": "OD_Exec_Requests"
}
],
"policy": {
"timeout": "1.00:00:00",
"concurrency": 1,
"executionPriorityOrder": "NewestFirst",
"style": "StartOfInterval",
"retry": 3,
"longRetry": 0,
"longRetryInterval": "00:00:00"
},
"scheduler": {
"frequency": "Minute",
"interval": 15
},
"name": "PRD_DMV_Load"
},
{
"type": "Copy",
"typeProperties": {
"source": {
"type": "SqlDWSource",
"sqlReaderQuery": "select * from dm_pdw_exec_requests_hist_view"
},
"sink": {
"type": "SqlDWSink",
"writeBatchSize": 0,
"writeBatchTimeout": "00:00:00"
},
"translator": {
"type": "TabularTranslator",
"columnMappings": "request_id:request_id,session_id:session_id,status:status,submit_time:submit_time,start_time:start_time,end_compile_time:end_compile_time,total_elapsed_time:total_elapsed_time,end_time:end_time,label:label,error_id:error_id,command:command,resource_class:resource_class,database_id:database_id,login_name:login_name,app_name:app_name,client_id:client_id,DMV_Source:DMV_Source,source:source,type:type,create_time:create_time,details:details"
},
"enableSkipIncompatibleRow": true
},
"inputs": [
{
"name": "OD_Exec_Requests",
"name": "ITG_Exec_Requests"
}
],
"outputs": [
{
"name": "OD_Exec_Requests"
}
],
"policy": {
"timeout": "1.00:00:00",
"concurrency": 1,
"executionPriorityOrder": "NewestFirst",
"style": "StartOfInterval",
"retry": 3,
"longRetry": 0,
"longRetryInterval": "00:00:00"
},
"scheduler": {
"frequency": "Minute",
"interval": 15
},
"name": "ITG_DMV_Load"
}
],
"start": "2017-08-20T04:22:00Z",
"end": "2018-08-20T04:22:00Z",
"isPaused": false,
"hubName": "xyz-adf_hub",
"pipelineMode": "Scheduled"
}
}
When I try to deploy it, I get the following error message:
Error: Activities 'PRD_DMV_Load' and 'ITG_DMV_Load' have the same output dataset 'OD_Exec_Requests'. Two activities cannot output the same dataset in the same activity period.
How can I fix this? Can I specify that ITG_DMV_Load runs only after PRD_DMV_Load has completed?
Answer 0 (score: 0)
There are two problems here.
You cannot produce the same dataset slice from two different activities/pipelines. To fix this, create another dataset that points to the same table but, from ADF's point of view, is a different sink. You also need to move the second activity into a separate pipeline configuration (so each pipeline ends up with one activity).
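For illustration, a second output dataset pointing at the same table might look like the sketch below (the linked service name and table name here are hypothetical placeholders, not from the original post):

```json
{
    "name": "OD_Exec_Requests_ITG",
    "properties": {
        "type": "AzureSqlDWTable",
        "linkedServiceName": "TargetDWLinkedService",
        "typeProperties": {
            "tableName": "exec_requests_hist"
        },
        "availability": {
            "frequency": "Minute",
            "interval": 15
        }
    }
}
```

Both datasets resolve to the same physical table, but ADF treats them as distinct sinks, which avoids the deployment error.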
You need to order the pipelines somehow. I see two possible approaches:
You can try the scheduler configuration options, e.g. use the offset property (or style) to schedule one pipeline in the middle of the interval.
For instance, if the first pipeline is configured like this:
"scheduler": {
"frequency": "Minute",
"interval": 15
},
configure the second one like this:
"scheduler": {
"frequency": "Minute",
"interval": 15,
"offset" : 5
},
This approach may require some tuning depending on how long the pipelines take to complete.
The other approach is to specify the output of the first pipeline as an input of the second. In that case, the second activity will not start until the first one has finished. The activity schedules must match (i.e. both should have the same scheduler.frequency and scheduler.interval).
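A minimal sketch of that chaining, using the dataset names from the question (OD_Exec_Requests is the first activity's output, declared as an extra input of the second activity so ADF waits for its slice to be ready):

```json
"inputs": [
    { "name": "ITG_Exec_Requests" },
    { "name": "OD_Exec_Requests" }
]
```

ADF treats the extra input as a dependency only; the copy activity still reads from its actual source, but the slice will not execute until the upstream slice is produced.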
Answer 1 (score: 0)
As @arghtype said, you cannot use the same ADF dataset in two pipelines or activities. You need to create a second, identical output dataset for ITG_DMV_Load, but you do not have to split the pipeline. By making the output of the first activity an additional input of the second, you can ensure the second activity will not run until the first has completed. I would suggest something like this...
{
"name": "Execution_Requests_Hist",
"properties": {
"description": "Execution Requests history data",
"activities": [
{
"type": "Copy",
"typeProperties": {
"source": {
"type": "SqlDWSource",
"sqlReaderQuery": "select * from dm_pdw_exec_requests_hist_view"
},
"sink": {
"type": "SqlDWSink",
"writeBatchSize": 0,
"writeBatchTimeout": "00:00:00"
},
"translator": {
"type": "TabularTranslator",
"columnMappings": "request_id:request_id,session_id:session_id,status:status,submit_time:submit_time,start_time:start_time,end_compile_time:end_compile_time,total_elapsed_time:total_elapsed_time,end_time:end_time,label:label,error_id:error_id,command:command,resource_class:resource_class,database_id:database_id,login_name:login_name,app_name:app_name,client_id:client_id,DMV_Source:DMV_Source,source:source,type:type,create_time:create_time,details:details"
},
"enableSkipIncompatibleRow": true
},
"inputs": [
{
"name": "ID_Exec_Requests"
}
],
"outputs": [
{
"name": "OD_Exec_Requests_PRD"
}
],
"policy": {
"timeout": "1.00:00:00",
"concurrency": 1,
"executionPriorityOrder": "NewestFirst",
"style": "StartOfInterval",
"retry": 3,
"longRetry": 0,
"longRetryInterval": "00:00:00"
},
"scheduler": {
"frequency": "Minute",
"interval": 15
},
"name": "PRD_DMV_Load"
},
{
"type": "Copy",
"typeProperties": {
"source": {
"type": "SqlDWSource",
"sqlReaderQuery": "select * from dm_pdw_exec_requests_hist_view"
},
"sink": {
"type": "SqlDWSink",
"writeBatchSize": 0,
"writeBatchTimeout": "00:00:00"
},
"translator": {
"type": "TabularTranslator",
"columnMappings": "request_id:request_id,session_id:session_id,status:status,submit_time:submit_time,start_time:start_time,end_compile_time:end_compile_time,total_elapsed_time:total_elapsed_time,end_time:end_time,label:label,error_id:error_id,command:command,resource_class:resource_class,database_id:database_id,login_name:login_name,app_name:app_name,client_id:client_id,DMV_Source:DMV_Source,source:source,type:type,create_time:create_time,details:details"
},
"enableSkipIncompatibleRow": true
},
"inputs": [
{
"name": "ITG_Exec_Requests",
"name": "OD_Exec_Requests_PRD"
}
],
"outputs": [
{
"name": "OD_Exec_Requests_ITG"
}
],
"policy": {
"timeout": "1.00:00:00",
"concurrency": 1,
"executionPriorityOrder": "NewestFirst",
"style": "StartOfInterval",
"retry": 3,
"longRetry": 0,
"longRetryInterval": "00:00:00"
},
"scheduler": {
"frequency": "Minute",
"interval": 15
},
"name": "ITG_DMV_Load"
}
],
"start": "2017-08-20T04:22:00Z",
"end": "2018-08-20T04:22:00Z",
"isPaused": false,
"hubName": "xyz-adf_hub",
"pipelineMode": "Scheduled"
}
}