Asynchronous data load by parallel sessions

Time: 2018-02-20 04:39:21

Tags: postgresql plpgsql

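I am looking for help with a data load function that is meant to support asynchronous execution by parallel sessions. The Process_Log table holds the list of data load functions, each with its current status and a list of upstream dependencies. Each session first looks for a function that is ready to execute, calls it, and updates its status. See the comments in the code for details.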

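In Oracle PL/SQL, I would design this as a nested block inside a loop, with autonomous transactions for the status updates. I am not sure how to achieve this in Postgres. Running 9.2.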

CREATE OR REPLACE FUNCTION dm_operations.dm_load()
  RETURNS void AS
$BODY$

declare
    _run_cnt integer;
    _ready_cnt integer;
    _process_id dm_operations.process_log.process_id%type;
    _exec_name dm_operations.process_log.exec_name%type;
    _rowcnt dm_operations.process_log.rows_affected%type;
    _error text;
    _error_text text;
    _error_detail text;
    _error_hint text;
    _error_context text;

begin

  loop

    begin  -- nested block: the EXCEPTION handler and the matching "end;" below belong to it

--(1) Find one function ready to run

    select sum(case when process_status = 'RUNNING' then 1 else 0 end) run_cnt,
           sum(case when process_status = 'READY'  then 1 else 0 end) ready_cnt,
           min(case when  process_status = 'READY' then process_id end) process_id
    into _run_cnt, _ready_cnt, _process_id
    from dm_operations.process_log; --One row per each executable data load function

--(2) Exit loop if nothing is ready

    if _ready_cnt = 0 then exit;
    else

--(3) Lock the row until the status is updated

    select exec_name
    into _exec_name
    from dm_operations.process_log
    where process_id = _process_id
    for update;

--(4) Set status of the function to 'RUNNING'
--New status must be visible to other sessions

    update dm_operations.process_log
    set process_status = 'RUNNING',
        start_ts = now()
    where process_id = _process_id;

--(5) Release lock. (How?)

--(6) Execute data load function. See example below.
-- Is this correct syntax for dynamic call to a function that returns void?

    execute 'perform dm_operations.'||_exec_name;

--(7) Get number of rows processed by the data load function

    GET DIAGNOSTICS _rowcnt := ROW_COUNT;

--(8) Upon successful function execution set status to 'SUCCESS'

    update dm_operations.process_log
    set process_status = 'SUCCESS',
        end_ts = now(),
        rows_affected = _rowcnt
    where process_id = _process_id;

--(9) Check dependencies and update status
--These changes must be visible to the next loop iteration, and to other sessions

    update dm_operations.process_log pl1 
    set process_status = 'READY'
    where process_status is null
    and not exists (select null from dm_operations.process_log pl2
    where pl2.process_id in (select unnest(pl1.depends_on))
    and (coalesce(pl2.process_status,'NULL') <> 'SUCCESS'));


    end if;

--(10) Log error and allow the loop to continue

    EXCEPTION
    when others then
      GET STACKED DIAGNOSTICS _error_text = MESSAGE_TEXT,
                              _error_detail = PG_EXCEPTION_DETAIL,
                              _error_hint = PG_EXCEPTION_HINT,
                              _error_context = PG_EXCEPTION_CONTEXT;
      _error := _error_text||
                _error_detail||
                _error_hint||
                _error_context;

    update dm_operations.process_log
    set process_status = 'ERROR',
        start_ts = now(),
        rows_affected = _rowcnt,
        error_text = _error
    where process_id = _process_id;

    end;

  end loop;

end;

$BODY$
  LANGUAGE plpgsql;

Example of a data load function (6):

CREATE OR REPLACE FUNCTION load_target()
  RETURNS void AS
$BODY$

begin

execute 'truncate table target_table';

insert into target_table 
select ...
from source_table;

end;
$BODY$
  LANGUAGE plpgsql;

1 answer:

Answer 0: (score: 0)

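You cannot start asynchronous operations in PL/pgSQL.

I can think of two options:

  1. The hard way: upgrade to a newer PostgreSQL version and write a background worker in C that executes load_target. You would have to use

  2. Don't write your function in the database, but on the client side. Then you simply open several database sessions and run the functions in parallel that way.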
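As a rough sketch of option 2, these are the statements each client session might send, one short transaction per step. A PL/pgSQL function on 9.2 runs entirely inside the calling transaction, so its row locks are released and its status changes become visible to other sessions only when that transaction commits. Issuing the steps from the client (with autocommit on) gives the effect the question wanted from autonomous transactions. The table and column names come from the question. The literal process_id 42, the schema-qualified call to load_target, and the assumption that the sessions use the default READ COMMITTED isolation level are illustrative, not taken from the original.

```sql
-- (a) Claim one READY function and publish the new status immediately.
-- If two sessions pick the same row, the second one waits for the first
-- to commit, re-checks process_status = 'READY' against the updated row,
-- and updates nothing. Zero rows returned means "lost the race or nothing
-- is ready", so the client retries or stops.
UPDATE dm_operations.process_log
SET    process_status = 'RUNNING',
       start_ts       = now()
WHERE  process_id = (SELECT min(process_id)
                     FROM   dm_operations.process_log
                     WHERE  process_status = 'READY')
AND    process_status = 'READY'
RETURNING process_id, exec_name;

-- (b) Run the load function named by exec_name in its own transaction.
-- A function returning void is called with a plain SELECT.
-- (load_target living in dm_operations is an assumption.)
SELECT dm_operations.load_target();

-- (c) Record the outcome. 42 stands for the process_id returned by (a).
UPDATE dm_operations.process_log
SET    process_status = 'SUCCESS',
       end_ts         = now()
WHERE  process_id = 42;
```

If step (b) fails, the client would run an UPDATE like (c) that sets process_status to 'ERROR' and stores the error message. After (c), the dependency check from step (9) of the question can be issued unchanged as another short transaction.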