HierarchyID aggregate functions in T-SQL

Time: 2012-01-17 20:38:28

Tags: sql-server sql-server-2008 tsql hierarchyid

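I have been asked to query a time-recording database to show all the work done on a given project. Each project is broken down into tasks, and each task can itself be broken down into sub-tasks. The task hierarchy can be any number of levels deep. Part of the requirement is to provide the total time worked for every task or node in the hierarchy: not just the leaf-level nodes but all nodes, including the top-level project node and every node in between.

With a hierarchy like this, I assume the HIERARCHYID data type could be useful. Is there a way to do something like a SUM WITH ROLLUP over the hierarchy, to give sub-totals for each node in it?

I would have thought this sort of aggregate rollup over a hierarchy is a common requirement, but I have had no luck finding out how to do it, or even whether it is possible.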

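2 answers:

Answer 0 (score: 2)

Figured out how to do it. The approach is a bit convoluted; perhaps someone else can come up with a neater version.

The method involves four steps: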

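  1. Run the ROW_NUMBER function over all the tasks for the given project. Partition by ParentId so that the child tasks of any given parent are numbered 1, 2, 3, 4 and so on. This applies at every level of the task hierarchy;

  2. Use a recursive CTE (common table expression) to walk up the task hierarchy from the leaf level to the top. This builds the task hierarchy from the parent-child relationships in the TimeCode table. Originally I tried to include the ROW_NUMBER function here, but that did not work because of the way Microsoft has implemented CTEs;

  3. Add a HIERARCHYID column to the structure built in step 2;

  4. Self-join the recordset to get all the descendants of each node in the structure. Group by the parent node and sum the time recorded against each descendant. Note that the HIERARCHYID method IsDescendantOf matches not just the descendants of a node but the node itself as well. So if any time has been recorded against the parent task as well as the child tasks, it is included in the total time for that parent node.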
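
Steps 1 to 3 can be tried in isolation on a handful of rows. The sketch below is a variation rather than the script from this answer: it walks the hierarchy top-down from the root tasks instead of bottom-up from the leaves, so each path is built by appending the child's index. The @Task and @Numbered tables, their columns and their rows are made up for illustration.

```sql
-- Hypothetical sample data (not from the original database).
DECLARE @Task TABLE (TaskId INT PRIMARY KEY, ParentId INT NULL, TaskName VARCHAR(50));

INSERT INTO @Task (TaskId, ParentId, TaskName)
VALUES (1, NULL, 'Project A'),
       (2, 1,    'Design'),
       (3, 1,    'Build'),
       (4, 3,    'Backend'),
       (5, 3,    'Frontend');

-- Step 1: number the siblings under each parent, outside the recursive CTE.
DECLARE @Numbered TABLE (TaskId INT PRIMARY KEY, ParentId INT NULL, TaskIndex VARCHAR(12));

INSERT INTO @Numbered (TaskId, ParentId, TaskIndex)
SELECT TaskId,
    ParentId,
    CAST(ROW_NUMBER() OVER (PARTITION BY ParentId ORDER BY TaskName) AS VARCHAR(12))
FROM @Task;

-- Step 2 (variation): build each node's path text from the root downwards.
WITH paths AS
(
    -- Anchor member: root tasks.
    SELECT n.TaskId,
        CAST('/' + n.TaskIndex + '/' AS VARCHAR(200)) AS NodeText
    FROM @Numbered n
    WHERE n.ParentId IS NULL

    UNION ALL

    -- Recursive member: append the child's sibling number to the parent's path.
    SELECT c.TaskId,
        CAST(p.NodeText + c.TaskIndex + '/' AS VARCHAR(200)) AS NodeText
    FROM paths p
        JOIN @Numbered c ON c.ParentId = p.TaskId
)
-- Step 3: convert the path text to a HIERARCHYID value.
SELECT TaskId,
    NodeText,
    hierarchyid::Parse(NodeText) AS HierarchyNode
FROM paths
ORDER BY HierarchyNode;

-- NodeText per TaskId, in the order returned:
--   1 -> /1/, 3 -> /1/1/, 4 -> /1/1/1/, 5 -> /1/1/2/, 2 -> /1/2/
```

Sorting by the HIERARCHYID column gives a depth-first order, so each task is listed directly above its own sub-tasks.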

Here is the full script:

    -- Cannot include a ROW_NUMBER function within the recursive member of the 
    --    common table expression as SQL Server recurses depth first. ie SQL 
    --    Server recurses each row separately, completing the recursion for a 
    --    given row before starting the next.
    -- To get around this, use ROW_NUMBER outside the common table expression.
    
    DECLARE @tblTask TABLE (TimeCodeId INT, ParentId INT, ProjectID INT, 
        Level INT, TaskIndex VARCHAR(12), Duration FLOAT);
    
    INSERT INTO @tblTask (TimeCodeId, ParentId, ProjectID, 
        Level, TaskIndex, Duration)
    SELECT tc.TimeCodeId, 
        tc.ParentId, 
        CASE
            WHEN tc.ParentId IS NULL THEN tc.ReferenceId1
            ELSE tc.ReferenceId2
        END AS ProjectID, 
        1 AS Level, 
        CAST(ROW_NUMBER() OVER (PARTITION BY tc.ParentId 
                                ORDER BY tc.[Description]) AS VARCHAR(12)) 
                                                                AS TaskIndex, 
        ts.Duration            
    FROM Time.TimeCode tc 
        LEFT JOIN 
        (    -- Get time sub-totals for each task.
            SELECT TimeCodeId, 
                SUM(Duration) AS Duration
            FROM Time.Timesheet
            WHERE ReferenceId2 IN (12196, 12198)
            GROUP BY TimeCodeId
        ) ts
        ON tc.TimeCodeId = ts.TimeCodeId
    WHERE tc.ReferenceId2 IN (12196, 12198);
    
    DECLARE @tblHierarchy TABLE (HierarchyNode HIERARCHYID, 
        Level INT, Duration FLOAT);
    
    -- Common table expression that builds up the task hierarchy recursively.
    WITH cte_task_hierarchy AS 
    (
        -- Anchor member.
        SELECT t.TimeCodeId,
            t.ParentID,  
            t.ProjectID, 
            t.Level, 
            CAST('/' + t.TaskIndex + '/' AS VARCHAR(200)) AS HierarchyNodeText, 
            t.Duration            
        FROM @tblTask t
    
        UNION ALL
    
        -- Dummy root node for HIERARCHYID.
        --    (easier to add it after another query so don't have to cast the 
        --    NULLs to data types)
        SELECT NULL AS TimeCodeId, 
            NULL AS ParentID, 
            NULL AS ProjectID, 
            0 AS Level, 
            CAST('/' AS VARCHAR(200)) AS HierarchyNodeText, 
            NULL AS Duration
    
        UNION ALL 
    
        -- Recursive member that walks up the task hierarchy.
        SELECT tp.TimeCodeId, 
            tp.ParentID,  
            th.ProjectID, 
            th.Level + 1 AS Level, 
            CAST('/' + tp.TaskIndex + th.HierarchyNodeText AS VARCHAR(200)) 
                AS HierarchyNodeText,
            th.Duration
        FROM cte_task_hierarchy th 
            JOIN @tblTask tp ON th.ParentID = tp.TimeCodeId 
    )
    INSERT INTO @tblHierarchy (HierarchyNode, 
        Level, Duration)
    SELECT hierarchyid::Parse(cth.HierarchyNodeText), 
        cth.Level, cth.Duration
    FROM cte_task_hierarchy cth 
    -- This filters recordset to exclude intermediate steps in the recursion 
    --    - only want the final result.
    WHERE cth.ParentId IS NULL
    ORDER BY cth.HierarchyNodeText;
    
    -- Show the task hierarchy.
    SELECT *, HierarchyNode.ToString() AS NodeText
    FROM @tblHierarchy
    ORDER BY HierarchyNode;
    
    -- Calculate the sub-totals for each task in the hierarchy.
    SELECT t1.HierarchyNode.ToString() AS NodeText, 
        COALESCE(SUM(t2.Duration), 0) AS DurationTotal
    FROM @tblHierarchy t1 
        JOIN @tblHierarchy t2 
            ON t2.HierarchyNode.IsDescendantOf(t1.HierarchyNode) = 1
    GROUP BY t1.HierarchyNode
    ORDER BY t1.HierarchyNode;
    

Results:

First recordset (task structure with the HIERARCHYID column):

    HierarchyNode    Level    Duration    NodeText
    -------------    -----    --------    --------
    0x               0        NULL        /
    0x58             1        NULL        /1/
    0x5AC0           2        12.15       /1/1/
    0x5AD6           3        8.92        /1/1/1/
    0x5ADA           3        11.08       /1/1/2/
    0x5ADE           3        7           /1/1/3/
    0x5B40           2        182.18      /1/2/
    0x5B56           3        233.71      /1/2/1/
    0x5B5A           3        227.27      /1/2/2/
    0x5BC0           2        45.4        /1/3/
    0x68             1        NULL        /2/
    0x6AC0           2        8.5         /2/1/
    0x6B40           2        2.17        /2/2/
    0x6BC0           2        8.91        /2/3/
    0x6C20           2        1.75        /2/4/
    0x6C60           2        60.25       /2/5/
    

Second recordset (sub-totals for each task):

    NodeText    DurationTotal
    --------    -------------
    /           809.29
    /1/         727.71
    /1/1/       39.15
    /1/1/1/     8.92
    /1/1/2/     11.08
    /1/1/3/     7
    /1/2/       643.16
    /1/2/1/     233.71
    /1/2/2/     227.27
    /1/3/       45.4
    /2/         81.58
    /2/1/       8.5
    /2/2/       2.17
    /2/3/       8.91
    /2/4/       1.75
    /2/5/       60.25
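
The point about IsDescendantOf matching the node itself can be checked on the /1/1/ branch alone. This is a minimal sketch, assuming a table variable @Node that is not part of the script above; it uses DECIMAL(9,2) instead of FLOAT so the sums come out exact. The DescendantsOnly column shows one way to leave the node's own time out of the total.

```sql
-- Hypothetical table variable holding the /1/1/ branch from the results above.
DECLARE @Node TABLE (HierarchyNode HIERARCHYID, Duration DECIMAL(9,2));

INSERT INTO @Node (HierarchyNode, Duration)
VALUES (hierarchyid::Parse('/1/1/'),   12.15),
       (hierarchyid::Parse('/1/1/1/'), 8.92),
       (hierarchyid::Parse('/1/1/2/'), 11.08),
       (hierarchyid::Parse('/1/1/3/'), 7);

SELECT t.HierarchyNode.ToString() AS NodeText,
    t.TotalIncludingSelf,
    t.DescendantsOnly
FROM
(
    SELECT p.HierarchyNode,
        -- IsDescendantOf is true for the node itself, so its own time is summed too.
        SUM(c.Duration) AS TotalIncludingSelf,
        -- Skip the self-match to total the descendants only.
        SUM(CASE WHEN c.HierarchyNode <> p.HierarchyNode
                 THEN c.Duration ELSE 0 END) AS DescendantsOnly
    FROM @Node p
        JOIN @Node c
            ON c.HierarchyNode.IsDescendantOf(p.HierarchyNode) = 1
    GROUP BY p.HierarchyNode
) t
ORDER BY t.HierarchyNode;

-- NodeText    TotalIncludingSelf    DescendantsOnly
-- /1/1/       39.15                 27.00
-- /1/1/1/     8.92                  0.00
-- /1/1/2/     11.08                 0.00
-- /1/1/3/     7.00                  0.00
```

The 39.15 for /1/1/ matches the second recordset above: 12.15 recorded against the task itself plus 27.00 from its three sub-tasks.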
    

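Answer 1 (score: 2)

Here is something I tried, and it worked fine. In this case I have a Taxonomies table with an id column and a ParentTaxonomyID column that points back to id. I wanted to count the questions related to a taxonomy, but summed up through the hierarchy. Here is the scalar function I used: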

    ALTER FUNCTION [dbo].[func_NumberOfQuestions](
        @TaxonomyID INT )
    RETURNS INT
    AS
    BEGIN

        -- Number of children of this taxonomy
        --    (dbo.func_NumberOfTaxonomyChildren is not shown in this answer).
        DECLARE @NChildren INT;
        SELECT @NChildren = dbo.func_NumberOfTaxonomyChildren(@TaxonomyID);

        DECLARE @NumberOfDirectQuestions INT,
            @NumberOfChildQuestions INT;

        -- Questions attached directly to this taxonomy.
        SELECT @NumberOfDirectQuestions = COUNT(*)
        FROM ProblemTaxonomies
        WHERE TaxonomyID = @TaxonomyID;

        -- Questions attached further down: recurse into each child and sum.
        SELECT @NumberOfChildQuestions = 0;
        IF @NChildren > 0
        BEGIN
            SELECT @NumberOfChildQuestions =
                ISNULL(SUM(dbo.func_NumberOfQuestions(id)), 0)
            FROM Taxonomies
            WHERE ParentTaxonomyID = @TaxonomyID;
        END

        RETURN @NumberOfDirectQuestions + @NumberOfChildQuestions;
    END

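I used a T-SQL function, and it should be fairly obvious that it calls itself recursively; even so, SQL let me apply the SUM function over the results of the child calls.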
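
For comparison, the same counts can be produced for every taxonomy in one set-based query. The sketch below is an alternative technique, not part of this answer: a recursive CTE pairs each taxonomy with itself and all of its descendants, and the questions are then counted per ancestor. Only the table and column names Taxonomies, ProblemTaxonomies, id, ParentTaxonomyID and TaxonomyID come from the answer; the table variables, the ProblemID column and the sample rows are assumptions. One reason to consider it is that SQL Server limits nested function calls to 32 levels, which caps how deep a recursive scalar function can go.

```sql
-- Hypothetical sample data standing in for the real tables.
DECLARE @Taxonomies TABLE (id INT PRIMARY KEY, ParentTaxonomyID INT NULL);
DECLARE @ProblemTaxonomies TABLE (ProblemID INT, TaxonomyID INT);

INSERT INTO @Taxonomies (id, ParentTaxonomyID)
VALUES (1, NULL), (2, 1), (3, 1), (4, 3);

INSERT INTO @ProblemTaxonomies (ProblemID, TaxonomyID)
VALUES (100, 1), (101, 2), (102, 4), (103, 4);

-- Pair every taxonomy with itself and with each of its descendants.
WITH closure AS
(
    -- Anchor member: each taxonomy is its own "descendant".
    SELECT t.id AS AncestorID,
        t.id AS DescendantID
    FROM @Taxonomies t

    UNION ALL

    -- Recursive member: step down one level from the current descendant.
    SELECT c.AncestorID,
        t.id AS DescendantID
    FROM closure c
        JOIN @Taxonomies t ON t.ParentTaxonomyID = c.DescendantID
)
SELECT c.AncestorID AS TaxonomyID,
    -- COUNT of a column ignores the NULLs produced by unmatched rows.
    COUNT(pt.TaxonomyID) AS NumberOfQuestions
FROM closure c
    LEFT JOIN @ProblemTaxonomies pt ON pt.TaxonomyID = c.DescendantID
GROUP BY c.AncestorID
ORDER BY c.AncestorID;

-- TaxonomyID    NumberOfQuestions
-- 1             4
-- 2             1
-- 3             2
-- 4             2
```

Taxonomy 3 has no questions of its own but picks up the two attached to its child, taxonomy 4, which is the same roll-up the recursive function performs one call at a time.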