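Trying to merge rows into one row under certain conditions

Time: 2012-04-19 21:41:10

Tags: sql sql-server tsql sql-server-2008-r2

Given 2 or more rows that are selected to be merged, one of them is identified as the template row. The other rows should merge their data into any null-valued columns that the template row has.

Example data: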

Id  Name     Address          City          State   Active  Email             Date
1   Acme1    NULL             NULL          NULL    NULL    blah@yada.com     3/1/2011
2   Acme1    1234 Abc Rd      Springfield   OR      0       blah@gmail.com    1/12/2012
3   Acme2    NULL             NULL          NULL    1       blah@yahoo.com    4/19/2012

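Assume a user has chosen the row with Id 1 as the template row, and the rows with Ids 2 and 3 are to be merged into row 1 and then deleted. Any null-valued column in row Id 1 should be filled with the most recent (see the Date column) non-null value, if one exists, and non-null values already present in row Id 1 are to be left as they are. The result of the query on the data above should be exactly this: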

Id  Name     Address          City          State   Active  Email             Date
1   Acme1    1234 Abc Rd      Springfield   OR      1       blah@yada.com     3/1/2011

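Note that the Active value is 1, not 0, because row Id 3 has the most recent date.

P.S. Also, is there any way to do this without explicitly defining or knowing beforehand what all the column names are? The actual table I'm working with has a lot of columns, and new ones are being added all the time. Is there a way to look up all the column names in the table and then use that subquery or temp table to do the job?

4 answers:

Answer 0: (score: 2)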

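You can do this by ordering the rows first by a template flag and then by date, so that the template row always comes last and the remaining rows run from oldest to newest. Each row is assigned a number in that order. Using max(), we find for each column the highest-numbered row whose cell is occupied (that is, the first occupied cell when the numbers are read in descending order). Then we select each column from the row that matches its maximum.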

; with rows as (
    select test.*,
  -- Template row must be last - how do you decide which one is template row?
  -- In this case template row is the one with id = 1
    row_number() over (order by case when id = 1 then 1 else 0 end,
                       date) rn
    from test
  -- Your list of rows to merge goes here
  -- where id in ( ... )
),
-- For each column, find the highest-numbered row that holds a non-null value
positions as (
  select
    max (case when Name is not null then rn else 0 end) NamePosition,
    max (case when Address is not null then rn else 0 end) AddressPosition,
    max (case when City is not null then rn else 0 end) CityPosition,
    max (case when State is not null then rn else 0 end) StatePosition,
    max (case when Active is not null then rn else 0 end) ActivePosition,
    max (case when Email is not null then rn else 0 end) EmailPosition,
    max (case when Date is not null then rn else 0 end) DatePosition
  from rows
)
-- Finally, combine these columns into one row
select 
  (select Name from rows cross join Positions where rn = NamePosition) name,
  (select Address from rows cross join Positions where rn = AddressPosition) Address,
  (select City from rows cross join Positions where rn = CityPosition) City,
  (select State from rows cross join Positions where rn = StatePosition) State,
  (select Active from rows cross join Positions where rn = ActivePosition) Active,
  (select Email from rows cross join Positions where rn = EmailPosition) Email,
  (select Date from rows cross join Positions where rn = DatePosition) Date
from test
-- Any id will suffice, or even DISTINCT
where id = 1

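You can check it on SQL Fiddle.

Edit:

The cross joins in the last section could actually be inner joins on rows.rn = xxxPosition. It works the way it is written, but changing to inner joins would be an improvement.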
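
The inner-join variant mentioned in the edit might look roughly like the sketch below. It is reduced to two columns for brevity, it assumes the same test table as the query above, and the CTE name numbered is a hypothetical stand-in for rows; treat it as an illustration rather than the answer's exact code.

```sql
-- Sketch only: same idea as the query above, with inner joins in the final select.
-- Assumes the "test" table from the answer; "numbered" is a hypothetical CTE name.
;with numbered as (
    select test.*,
           -- Template row (id = 1) is numbered last; other rows go oldest to newest
           row_number() over (order by case when id = 1 then 1 else 0 end,
                              [date]) as rn
    from test
),
positions as (
    select
        max(case when Name is not null then rn else 0 end) as NamePosition,
        max(case when Address is not null then rn else 0 end) as AddressPosition
    from numbered
)
select
    (select n.Name
     from numbered as n
     inner join positions as p on n.rn = p.NamePosition) as Name,
    (select n.Address
     from numbered as n
     inner join positions as p on n.rn = p.AddressPosition) as Address;
```

Because positions has exactly one row and rn is unique, each scalar subquery returns at most one value. A column that is null in every row gets position 0, matches no row, and so comes back as NULL.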

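Answer 1: (score: 1)

It's not that complicated.

First, DECLARE @templateID INT = 1, so that you remember which row is treated as the template.

Now find the latest NOT NULL values (excluding the template row). The simplest way is to use a TOP 1 subquery for each column: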

SELECT
(SELECT TOP 1 Name FROM DataTab WHERE Name IS NOT NULL AND NOT ID = @templateID ORDER BY Date DESC) AS LatestName,
(SELECT TOP 1 Address FROM DataTab WHERE Address IS NOT NULL AND NOT ID = @templateID ORDER BY Date DESC) AS LatestAddress
-- add more columns here

Wrap the above in a CTE (common table expression) so that the UPDATE has a convenient input.

WITH Latest_CTE (CTE_LatestName, CTE_LatestAddress) -- add more columns here; I like CTE prefix to distinguish source columns from target columns..
AS
-- Define the CTE query.
(
    SELECT
    (SELECT TOP 1 Name FROM DataTab WHERE Name IS NOT NULL AND NOT ID = @templateID ORDER BY Date DESC) AS LatestName,
    (SELECT TOP 1 Address FROM DataTab WHERE Address IS NOT NULL AND NOT ID = @templateID ORDER BY Date DESC) AS LatestAddress
    -- add more columns here
)
UPDATE
<update statement here (below)>

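Now use the UPDATE to apply ISNULL to the template row. It acts as a conditional update, writing a value only when the target column is null.

WITH
<common expression statement here (above)>
UPDATE DataTab
SET 
Name = ISNULL(Name, CTE_LatestName), -- if Name is null then set Name to CTE_LatestName else keep Name as Name
Address = ISNULL(Address, CTE_LatestAddress)
-- add more columns here..
FROM DataTab CROSS JOIN Latest_CTE -- the CTE returns a single row; the join makes its columns visible to SET
WHERE ID = @templateID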

The last task is to delete every row except the template row.

DELETE FROM DataTab WHERE NOT ID = @templateID
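
Put together, the steps above might look like the following sketch. It uses the DataTab table and the two example columns from this answer; wrapping both statements in a transaction is an added assumption, not something the answer prescribes, so that the merge and the delete either both happen or neither does.

```sql
-- Sketch only: the pieces above assembled into one batch for two columns.
-- The transaction is an assumption added here, not part of the original answer.
DECLARE @templateID INT = 1;

BEGIN TRANSACTION;

WITH Latest_CTE (CTE_LatestName, CTE_LatestAddress) AS
(
    SELECT
        -- Most recent non-null value of each column among the non-template rows
        (SELECT TOP 1 Name FROM DataTab
         WHERE Name IS NOT NULL AND ID <> @templateID ORDER BY [Date] DESC),
        (SELECT TOP 1 Address FROM DataTab
         WHERE Address IS NOT NULL AND ID <> @templateID ORDER BY [Date] DESC)
)
UPDATE d
SET Name    = ISNULL(d.Name, l.CTE_LatestName),       -- keep existing values, fill only nulls
    Address = ISNULL(d.Address, l.CTE_LatestAddress)
FROM DataTab AS d
CROSS JOIN Latest_CTE AS l                            -- the CTE yields exactly one row
WHERE d.ID = @templateID;

-- As in the answer, this removes every other row in the table;
-- restrict the WHERE clause if only some rows were selected for merging.
DELETE FROM DataTab WHERE ID <> @templateID;

COMMIT TRANSACTION;
```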

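Clear?

Answer 2: (score: 1)

For dynamic columns, you need to write the solution using dynamic SQL.

You can query sys.columns and sys.tables to get the list of columns you need. Then loop backwards once for each null column, finding the first non-null row for that column and updating your output row for that column. Once the loop reaches 0, you have a complete row that you can display to the user.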
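
As a rough illustration of the first step, the column list could be read from the catalog views like this. The table name DataTab and the dbo schema are assumptions borrowed from the other answers; adjust both to the real table.

```sql
-- Sketch only: list the columns of one table from the catalog views.
-- 'DataTab' and 'dbo' are assumed names, not taken from the question.
DECLARE @TableName sysname = N'DataTab';

SELECT c.name AS ColumnName,
       c.column_id
FROM sys.columns AS c
JOIN sys.tables  AS t ON t.object_id = c.object_id
JOIN sys.schemas AS s ON s.schema_id = t.schema_id
WHERE t.name = @TableName
  AND s.name = N'dbo'       -- avoid picking up a same-named table in another schema
  AND c.is_identity = 0     -- identity columns cannot be updated
  AND c.is_computed = 0     -- computed columns cannot be updated
ORDER BY c.column_id;
```

Each name returned here would then drive one iteration of the loop described above, or one generated SET clause in a dynamic UPDATE.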

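Answer 3: (score: 1)

I should have paid attention to the posting date. Anyway, here is a solution that uses dynamic SQL to build the update statement. It should at least give you something to build on.

There is some extra code in there to validate the results along the way, but I tried to comment it in a way that makes the non-essential code obvious.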

CREATE TABLE 
dbo.Dummy 
    (
    [ID] int ,
    [Name] varchar(30),
    [Address] varchar(40) null,
    [City]  varchar(30) NULL,
    [State] varchar(2) NULL,
    [Active] tinyint NULL,
    [Email] varchar(30) NULL,
    [Date] date NULL
    );
--
INSERT dbo.Dummy
VALUES
(
    1, 'Acme1', NULL, NULL, NULL, NULL, 'blah@yada.com', '3/1/2011'
)
,
(
    2, 'Acme1', '1234 Abc Rd', 'Springfield', 'OR', 0, 'blah@gmail.com', '1/12/2012'
)
,
(
    3, 'Acme2', NULL, NULL, NULL, 1, 'blah@yahoo.com', '4/19/2012'
);
DECLARE 
    @TableName nvarchar(128) = 'Dummy',
    @TemplateID int = 1,
    @SetStmtList nvarchar(max) = '',
    @LoopCounter int = 0,
    @ColumnCount int = 0,
    @SQL nvarchar(max) = ''
    ;
--
--Create a table to hold the column names
DECLARE     
    @ColumnList table 
        (
        ColumnID int IDENTITY, -- int rather than tinyint so tables with more than 255 columns fit
        ColumnName nvarchar(128)
        );
--
--Get the column names
INSERT @ColumnList
(
    ColumnName
)
    SELECT
        c.name
    FROM
        sys.columns AS c
        JOIN
        sys.tables AS t
            ON
                t.object_id = c.object_id
    WHERE
        t.name = @TableName;
--
--Create loop boundaries to build out the SQL statement
SELECT
    @ColumnCount = MAX( l.ColumnID ),
    @LoopCounter = MIN (l.ColumnID )
FROM
    @ColumnList AS l;
--
--Loop over the column names
WHILE @LoopCounter <= @ColumnCount
BEGIN
    --Dynamically construct SET statements for each column except ID (See the WHERE clause)
    SELECT 
        @SetStmtList = @SetStmtList + ',' + QUOTENAME(l.ColumnName) + ' = COALESCE(' + QUOTENAME(l.ColumnName) + ', (SELECT TOP 1 ' + QUOTENAME(l.ColumnName) + ' FROM ' + @TableName + ' WHERE ' + QUOTENAME(l.ColumnName) + ' IS NOT NULL AND ID <> ' + CAST(@TemplateID AS NVARCHAR(MAX)) + ' ORDER BY [Date] DESC)) ' -- QUOTENAME guards column names that contain spaces or reserved words
    FROM 
        @ColumnList AS l
    WHERE 
        l.ColumnID = @LoopCounter
        AND
        l.ColumnName <> 'ID';
--
    SELECT
        @LoopCounter = @LoopCounter + 1;
--
END;

--TESTING - Validate the initial table values
SELECT * FROM dbo.Dummy ;
--
--Get rid of the leading comma in the SetStmtList
SET @SetStmtList = SUBSTRING( @SetStmtList, 2, LEN( @SetStmtList ) - 1 );
--Build out the rest of the UPDATE statement
SET @SQL = 'UPDATE ' + @TableName  + ' SET ' + @SetStmtList + ' WHERE ID = ' + CAST(@TemplateID AS NVARCHAR(MAX ))
--Then execute the update
EXEC sys.sp_executesql
    @SQL;
--
--TESTING - Validate the updated table values
SELECT * FROM dbo.Dummy ;
--
--Build out the DELETE statement
SET @SQL = 'DELETE FROM ' + @TableName + ' WHERE ID <> ' + CAST(@TemplateID AS NVARCHAR(MAX ))
--Execute the DELETE
EXEC sys.sp_executesql
    @SQL;
--
--TESTING - Validate the final table values
SELECT * FROM dbo.Dummy; 
--
DROP TABLE dbo.Dummy;