Counting occurrences of pairs in a T-SQL table

Date: 2015-10-11 07:09:33

Tags: sql sql-server tsql

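How can I count the occurrences of pairs in a SQL Server table? Note that the order of the given sequence must be respected and must not be changed.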

Original table:

    1 2 3 4
   --------
1 | A A A B
2 | A       # don't count
3 | B A A
4 | B       # don't count

Result:

1 | AA = 3
2 | AB = 1
3 | BB = 0
4 | BA = 1

In addition, the code must work on large datasets.

Edit

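A pair in this context is a set of two values {x[ij], x[(i+1)j]}, where i = 1, ..., 3 indexes the columns and j = 1, ..., 4 indexes the rows. Pairs of the form (A, NULL) or (B, NULL) must not be counted. Pairs of the form (NULL, A) or (NULL, B) cannot occur, so they need not be considered.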
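To make the definition concrete, here is a minimal Python sketch (not T-SQL) of the counting rule. It assumes the table is held in memory as lists, with `None` standing in for NULL; `ROWS` and `count_pairs` are hypothetical names used only for illustration.

```python
from collections import Counter
from itertools import product

# Sample data from the question; None stands in for SQL NULL.
ROWS = [
    ["A", "A", "A", "B"],
    ["A", None, None, None],
    ["B", "A", "A", None],
    ["B", None, None, None],
]


def count_pairs(rows):
    """Count ordered pairs {x[ij], x[(i+1)j]} of horizontally adjacent values.

    Pairs containing a NULL (None) are skipped, so (A, NULL) is never counted.
    """
    counts = Counter()
    for row in rows:
        for left, right in zip(row, row[1:]):
            if left is not None and right is not None:
                counts[left + right] += 1
    # Report every combination of the observed values, including zero counts.
    values = sorted({v for row in rows for v in row if v is not None})
    return {a + b: counts[a + b] for a, b in product(values, repeat=2)}


print(count_pairs(ROWS))
```

On the sample table this gives AA = 3, AB = 1, BA = 1 and BB = 0, matching the expected result. It makes a single pass over the rows, which is the property that matters for large datasets.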

3 answers:

Answer 0 (score: 2)

I just want to point out a very simple way to express this logic:

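with vals as (
      select 'A' as val union all select 'B'
     ),
     pairs as (
      select t1.val as val1, t2.val as val2
      from vals t1 cross join vals t2
     )
select p.*,
       -- count each adjacent position separately, so a row such as
       -- A A A B contributes 2 to AA rather than 1
       (select sum(case when [1] = p.val1 and [2] = p.val2 then 1 else 0 end +
                   case when [2] = p.val1 and [3] = p.val2 then 1 else 0 end +
                   case when [3] = p.val1 and [4] = p.val2 then 1 else 0 end)
        from original
       ) as cnt
from pairs p
order by cnt desc;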

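This does not have good performance characteristics. That is easy to fix by using three subqueries, one per adjacent column position, together with indexes on the data columns.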
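The following minimal Python sketch illustrates the enumerate-then-count idea: build every candidate pair from the hard-coded values, then count matches at each adjacent column position. Every matching position counts separately, so row 1 (A A A B) contributes 2 to AA. `VALS`, `ORIGINAL` and `count_pair` are hypothetical stand-ins for the `vals` CTE, the `original` table and the correlated subquery. Like the SQL, it rescans the whole table once per candidate pair, which is the source of the poor performance.

```python
from itertools import product

# The hard-coded candidate values (the `vals` CTE).
VALS = ["A", "B"]

# Sample rows of the `original` table; None stands in for SQL NULL.
ORIGINAL = [
    ["A", "A", "A", "B"],
    ["A", None, None, None],
    ["B", "A", "A", None],
    ["B", None, None, None],
]


def count_pair(table, val1, val2):
    """Count matches of (val1, val2) at ([1],[2]), ([2],[3]) and ([3],[4]).

    None never equals a string, much like a NULL comparison in SQL never
    evaluates to true.
    """
    total = 0
    for row in table:  # one full scan of the table per candidate pair
        for i in range(3):
            if row[i] == val1 and row[i + 1] == val2:
                total += 1
    return total


# The `pairs` cross join plus the correlated count, ordered by cnt descending.
pairs = [(v1, v2, count_pair(ORIGINAL, v1, v2)) for v1, v2 in product(VALS, repeat=2)]
pairs.sort(key=lambda p: p[2], reverse=True)

for val1, val2, cnt in pairs:
    print(val1, val2, cnt)
```

The cost grows with (number of candidate pairs) × (number of rows). An index lets the database avoid that full rescan for each pair.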

Answer 1 (score: 1)

LiveDemo

CREATE TABLE #tab([1] NVARCHAR(100), [2] NVARCHAR(100),
                  [3]  NVARCHAR(100), [4] NVARCHAR(100));

INSERT INTO #tab
VALUES ('A', 'A', 'A', 'B') ,('A' , NULL ,NULL ,NULL  )   
      ,('B' ,'A' ,'A', NULL),('B',  NULL, NULL, NULL);

WITH cte AS
(
  SELECT pair = [1] + [2] FROM #tab
  UNION ALL
  SELECT pair = [2] + [3] FROM #tab
  UNION ALL
  SELECT pair = [3] + [4] FROM #tab   
), cte2 AS
(
  SELECT [1] AS val FROM #tab
  UNION ALL SELECT [2] FROM #tab
  UNION ALL SELECT [3] FROM #tab
  UNION ALL SELECT [4] FROM #tab
), all_pairs AS
(
  SELECT DISTINCT a.val + b.val AS pair
  FROM cte2 a
  CROSS JOIN cte2 b
  WHERE a.val IS NOT NULL and b.val IS NOT NULL
)
SELECT a.pair, result = COUNT(c.pair)
FROM all_pairs a
LEFT JOIN cte c
  ON a.pair = c.pair
GROUP BY a.pair;

How it works:

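  1. cte creates all adjacent pairs: columns (1,2), (2,3) and (3,4).
  2. cte2 collects all values in the columns.
  3. all_pairs creates all possible value pairs: AA, AB, BA, BB.
  4. Finally, GROUP BY and COUNT give the number of occurrences. A pair that contains NULL concatenates to NULL and never matches, so it is not counted.

Edit

    You can concatenate the results as follows:

    LiveDemo2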

    ...
    , final AS
    (
    SELECT a.pair, result = COUNT(c.pair), rn = ROW_NUMBER() OVER(ORDER BY a.pair)
    FROM all_pairs a
    LEFT JOIN cte c
      ON a.pair = c.pair
    GROUP BY a.pair
    )
    SELECT rn, [result] = pair + ' = ' + CAST(result AS NVARCHAR(100))
    FROM final
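The steps above can be traced with a small Python sketch. It assumes the same four sample rows, with `None` for NULL. `TAB`, `concat` and the list names are hypothetical and chosen only to echo the CTE names. `concat` mimics T-SQL string `+`, which returns NULL when either operand is NULL under the default `CONCAT_NULL_YIELDS_NULL ON` setting.

```python
from collections import Counter

# Sample rows of #tab; None stands in for SQL NULL.
TAB = [
    ["A", "A", "A", "B"],
    ["A", None, None, None],
    ["B", "A", "A", None],
    ["B", None, None, None],
]


def concat(a, b):
    """Mimic T-SQL string `+`: the result is NULL if either operand is NULL."""
    return None if a is None or b is None else a + b


# cte: the adjacent pairs [1]+[2], [2]+[3], [3]+[4] of every row (UNION ALL).
cte = [concat(row[i], row[i + 1]) for i in range(3) for row in TAB]

# cte2: every value of every column, NULLs included.
cte2 = [val for row in TAB for val in row]

# all_pairs: DISTINCT cross join of the non-NULL values.
non_null = [val for val in cte2 if val is not None]
all_pairs = sorted({a + b for a in non_null for b in non_null})

# LEFT JOIN + COUNT(c.pair): NULL pairs never match; missing pairs count as 0.
occurrences = Counter(pair for pair in cte if pair is not None)

# final: ROW_NUMBER() OVER (ORDER BY pair) plus the "pair = count" string.
final = [(rn, f"{pair} = {occurrences[pair]}") for rn, pair in enumerate(all_pairs, start=1)]

for rn, result in final:
    print(f"{rn} | {result}")
```

This prints `1 | AA = 3`, `2 | AB = 1`, `3 | BA = 1` and `4 | BB = 0`, which is the question's expected result ordered by pair.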
    

Answer 2 (score: 1)

with cte as (
    select 1 as id, 'A' as [1], 'A' as [2], 'A' as [3], 'B' as [4]
    union all select 2 , 'A', NULL,NULL,NULL
    union all select 3 , 'B', 'A','A',NULL
    union all select 4 , 'B',NULL,NULL,NULL
    )
    , Vals as (
        select 'AA' as Val
        union all select 'AB' 
        union all select 'BB'
        union all select 'BA'
    )
    , UNPVT as (
        /*UNPIVOT to convert the columns to be rows*/
        SELECT id , VAL + LEAD(VAL) OVER (PARTITION BY ID ORDER BY SEQ) as Code
        FROM (
        select ID,[1],[2],[3],[4] from cte
        ) P 
        UNPIVOT (Val FOR Seq IN ([1],[2],[3],[4])
        ) AS UNPVT
    )
    select Vals.Val, count(UNPVT.Code)  from UNPVT right join Vals on UNPVT.Code = Vals.Val
    group by Vals.Val

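CTE: contains your data. Vals: contains the codes to return. UNPVT: converts the columns into rows with UNPIVOT, then LEAD joins each value to the next value in the same row. UNPIVOT drops NULLs, so the last non-NULL value in each row has no successor and pairs involving NULL are not counted.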
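The UNPIVOT + LEAD pipeline can be sketched in Python the same way. `CTE`, `VALS`, `unpvt` and `codes` are hypothetical names that echo the SQL, and `None` stands in for NULL. Because UNPIVOT discards NULLs, the approach assumes NULLs appear only at the end of a row. The question guarantees this, since (NULL, A) and (NULL, B) cannot occur. An interior NULL such as A NULL B would otherwise be paired as AB.

```python
from collections import Counter

# Sample data keyed by id; None stands in for SQL NULL.
CTE = {
    1: ["A", "A", "A", "B"],
    2: ["A", None, None, None],
    3: ["B", "A", "A", None],
    4: ["B", None, None, None],
}
# The codes to report, in the same order as the Vals CTE.
VALS = ["AA", "AB", "BB", "BA"]

# UNPIVOT: one (id, seq, val) tuple per non-NULL cell; NULL cells are dropped.
unpvt = [
    (row_id, seq, val)
    for row_id, row in CTE.items()
    for seq, val in enumerate(row, start=1)
    if val is not None
]

# LEAD(Val) OVER (PARTITION BY id ORDER BY seq): join each value to the next
# value with the same id. The last value has no successor (LEAD is NULL), so
# it yields no code.
codes = []
for row_id in sorted({rid for rid, _, _ in unpvt}):
    partition = [val for _, _, val in sorted(t for t in unpvt if t[0] == row_id)]
    for current, following in zip(partition, partition[1:]):
        codes.append(current + following)

# RIGHT JOIN Vals + COUNT(UNPVT.Code): every code in Vals appears, 0 if unmatched.
occurrences = Counter(codes)
result = [(val, occurrences[val]) for val in VALS]
print(result)
```

For the sample data this yields AA = 3, AB = 1, BB = 0 and BA = 1.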