如何在特定列中找到零的平均值

时间:2018-01-10 14:35:05

标签: sql sql-server gaps-and-islands

我的情况是我有如下记录。

+-----------+---------+---+--------+
|     a     |    b    | c |   d    |
+-----------+---------+---+--------+
| 537196605 | HZA-LOC | 0 | 201701 |
| 537196605 | HZA-LOC | 0 | 201702 |
| 537196605 | HZA-LOC | 0 | 201703 |
| 537196605 | HZA-LOC | 0 | 201704 |
| 537196605 | HZA-LOC | 0 | 201705 |
| 537196605 | HZA-LOC | 2 | 201706 |
| 537196605 | HZA-LOC | 0 | 201707 |
| 537196605 | HZA-LOC | 4 | 201708 |
| 537196605 | HZA-LOC | 0 | 201709 |
| 537196605 | HZA-LOC | 0 | 201710 |
| 537196605 | HZA-LOC | 0 | 201711 |
| 537196605 | HZA-LOC | 0 | 201712 |
+-----------+---------+---+--------+

现在我想在这里计算零的平均值。

我的意思是我需要计算为计数(零)/总对零,即10/3 = 3.33

所有在一起的零被视为一个值。前5个值将计为1,第7个值计为1,9-12计为1,总计为3。 请注意,C列的值是根据A和B的分组生成的。 期望的结果应如下所示: -

a b h 537196605 HZA-LOC 3.33

请帮我提供一个通用的解决方案,因为我有数百万条这样的记录。

4 个答案:

答案 0 :(得分:1)

这是使用窗口函数的一种方法

SELECT [Avg] = (Count(*) * 1.0) / Sum(CASE WHEN prev_c = c THEN 0 ELSE 1 END) 
FROM   (SELECT *,
               prev_c = Lag(c)OVER(ORDER BY d)
        FROM   yourtable) a
WHERE  c = 0 

答案 1 :(得分:0)

您可以使用ROW_NUMBER()技巧来创建group_id值。

确定群组后,计算每个群组的成员数量和群组数量,并完成。

http://dbfiddle.uk/?rdbms=sqlserver_2017&fiddle=e10b07b6fd9f8bc7549a3b98aaff3af6

WITH
  enumerated_example AS
(
  SELECT
    ROW_NUMBER() OVER (PARTITION BY a, b    ORDER BY d)   AS row_id,
    ROW_NUMBER() OVER (PARTITION BY a, b, c ORDER BY d)   AS zero_row_id,
    *
  FROM
    example
),
  zero_groups AS
(
  SELECT
    row_id - zero_row_id   AS group_id,
    COUNT(*)               AS group_size
  FROM
    enumerated_example
  WHERE
    c = 0
  GROUP BY
    row_id - zero_row_id
)
SELECT
  CAST(SUM(group_size) AS DECIMAL(13, 8)) / COUNT(*)
FROM
  zero_groups

答案 2 :(得分:0)

一个不同的(但相似的)解决方案:

CREATE TABLE #Sample (a int,
                      b varchar(7),
                      c tinyint,
                      d int)

INSERT INTO #Sample
VALUES
    (537196605,'HZA-LOC',0,201701),
    (537196605,'HZA-LOC',0,201702),
    (537196605,'HZA-LOC',0,201703),
    (537196605,'HZA-LOC',0,201704),
    (537196605,'HZA-LOC',0,201705),
    (537196605,'HZA-LOC',2,201706),
    (537196605,'HZA-LOC',0,201707),
    (537196605,'HZA-LOC',4,201708),
    (537196605,'HZA-LOC',0,201709),
    (537196605,'HZA-LOC',0,201710),
    (537196605,'HZA-LOC',0,201711),
    (537196605,'HZA-LOC',0,201712);
GO

WITH Groups AS(
    SELECT *,
           ROW_NUMBER() OVER (PARTITION BY a, b ORDER BY d) -
           ROW_NUMBER() OVER (PARTITION BY a, b, CASE c WHEN 0 THEN 0 ELSE 1 END ORDER BY d) AS Grp
    FROM #Sample)
SELECT a, b,
       CONVERT(decimal(6,2),COUNT(*)) / COUNT(DISTINCT Grp) AS h
FROM Groups
WHERE c = 0
GROUP BY a,b;

GO


DROP TABLE #Sample;
GO

答案 3 :(得分:0)

一种方法是使用运行总和设置组。然后计算每个a,b组合的组数。此后,您只需要group by

select a,b,1.0*grp_count/count(distinct grp) as result
from (select a,b,c,d,
      sum(case when c<>0 then 1 else 0 end) over(partition by a,b order by d) as grp,
      count(case when c=0 then 1 end) over(partition by a,b) as grp_count
      from tbl
     ) t
group by a,b,grp_count