I have a regular table [customer_table] that contains some NULL values and looks like this:
id | customer | country | col0 | col1 | col2 |
==============================================
1  | foo      | USA     | NULL | foo  | bar  |
2  | bar      | USA     | foo  | NULL | foo  |
3  | foo2     | CANADA  | bar  | col1 | NULL |
4  | bar2     | GERMANY | foo  | NULL | bar  |
5  | bar3     | CANADA  | foo  | foo  | bar  |
6  | bar4     | UK      | bar  | foo  | bar  |
7  | bar5     | UK      | bar  | bar  | bar  |
I want to calculate the percentage of non-NULL values in each column, grouped by country:
country | col0% | col1% | col2% |
=================================
USA     | 50%   | 50%   | 100%  |
GERMANY | 100%  | 0%    | 100%  |
CANADA  | 100%  | 100%  | 50%   |
UK      | 100%  | 100%  | 100%  |
This is what I have so far:
select TOTAL.[country],
       [count_col0]*100/[count_total] as [col0%],
       [count_col1]*100/[count_total] as [col1%]
from (
    (select [country], COUNT(*) as [count_total] from [customer_table]
     where [country] <> '' group by [country]) TOTAL
    left join
    (select [country], COUNT(*) as [count_col0] from [customer_table]
     where [country] <> '' and [col0] <> '' group by [country]) T_COL0
    on T_COL0.[country] = TOTAL.[country]
    left join
    (select [country], COUNT(*) as [count_col1] from [customer_table]
     where [country] <> '' and [col1] <> '' group by [country]) T_COL1
    on T_COL1.[country] = TOTAL.[country]
)
It works, but I have a lot of columns, and I don't think this is a very good solution.
Answer 0 (score: 2)
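Just use aggregation. COUNT(col) counts only the non-NULL values, so the simplest way is:
select country,
       count(col0) * 100.0 / count(*) as [col0%],
       count(col1) * 100.0 / count(*) as [col1%],
       count(col2) * 100.0 / count(*) as [col2%]
from customer_table
group by country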
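One difference from the query in the question is worth spelling out: `[col0] <> ''` drops empty strings as well as NULLs, while COUNT(col) drops only NULLs. The sketch below assumes the columns are character types and that empty strings should count as missing; it wraps each column in NULLIF so both cases are excluded. The table variable and its column types are assumptions made only to keep the example self-contained, and the rows are the sample data from the question.

```sql
-- Sample data copied from the question; the column types are assumed.
DECLARE @customer_table TABLE (
    id INT, customer NVARCHAR(50), country NVARCHAR(50),
    col0 NVARCHAR(50), col1 NVARCHAR(50), col2 NVARCHAR(50)
);

INSERT INTO @customer_table (id, customer, country, col0, col1, col2)
VALUES (1, N'foo',  N'USA',     NULL,   N'foo',  N'bar'),
       (2, N'bar',  N'USA',     N'foo', NULL,    N'foo'),
       (3, N'foo2', N'CANADA',  N'bar', N'col1', NULL),
       (4, N'bar2', N'GERMANY', N'foo', NULL,    N'bar'),
       (5, N'bar3', N'CANADA',  N'foo', N'foo',  N'bar'),
       (6, N'bar4', N'UK',      N'bar', N'foo',  N'bar'),
       (7, N'bar5', N'UK',      N'bar', N'bar',  N'bar');

-- NULLIF(col, N'') turns empty strings into NULL, so COUNT skips them too.
-- COUNT returns an integer, so 100 * COUNT(...) / COUNT(*) is a whole percentage.
SELECT country,
       100 * COUNT(NULLIF(col0, N'')) / COUNT(*) AS [col0%],
       100 * COUNT(NULLIF(col1, N'')) / COUNT(*) AS [col1%],
       100 * COUNT(NULLIF(col2, N'')) / COUNT(*) AS [col2%]
FROM @customer_table
WHERE country <> N''
GROUP BY country;
```

On this sample data the result should match the percentages requested in the question (for example 50, 50, 100 for USA). Row order is not guaranteed without an ORDER BY.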
Answer 1 (score: 0)
-- Sample data: NULL marks a missing value
DECLARE @customertable TABLE (country NVARCHAR(100), col1 BIGINT, col2 BIGINT, col3 BIGINT)

INSERT INTO @customertable
    (country, col1, col2, col3)
VALUES
     (N'USA', 0, null, 0)
    ,(N'USA', 0, null, 0)
    ,(N'USA', null, null, 0)
    ,(N'USA', 0, 0, null)
    ,(N'CA', 0, null, 0)
    ,(N'CA', 0, null, 0)
    ,(N'CA', null, null, 0)
    ,(N'CA', 0, 0, null)

-- One row per country, then count the non-NULL values of each column for it
;WITH DistinctCountries AS (
    SELECT DISTINCT Country
    FROM @customertable
)
SELECT Country
    , col1/(total*1.0) as [col1pct]
    , col2/(total*1.0) as [col2pct]
    , col3/(total*1.0) as [col3pct]
FROM DistinctCountries AS DistinctCountries
OUTER APPLY (
    SELECT
         SUM(CASE WHEN col1 IS NULL THEN 0 ELSE 1 END) col1
        ,SUM(CASE WHEN col2 IS NULL THEN 0 ELSE 1 END) col2
        ,SUM(CASE WHEN col3 IS NULL THEN 0 ELSE 1 END) col3
        ,COUNT(1) as Total
    FROM @customertable AS CountApply
    WHERE CountApply.Country = DistinctCountries.Country
) MainCount
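This works best if you already have a distinct list of countries.
If you have a lot of columns, it would be better to build a dynamic SQL query that generates and aliases each CASE expression automatically, or to use a dynamic pivot query.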
Answer 2 (score: 0)
The pattern you are looking for here is COUNT(xxx) / COUNT(*). COUNT(xxx) skips NULLs, whereas COUNT(DISTINCT xxx) would count only unique values and understate the share whenever a value repeats within a country.
Now, when you have a lot of columns to cover, you can look them up in the INFORMATION_SCHEMA.COLUMNS system view and generate the query you want to run, like this:
SELECT
'SELECT country'
UNION ALL
SELECT
CONCAT(', (100 * COUNT(', COLUMN_NAME, ')) / COUNT(*) AS [', COLUMN_NAME, '%]')
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_NAME = 'customer_table'
AND TABLE_SCHEMA = 'dbo'
AND COLUMN_NAME NOT IN ('id', 'customer', 'country')
UNION ALL
SELECT
'FROM dbo.customer_table GROUP BY country;'
This results in:
SELECT country
, (100 * COUNT(col0)) / COUNT(*) AS [col0%]
, (100 * COUNT(col1)) / COUNT(*) AS [col1%]
, (100 * COUNT(col2)) / COUNT(*) AS [col2%]
FROM dbo.customer_table GROUP BY country;
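If you would rather run the generated statement directly than copy and paste it, the generation step can be folded into a variable and executed. This is only a sketch under a few assumptions: the table is dbo.customer_table as in the question, the server is SQL Server 2017 or later (needed for STRING_AGG), and the non-NULL share is computed with COUNT(column). The variable names @cols and @sql are made up for illustration.

```sql
DECLARE @cols NVARCHAR(MAX), @sql NVARCHAR(MAX);

-- Build one "100 * COUNT([col]) / COUNT(*) AS [col%]" expression per column.
-- QUOTENAME guards against unusual column names; the CAST to NVARCHAR(MAX)
-- keeps STRING_AGG from hitting its length limit on wide tables.
SELECT @cols = STRING_AGG(
           CAST(N'100 * COUNT(' + QUOTENAME(COLUMN_NAME) + N') / COUNT(*) AS '
                + QUOTENAME(COLUMN_NAME + N'%') AS NVARCHAR(MAX)),
           N', ')
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_SCHEMA = N'dbo'
  AND TABLE_NAME = N'customer_table'
  AND COLUMN_NAME NOT IN (N'id', N'customer', N'country');

SET @sql = N'SELECT country, ' + @cols
         + N' FROM dbo.customer_table GROUP BY country;';

PRINT @sql;                   -- inspect the generated statement first
EXEC sys.sp_executesql @sql;  -- then run it
```

If no columns match the filter, @cols stays NULL, so @sql is NULL as well and nothing is executed. Printing the statement before running it makes that case easy to spot.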