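I want to compare two tables as part of a data quality report. The result should have three columns summarizing column_x in table_a and table_b.
Columns 1 and 2 are easy to set up: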
select
sum(CASE WHEN column_x = '' THEN 0 ELSE 1 END) / count(*) AS percent_complete_in_a, -- column 1
count(DISTINCT column_x) AS distinct_values_A -- column 2
from table_A
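But I can't figure out how to write the query so that column 3 appears in the same result. I have tried several variations of the following, but each one raises a syntax error in Postgres: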
select
sum(CASE WHEN column_x = '' THEN 0 ELSE 1 END) / count(column_x) AS percent_complete_in_a, -- column 1
count(DISTINCT column_x) AS distinct_values_A, -- column 2
count(DISTINCT column_x where column_x not in (select DISTINCT column_x FROM table_b)) as distinct_values_A_except_B -- column 3
from table_a
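Is there a way to structure this query so that it shows all three columns?
Answer 0 (score: 1)
I believe a left join will help here. Note that, to avoid distorting the counts, I join to a "select distinct" subquery, so the join should not multiply any rows of table_a: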
SELECT
SUM(CASE WHEN a.column_x = '' OR a.column_x IS NULL
THEN 0 ELSE 1 END) / (COUNT(*) * 1.0) AS percent_complete_in_a
, COUNT(DISTINCT a.column_x) AS distinct_values_a
, COUNT(DISTINCT case when b.column_x IS NULL then a.column_x end) AS distinct_values_A_except_B
FROM table_a a
LEFT JOIN (
SELECT DISTINCT column_x FROM table_b
) b ON a.column_x = b.column_x
;
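To check the query above against known values, here is a minimal sketch that runs it on a few sample rows. It uses Python's sqlite3 module with an in-memory SQLite database as a stand-in for Postgres. That choice is an assumption made only to keep the example self-contained, and the sample rows are invented for illustration.

```python
import sqlite3

# Hypothetical sample data; SQLite stands in for Postgres here.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_a (column_x TEXT);
    CREATE TABLE table_b (column_x TEXT);
    INSERT INTO table_a VALUES ('x'), ('x'), ('y'), ('z'), (''), (NULL);
    INSERT INTO table_b VALUES ('x'), ('x'), ('w');
""")

# Same query as in the answer, with the aliases lower-cased.
query = """
SELECT
    SUM(CASE WHEN a.column_x = '' OR a.column_x IS NULL
             THEN 0 ELSE 1 END) / (COUNT(*) * 1.0) AS percent_complete_in_a
  , COUNT(DISTINCT a.column_x) AS distinct_values_a
  , COUNT(DISTINCT CASE WHEN b.column_x IS NULL THEN a.column_x END)
        AS distinct_values_a_except_b
FROM table_a a
LEFT JOIN (
    SELECT DISTINCT column_x FROM table_b
) b ON a.column_x = b.column_x
"""
percent_complete, distinct_a, distinct_a_except_b = conn.execute(query).fetchone()

# Without the "* 1.0", both operands are integers and the division truncates.
truncated = conn.execute("""
    SELECT SUM(CASE WHEN column_x = '' OR column_x IS NULL
                    THEN 0 ELSE 1 END) / COUNT(*)
    FROM table_a
""").fetchone()[0]

print(percent_complete, distinct_a, distinct_a_except_b, truncated)
```

With these rows, 4 of the 6 rows in table_a are filled in, so the first column is about 0.667, while the variant without `* 1.0` returns 0. Note that the empty string is still counted as a distinct value in columns 2 and 3, because only NULL is skipped by COUNT(DISTINCT ...).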
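Changes:
IS NULL: NULL values are now treated as incomplete, in addition to empty strings
* 1.0: the count is multiplied by 1.0 to avoid integer division, so you get a percentage

Answer 1 (score: 0)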
I would do this with a subquery:
SELECT FinishingDate,OpeningDate FROM Sales
group by FinishingDate,OpeningDate
having DATEDIFF(m,OpeningDate,FinishingDate) >= 16