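I am joining two tables and would like some simple statistics that describe the operation, namely:
I am sure there must be some smart, simple way to get all of this information on the fly, without creating and computing extra flags/variables and without long queries.
It would also be great if such a solution worked when joining more than two datasets, for example 3 tables, and made it easy to count the 7 groups of records compared in the image below.
I am currently querying from SAS, but I often run into the same problem with Oracle and MS SQL, so I am looking for a DBMS-agnostic solution.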
Answer 0 (score: 1)
You can get the distribution of keys across the tables as follows:
select in1, in2, in3, count(*) as numkeys, min(key) as key1, max(key) as key2
from (select key, sum(in1) as in1, sum(in2) as in2, sum(in3) as in3
      from ((select key, count(*) as in1, 0 as in2, 0 as in3 from table1 group by key
            ) union all
            (select key, 0 as in1, count(*) as in2, 0 as in3 from table2 group by key
            ) union all
            (select key, 0 as in1, 0 as in2, count(*) as in3 from table3 group by key
            )
           ) t123
      group by key
     ) k
group by in1, in2, in3;
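Rows in which in1, in2 and in3 are all nonzero correspond to the keys an inner join would return. If "key" is a primary key in each table, then in1, in2 and in3 are always 0 or 1 (only the count, numkeys, can be larger).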
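As a rough, runnable sketch of the same idea, the snippet below uses Python's built-in sqlite3 module as a stand-in DBMS. The table contents are hypothetical sample data (ids 1-10, 5-15 and 9-100), the key column is named `id` here to avoid a reserved word, and the inner queries emit 0/1 presence flags combined with MAX() instead of the per-key counts used above, so this is a variation on the query rather than an exact copy.

```python
import sqlite3

# Hypothetical sample data: three single-column tables with overlapping id ranges.
conn = sqlite3.connect(":memory:")
for name, lo, hi in [("table1", 1, 10), ("table2", 5, 15), ("table3", 9, 100)]:
    conn.execute(f"CREATE TABLE {name} (id INTEGER)")
    conn.executemany(
        f"INSERT INTO {name} VALUES (?)", [(i,) for i in range(lo, hi + 1)]
    )

# Each branch tags a key with the table it came from; MAX() per key turns the
# tags into presence indicators, and the outer GROUP BY counts keys per pattern.
query = """
SELECT in1, in2, in3, COUNT(*) AS numkeys
FROM (SELECT id, MAX(in1) AS in1, MAX(in2) AS in2, MAX(in3) AS in3
      FROM (SELECT id, 1 AS in1, 0 AS in2, 0 AS in3 FROM table1
            UNION ALL
            SELECT id, 0, 1, 0 FROM table2
            UNION ALL
            SELECT id, 0, 0, 1 FROM table3) t123
      GROUP BY id) k
GROUP BY in1, in2, in3
ORDER BY in1, in2, in3
"""
groups = conn.execute(query).fetchall()
for in1, in2, in3, numkeys in groups:
    print(in1, in2, in3, numkeys)
```

Only the presence patterns that actually occur in the data show up in the result; any of the 7 possible groups that is empty is simply absent rather than reported with a count of 0.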
Answer 1 (score: 1)
When working in SAS PROC SQL, one of the most useful options is verbose. For example:
%macro create_data(num, min, max);
/* Creates sample datasets */
DATA have&num.;
do i=&min. to &max.;
ID = i; OUTPUT;
end; DROP i;
RUN;
%mend create_data;
%create_data(1,1,10);
%create_data(2,5,15);
%create_data(3,9,100);
PROC SQL verbose;
/* Sample join */
CREATE table want as
SELECT have1.ID as ID1, have2.ID as ID2, have3.ID as ID3
FROM have1
FULL JOIN have2
on have1.ID=have2.ID
RIGHT JOIN have3
on have3.ID=have1.ID;
QUIT;
This prints the following to the SAS log:
Data Set WORK.HAVE1 is num=1 and tag=0001. NOBS=10, lrecl=8.
Data Set WORK.HAVE2 is num=2 and tag=0002. NOBS=11, lrecl=8.
Data Set WORK.HAVE3 is num=3 and tag=0004. NOBS=92, lrecl=8.
NOTE: Table WORK.WANT created, with 92 rows and 3 columns.
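This is very useful for seeing how a join behaves, especially with multiple datasets.
If you are working outside SAS and looking for a generic SQL solution, I think your best (and fastest) option is to query counts. For example, using the tables from the join above: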
/* how many rows there are in total in each of the tables */
SELECT count(*) FROM have1;
SELECT count(*) FROM have2;
SELECT count(*) FROM have3;
/* how many rows the join produced */
SELECT count(*) FROM want;
/* how many rows are left unjoined, i.e. have no match in at least one table */
SELECT count(*) FROM want WHERE ID1 IS NULL OR ID2 IS NULL OR ID3 IS NULL;
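For a two-table case, the counting approach can be sketched end to end as below. This is only an illustration under stated assumptions: it runs against an in-memory SQLite database through Python's sqlite3 module, the sample ranges for have1 and have2 mirror the SAS example, and the `scalar` helper and the keys of `stats` are hypothetical names introduced here. Unmatched rows are counted with LEFT JOIN ... IS NULL anti-joins, which avoids relying on FULL JOIN support.

```python
import sqlite3

# Hypothetical sample data mirroring have1 (1-10) and have2 (5-15).
conn = sqlite3.connect(":memory:")
for name, lo, hi in [("have1", 1, 10), ("have2", 5, 15)]:
    conn.execute(f"CREATE TABLE {name} (ID INTEGER PRIMARY KEY)")
    conn.executemany(
        f"INSERT INTO {name} VALUES (?)", [(i,) for i in range(lo, hi + 1)]
    )


def scalar(sql):
    """Run a query that returns a single value and return that value."""
    return conn.execute(sql).fetchone()[0]


stats = {
    # total rows in each table
    "rows_have1": scalar("SELECT COUNT(*) FROM have1"),
    "rows_have2": scalar("SELECT COUNT(*) FROM have2"),
    # rows that find a partner in the other table
    "matched": scalar(
        "SELECT COUNT(*) FROM have1 JOIN have2 ON have1.ID = have2.ID"
    ),
    # rows with no partner (anti-joins in each direction)
    "only_have1": scalar(
        "SELECT COUNT(*) FROM have1 LEFT JOIN have2 ON have1.ID = have2.ID "
        "WHERE have2.ID IS NULL"
    ),
    "only_have2": scalar(
        "SELECT COUNT(*) FROM have2 LEFT JOIN have1 ON have1.ID = have2.ID "
        "WHERE have1.ID IS NULL"
    ),
}
print(stats)
```

Because ID is unique in both tables here, matched plus only_have1 equals the row count of have1, and likewise for have2; with duplicate keys that check would no longer hold.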
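Without knowing the specifics, it is hard to give further advice. Part of the reason SQL has no standard solution for this is that the different types of join are inherently very different from one another.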