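I have a large database from which I have extracted a study population. For comparison, I want to select a control group with similar characteristics. The two criteria I want to match on are age and sex. The following query gives me the numbers I want to match: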
select sex, age/10 as decades,COUNT(*) as counts
from
(
select distinct m.patid
,m.sex,DATEPART(year,min(c.admitdate)) -m.yrdob as Age
from members as m
inner join claims as c on c.patid=m.PATID
group by m.PATID, m.sex,m.yrdob
)x group by sex, Age/10
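The result set looks like this:
The decades column is given by the expression
(DATEPART(year,min(c.admitdate)) -m.yrdob)/10
This uses integer division to group people into age ranges of 20-29, 30-39, and so on. For example, I want to select 507 women in their twenties from a larger dataset. The query that finds the characteristics of the larger dataset is: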
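As a minimal sketch of how the integer division buckets ages, assuming the age expression is an integer (DATEPART returns int, and yrdob is assumed to be an integer column), the following stand-alone T-SQL uses a few made-up ages. The derived table v and its sample values are illustrative only and are not part of the original schema.

```sql
-- Integer division in T-SQL truncates when both operands are integers,
-- so every age in the same ten-year band maps to the same value.
SELECT v.age,
       v.age / 10        AS decades,      -- 20..29 -> 2, 30..39 -> 3
       (v.age / 10) * 10 AS range_start   -- lower bound of the age band
FROM (VALUES (20), (29), (30), (47)) AS v(age);  -- hypothetical sample ages
```

Ages 20 and 29 both land in decade 2, while 30 starts decade 3, which is why grouping by Age/10 yields one row per sex and ten-year band.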
select distinct m.patid
,m.sex
,(DATEPART(year,min(c.admitdate)) -m.yrdob)/10 as decades
from members as m
inner join claims as c on c.patid=m.PATID
group by m.PATID, m.sex,m.yrdob
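Edit: results of the second query
So I need the sum of the decades column in the second query to equal the counts column in the first query. What I tried is shown below; it returns zero results. What do I need to do to match on these ages?
The query that runs but returns no results: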
select x.PATID--,x.sex,x.decades,y.counts
from
(
select distinct m.patid
,m.sex
,(DATEPART(year,min(c.admitdate)) -m.yrdob)/10 as decades
from members as m
inner join claims as c on c.patid=m.PATID
group by m.PATID, m.sex,m.yrdob
) as x
inner join
(
select sex, age/10 as decades,COUNT(*) as counts
from
(
select distinct m.patid
,m.sex,DATEPART(year,min(c.admitdate)) -m.yrdob as Age
from members as m
inner join claims as c on c.patid=m.PATID
group by m.PATID, m.sex,m.yrdob
)x group by sex, Age/10
) as y on x.sex=y.sex and x.decades=y.decades
group by y.counts,x.PATID,x.sex,y.sex
having SUM(x.decades)=y.counts and x.sex=y.sex
Answer 0 (score: 1)
select
T1.sex,
T1.decades,
T1.counts,
T2.patid
from (
select
sex,
age/10 as decades,
COUNT(*) as counts
from (
select m.patid,
m.sex,
DATEPART(year,min(c.admitdate)) -m.yrdob as Age
from members as m
inner join claims as c on c.patid=m.PATID
group by m.PATID, m.sex,m.yrdob
)x
group by sex, Age/10
) as T1
join (
--right here is where the random sampling occurs
SELECT TOP 50 --this is the total number of people in our dataset
patid
,sex
,decades
from (
select m.patid,
m.sex,
(DATEPART(year,min(c.admitdate)) -m.yrdob)/10 as decades
from members as m
inner join claims as c on c.patid=m.PATID
group by m.PATID, m.sex, m.yrdob
) T2
order by NEWID()
) as T2
on T2.sex = T1.sex
and T2.decades = T1.decades
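Edit: I posted another question similar to this one, in which I found that my results were not actually random; they were just the top N results. I had been ordering by newid() in the outermost query, and all that did was shuffle the exact same result set. From that question, which is now closed, I found that I needed to use the TOP keyword together with order by newid() at the commented line in the query above.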
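The query above draws a single random TOP N across all groups. If the goal is a separate random draw per sex and decade (for example, exactly 507 women in their twenties), one possible alternative is to number the candidates randomly within each group and keep only as many as the target count. This is a sketch of a different technique, not the method used in the answer. The temp tables #targets and #pool are hypothetical: #targets is assumed to hold the output of the first query in the question (sex, decades, counts), and #pool the output of the second query run against the larger dataset (patid, sex, decades), with study patients already excluded.

```sql
-- Hypothetical inputs (not part of the original schema):
--   #targets(sex, decades, counts) : per-group counts from the study population
--   #pool(patid, sex, decades)     : one row per candidate control
SELECT p.patid, p.sex, p.decades
FROM (
    SELECT patid, sex, decades,
           -- NEWID() gives a random order inside each sex/decade group
           ROW_NUMBER() OVER (PARTITION BY sex, decades
                              ORDER BY NEWID()) AS rn
    FROM #pool
) AS p
INNER JOIN #targets AS t
    ON  t.sex     = p.sex
    AND t.decades = p.decades
WHERE p.rn <= t.counts;   -- keep only as many rows as the group needs
```

If a group in #pool has fewer candidates than its counts value, this returns all of them rather than failing, so it may be worth comparing the per-group row counts against #targets afterwards.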