Is there a SQL query to count the number of people alive in a given year, given each person's date of birth and date of death?

Date: 2019-06-05 21:22:13

Tags: sql proc-sql

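I have a table with people's names, dates of birth, and dates of death (between 1900 and 2000). I need the number of people alive in each year over a given period, for example a population of 2.3 billion in 1940, 2.4 billion in 1941, 2.2 billion in 1942, and so on up to 1950.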

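I'm working in SAS Enterprise Guide, so the code may look a little different from standard SQL. At a minimum, I'd like to see something like this: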

Number of people | Year

2,300,000,000 | 1940
2,400,000,000 | 1941
...

select count(name) as people
from db
where bd < '01JAN1940'd and dd >= '01JAN1940'd and dd <= '31DEC1940'd;

2 answers:

Answer 0 (score: 0)

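First, you need to know the initial population at the end of 1899; let's say it is 2 billion. Then add each year's births minus deaths to it. (To do this, you have to read the table twice, once for births and once for deaths.) Use SUM() OVER to get a running total.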

I'm not sure which DBMS you are actually using, but this is fairly standard SQL:

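select yr,
       2000000000 + sum(coalesce(births.cnt, 0) - coalesce(deaths.cnt, 0)) over (order by yr) as population
from
(
  select extract(year from bd) as yr, count(*) as cnt
  from db
  group by extract(year from bd)
) births
-- a full join keeps years that have births but no deaths (or vice versa);
-- a plain inner join would silently drop them from the running total
full outer join
(
  select extract(year from dd) as yr, count(*) as cnt
  from db
  where dd is not null  -- people still alive have no death year
  group by extract(year from dd)
) deaths using (yr)
order by yr;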
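The births-minus-deaths idea can be checked outside any DBMS with a small Python sketch. It assumes the rows arrive as `(birth_date, death_date)` pairs with `None` for people still alive, and that `initial` is the population at the end of the year before `first_year`; the function name `population_by_year` is hypothetical. The running total uses `itertools.accumulate` in place of SUM() OVER, which may also help if your SQL dialect lacks window functions (as far as I know, SAS PROC SQL does not support them).

```python
from collections import Counter
from datetime import date
from itertools import accumulate


def population_by_year(people, initial, first_year, last_year):
    """End-of-year population: initial + running sum of (births - deaths).

    people: iterable of (birth_date, death_date_or_None) pairs (assumed layout).
    initial: assumed population at the end of first_year - 1; anyone born
    before first_year is expected to be part of it already.
    """
    net = Counter()
    for born, died in people:
        net[born.year] += 1          # a birth adds one person in its year
        if died is not None:
            net[died.year] -= 1      # a death removes one person in its year
    years = range(first_year, last_year + 1)
    # Walk every year in the range, not just the years that have events,
    # so no year silently drops out of the output.
    totals = accumulate(net[y] for y in years)
    return {y: initial + t for y, t in zip(years, totals)}


if __name__ == "__main__":
    people = [
        (date(1900, 3, 1), date(1905, 6, 1)),
        (date(1902, 1, 1), None),
        (date(1902, 7, 1), date(1902, 12, 31)),
    ]
    print(population_by_year(people, 100, 1900, 1905))
    # {1900: 101, 1901: 101, 1902: 102, 1903: 102, 1904: 102, 1905: 101}
```

In this example an inner join of the birth and death counts would drop 1900 (births only) and 1905 (deaths only), and even a full outer join only returns years with at least one event, so 1901 would not be listed; looping over the whole year range avoids both gaps.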

Answer 1 (score: 0)

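/* Test data: 10,000 random people born roughly between 1899 and 1999; */
/* about 1 in 10 gets a death date, the rest are still alive (dod missing) */
data dob_data;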
do i = 1 to 10000;
    num = ceil(rand('UNIFORM',0,10));   
    dob = intnx('day','01JAN1899'd,ceil(rand('UNIFORM',1,36865)));
    select (num);
        when (1)  dod = intnx('day',dob,ceil(rand('UNIFORM',1,36865)));
        otherwise dod = .;
    end;
    output;
end;
format dob dod date9.;
drop num;
run;


/* One row per year from 1900 to 2000: soy = first day, eoy = last day of the year */
data calendar;
    do i=0 to 100;
        year = 1900+i;
        soy = intnx('year','01JAN1900'd,i,'s');
        eoy = intnx('year','01JAN1900'd,i,'e');
        output;
    end;
    format soy eoy date9.;
run;


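/* Pair every person with every year, then count per year who was alive at the start, */
/* born during the year, died during the year (counted as -1), and alive at the end.  */
/* A missing death date is treated as far in the future (31DEC2200).                 */
proc sql;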
    create table pop as
    select year,
    sum(case when DOB < soy and coalesce(DOD,'31DEC2200'd) ge soy then 1 else 0 end) as Alive_At_Start,
    sum(case when DOB between soy and eoy then 1 else 0 end) as Born_During,
    sum(case when coalesce(DOD,'31DEC2200'd) between soy and eoy then -1 else 0 end) as Passed,
    sum(case when DOB le eoy and coalesce(DOD,'31DEC2200'd) > eoy then 1 else 0 end) as Alive_At_End
    from dob_data t1, calendar t2
    group by year;
quit;
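For readers without SAS, here is a hedged sketch of the same calendar-join logic run in SQLite through Python's standard `sqlite3` module. The table and column names (`people`, `calendar`, `dob`, `dod`, `soy`, `eoy`) follow the answer; storing dates as ISO-8601 text (so string comparison orders them correctly) and the function name `yearly_counts` are assumptions for illustration. The last column, `alive_at_end`, is the per-year population the question asks for.

```python
import sqlite3

# Same CASE-based counting as the PROC SQL above; '2200-12-31' stands in for
# "still alive", mirroring the '31DEC2200'd sentinel in the SAS code.
SQL = """
SELECT c.year,
       SUM(CASE WHEN p.dob < c.soy AND COALESCE(p.dod, '2200-12-31') >= c.soy
                THEN 1 ELSE 0 END) AS alive_at_start,
       SUM(CASE WHEN p.dob BETWEEN c.soy AND c.eoy THEN 1 ELSE 0 END) AS born_during,
       SUM(CASE WHEN COALESCE(p.dod, '2200-12-31') BETWEEN c.soy AND c.eoy
                THEN -1 ELSE 0 END) AS passed,
       SUM(CASE WHEN p.dob <= c.eoy AND COALESCE(p.dod, '2200-12-31') > c.eoy
                THEN 1 ELSE 0 END) AS alive_at_end
FROM people AS p CROSS JOIN calendar AS c
GROUP BY c.year
ORDER BY c.year
"""


def yearly_counts(people, first_year, last_year):
    """people: list of (dob, dod_or_None) ISO date strings (assumed layout)."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE people (dob TEXT NOT NULL, dod TEXT)")
    con.execute("CREATE TABLE calendar (year INTEGER, soy TEXT, eoy TEXT)")
    con.executemany("INSERT INTO people VALUES (?, ?)", people)
    # One calendar row per year, like the SAS calendar data step
    con.executemany(
        "INSERT INTO calendar VALUES (?, ?, ?)",
        [(y, f"{y}-01-01", f"{y}-12-31") for y in range(first_year, last_year + 1)],
    )
    rows = con.execute(SQL).fetchall()
    con.close()
    return rows
```

Because `passed` is already negative, each row should satisfy alive_at_start + born_during + passed = alive_at_end, which is a cheap sanity check on real data.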