How to calculate the average over year ranges?

Posted: 2019-07-28 07:06:28

Tags: sql sas

I have a data set with many fields. I am trying to summarize the "price" data as averages over year ranges. For example:

  • 1900 to 1925: "average price"
  • 1925 to 1950: "average price"
  • 1950 to 1975: "average price"
  • 1975 to 2000: "average price"
  • 2000 to 2017: "average price"

What I tried:

proc sql;
select avg(price) as avg_price
FROM summary
WHEN year between 1995 and 2000;
quit;

The code above does not work. Could you help me with the code? Please include the PROC and QUIT statements and anything else I need; I am new to SAS/SQL.


4 Answers:

Answer 0 (score: 1)

If you need the average for each year, you have to group by year:

proc sql;
  select year, avg(price) as avg_price
  from summary
  where year between 1995 and 2000
  group by year;
quit;
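To see what the grouped query returns, here is a small sketch that runs the same SQL against an in-memory SQLite table from Python. SQLite stands in for SAS here (an assumption for illustration), and the sample rows are made up. Each year in the range gets its own row, so this produces per-year averages, not one average per range.

```python
import sqlite3

# Hypothetical sample rows (year, price); the real SAS table "summary" is not shown.
ROWS = [(1995, 10.0), (1995, 20.0), (1998, 5.0), (2000, 7.0), (2001, 100.0)]


def yearly_averages(rows):
    """Return (year, avg_price) for each year from 1995 to 2000 that has data."""
    con = sqlite3.connect(":memory:")
    con.execute("create table summary (year integer, price real)")
    con.executemany("insert into summary values (?, ?)", rows)
    result = con.execute(
        "select year, avg(price) as avg_price from summary "
        "where year between 1995 and 2000 "  # inclusive at both ends
        "group by year order by year"
    ).fetchall()
    con.close()
    return result


print(yearly_averages(ROWS))  # [(1995, 15.0), (1998, 5.0), (2000, 7.0)]
```

The 2001 row is dropped by the WHERE clause, and the two 1995 rows collapse into a single averaged row.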

Or, for custom year ranges, a simple approach is a UNION:

   proc sql;
   /* BETWEEN is inclusive, so the ranges must not share boundary years */
   select  'from 1940 to 1959' as year_range, avg(price) as avg_price
   from summary
   where  year between 1940 and 1959
   union all
   select  'from 1960 to 1979', avg(price)
   from summary
   where  year between 1960 and 1979
   union all
   select  'from 1980 to 2000', avg(price)
   from summary
   where  year between 1980 and 2000;
   quit;
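BETWEEN includes both endpoints, so with ranges that share an endpoint, such as 1940 to 1960 and 1960 to 1980, a row for 1960 is averaged into both ranges. The sketch below demonstrates this with SQLite from Python as a stand-in for SAS (an assumption for illustration; the rows are hypothetical), comparing overlapping ranges with disjoint ones.

```python
import sqlite3

# Hypothetical rows (year, price); 1960 sits exactly on a range boundary.
ROWS = [(1950, 1.0), (1960, 100.0), (1970, 3.0)]


def range_stats(rows, ranges):
    """For each inclusive (lo, hi) range, return (lo, hi, row_count, avg_price)."""
    con = sqlite3.connect(":memory:")
    con.execute("create table summary (year integer, price real)")
    con.executemany("insert into summary values (?, ?)", rows)
    out = []
    for lo, hi in ranges:
        n, avg = con.execute(
            "select count(*), avg(price) from summary where year between ? and ?",
            (lo, hi),
        ).fetchone()
        out.append((lo, hi, n, avg))
    con.close()
    return out


overlapping = range_stats(ROWS, [(1940, 1960), (1960, 1980)])
disjoint = range_stats(ROWS, [(1940, 1959), (1960, 1980)])
print(overlapping)  # [(1940, 1960, 2, 50.5), (1960, 1980, 2, 51.5)]
print(disjoint)     # [(1940, 1959, 1, 1.0), (1960, 1980, 2, 51.5)]
```

With overlapping ranges the counts add up to 4 even though there are only 3 rows; ending each range one year before the next one starts fixes that.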

Answer 1 (score: 0)

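The error you are getting seems to indicate that the variable year is a character string rather than a number. The following conversion should help: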

  PROC SQL;
    SELECT mean(price) as average FROM have 
    WHERE 1995 <= input(year,8.) <= 2000 ;
  quit;
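The same idea, converting the text year to a number before the range test, can be sketched in SQLite from Python, with CAST playing the role of SAS's input(year, 8.). This is an illustrative stand-in, not SAS itself, and the rows are hypothetical. One difference to keep in mind: SQLite's CAST of non-numeric text such as 'UNKN' to INTEGER yields 0, whereas SAS's INPUT yields a missing value.

```python
import sqlite3

# Hypothetical data where year is stored as text, as this answer suspects.
ROWS = [("1994", 1.0), ("1995", 2.0), ("2000", 4.0), ("2001", 8.0)]


def mean_price(rows, lo, hi):
    """Average price for text years that convert to a number in [lo, hi]."""
    con = sqlite3.connect(":memory:")
    con.execute("create table have (year text, price real)")
    con.executemany("insert into have values (?, ?)", rows)
    # CAST converts the text to an integer so the comparison is numeric
    (avg,) = con.execute(
        "select avg(price) from have where cast(year as integer) between ? and ?",
        (lo, hi),
    ).fetchone()
    con.close()
    return avg


print(mean_price(ROWS, 1995, 2000))  # 3.0
```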

Answer 2 (score: 0)

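In SQL you can GROUP BY a CASE expression or by a calculated variable (also called a column). In SAS PROC SQL the averaging function can be written as MEAN; AVG is accepted as a synonym.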

Example of grouping by a calculated column:

data have;
  do date = '01jan1900'd to '31dec2020'd;
    year = year(date);
    yearChar = put(year,4.);
    price = exp ((date - '01jan1940'd) / (365*12) );
    output;
  end;
  format date yymmdd10.;
run;

proc sql;
  create table want as 
  select
    case 
      when year between 1900 and 1924 then '1900 to 1924'
      when year between 1925 and 1949 then '1925 to 1949'
      when year between 1950 and 1974 then '1950 to 1974'
      when year between 1975 and 1999 then '1975 to 1999'
      when year between 2000 and 2017 then '2000 to 2017'
      else 'out of range'
    end
    as years
  , mean (price) as average_price
  from have
  group by years
  having years not in ('out of range')
;
quit;
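The CASE-based grouping can be tried outside SAS with SQLite from Python. This is a sketch under assumptions: SQLite stands in for PROC SQL, the rows are made up, and the "out of range" filter is applied in an outer query on a subquery instead of a HAVING on the alias, which keeps the SQL portable across engines.

```python
import sqlite3

# Hypothetical rows (year, price) spanning the boundaries used in the answer.
ROWS = [(1900, 1.0), (1924, 3.0), (1925, 10.0), (2017, 50.0), (2018, 999.0)]

QUERY = """
select years, avg(price) as average_price
from (
    select case
             when year between 1900 and 1924 then '1900 to 1924'
             when year between 1925 and 1949 then '1925 to 1949'
             when year between 1950 and 1974 then '1950 to 1974'
             when year between 1975 and 1999 then '1975 to 1999'
             when year between 2000 and 2017 then '2000 to 2017'
             else 'out of range'
           end as years,
           price
    from have
)
where years <> 'out of range'
group by years
order by years
"""


def bucket_averages(rows):
    """Return (range_label, average_price) pairs, skipping out-of-range years."""
    con = sqlite3.connect(":memory:")
    con.execute("create table have (year integer, price real)")
    con.executemany("insert into have values (?, ?)", rows)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result


print(bucket_averages(ROWS))
# [('1900 to 1924', 2.0), ('1925 to 1949', 10.0), ('2000 to 2017', 50.0)]
```

Ranges with no data (1950 to 1999 here) simply produce no row, and the 2018 row is discarded as out of range.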

This creates a data set like the following:

years ($12)     average_price (double)
1900 to 1924       0.120
1925 to 1949       0.967
1950 to 1974       7.777
1975 to 1999      62.546
2000 to 2017     345.873

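If the year variable is of character type, you need to convert its values to numbers and use the converted value in the BETWEEN expressions that compare against numbers.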

Example:

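yearChar is a character column containing year values. The INPUT function converts a string to a numeric value when possible. The ? modifier suppresses the log messages that would otherwise appear when the conversion fails (for example, when the year is **** or UNKN); the result is then a missing value.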

proc sql;
  create table want as 
  select
    case 
      when input(yearChar,?4.) between 1900 and 1924 then '1900 to 1924'
      when input(yearChar,?4.) between 1925 and 1949 then '1925 to 1949'
      when input(yearChar,?4.) between 1950 and 1974 then '1950 to 1974'
      when input(yearChar,?4.) between 1975 and 1999 then '1975 to 1999'
      when input(yearChar,?4.) between 2000 and 2017 then '2000 to 2017'
      else 'out of range'
    end
    as years
  , mean (price) as average_price
  from have
  group by years
  having years not in ('out of range')
;
quit;
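The logic of this query, a tolerant conversion followed by range bucketing, can be mirrored in plain Python. This is only a rough analogue under stated assumptions: the helper names are hypothetical, int() returning None on failure stands in for input(yearChar, ?4.) returning a missing value, and unlike the 4. informat, int() reads the whole string rather than just the first four characters.

```python
from statistics import mean

# Ranges from this answer; both ends inclusive, no shared boundary years.
BUCKETS = [(1900, 1924), (1925, 1949), (1950, 1974), (1975, 1999), (2000, 2017)]


def to_year(text):
    """Rough analogue of input(yearChar, ?4.): an int, or None when not numeric."""
    try:
        return int(text)
    except ValueError:
        return None  # SAS would yield a missing value here


def bucket_label(year):
    """Label for the range containing year, or 'out of range'."""
    if year is not None:
        for lo, hi in BUCKETS:
            if lo <= year <= hi:
                return f"{lo} to {hi}"
    return "out of range"


def average_by_bucket(rows):
    """rows: (year_text, price) pairs; returns {label: mean price}."""
    groups = {}
    for year_text, price in rows:
        label = bucket_label(to_year(year_text))
        if label != "out of range":
            groups.setdefault(label, []).append(price)
    return {label: mean(prices) for label, prices in sorted(groups.items())}


rows = [("1910", 2.0), ("1920", 4.0), ("****", 100.0), ("UNKN", 100.0), ("2005", 9.0)]
print(average_by_bucket(rows))  # {'1900 to 1924': 3.0, '2000 to 2017': 9.0}
```

The rows with **** and UNKN fall into "out of range" and are excluded, just as the HAVING clause drops them in the SAS query.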

Answer 3 (score: 0)

I assume you mean 1920-1924, 1925-1929, and so on, so that each year is counted only once.

You can use GROUP BY with some arithmetic:

proc sql;
    select floor(year / 5) * 5 as from_year, 
           avg(price) as avg_price
    from summary
    /* group by the alias so each 5-year bucket collapses to one row */
    group by from_year;
quit;
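The bucketing arithmetic is easy to check in isolation. Below is a small Python sketch with made-up rows; Python's // operator floors, so (year // 5) * 5 matches floor(year / 5) * 5 for integer years, mapping 1920 through 1924 to 1920 and 1925 through 1929 to 1925.

```python
from collections import defaultdict

# Hypothetical rows (year, price).
ROWS = [(1920, 1.0), (1924, 3.0), (1925, 10.0), (1929, 20.0), (1930, 7.0)]


def five_year_averages(rows):
    """Map each bucket start (floor(year / 5) * 5) to the mean price in it."""
    totals = defaultdict(lambda: [0.0, 0])  # start -> [sum, count]
    for year, price in rows:
        start = (year // 5) * 5  # // floors, like floor(year / 5)
        totals[start][0] += price
        totals[start][1] += 1
    return {start: s / n for start, (s, n) in sorted(totals.items())}


print(five_year_averages(ROWS))  # {1920: 2.0, 1925: 15.0, 1930: 7.0}
```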

If you also want the end year:

proc sql;
    select floor(year / 5) * 5 as from_year, 
           floor(year / 5) * 5 + 4 as to_year, 
           avg(price) as avg_price
    from summary
    group by from_year, to_year;
quit;
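The same arithmetic with a width of 25 and an offset of 1900 gives the ranges from the question, read as 1900 to 1924, 1925 to 1949, and so on, with the last range capped at 2017. The sketch runs this in SQLite from Python as a stand-in for PROC SQL; the rows, the 1900 offset, and the 2017 cap are assumptions taken from the question rather than from this answer.

```python
import sqlite3

# Hypothetical rows (year, price); 2018 lies outside the question's ranges.
ROWS = [(1900, 1.0), (1924, 3.0), (1925, 5.0), (2000, 8.0), (2017, 10.0), (2018, 99.0)]

QUERY = """
select from_year,
       min(from_year + 24, 2017) as to_year,  -- cap the last range at 2017
       avg(price) as avg_price
from (
    -- integer division truncates; for years >= 1900 that equals floor
    select 1900 + ((year - 1900) / 25) * 25 as from_year, price
    from summary
    where year between 1900 and 2017
)
group by from_year
order by from_year
"""


def range_averages(rows):
    """Return (from_year, to_year, avg_price) for each 25-year range with data."""
    con = sqlite3.connect(":memory:")
    con.execute("create table summary (year integer, price real)")
    con.executemany("insert into summary values (?, ?)", rows)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result


print(range_averages(ROWS))
# [(1900, 1924, 2.0), (1925, 1949, 5.0), (2000, 2017, 9.0)]
```

In SQLite, min() with two arguments is the scalar minimum, not the aggregate, so it only trims the end year of the final range.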