I have a dataset with many fields. I am trying to summarize the "price" data as an average over a range of years. For example:
My attempt:
proc sql;
select avg(price) as avg_price
FROM summary
WHEN year between 1995 and 2000;
quit;
The code above does not work. Could you help me with the code? (Please include the proc and quit statements, or anything else I need; I am new to SAS / SQL.)
Answer 0: (score: 1)
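If you need the average for each individual year, group by year (note that the filter keyword is WHERE, not WHEN):
proc sql;
select year, avg(price) as avg_price
FROM summary
WHERE year between 1995 and 2000
group by year;
quit;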
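If what you want is a single average over the whole range rather than one row per year, the original attempt probably only needs WHERE in place of WHEN. A minimal sketch, assuming the table is called summary and year is a numeric column, as in the question:

```sas
proc sql;
  /* one row: the mean price over all rows whose year is 1995 through 2000 */
  select avg(price) as avg_price
  from summary
  where year between 1995 and 2000;
quit;
```

BETWEEN is inclusive at both ends, so both 1995 and 2000 are part of the average.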
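Or, for custom year ranges, a simple approach is UNION. BETWEEN is inclusive, so the ranges below are written so that they do not overlap and no year is counted twice:
proc sql;
select 'from 1940 to 1959' as years, avg(price) as avg_price
from summary
WHERE year between 1940 and 1959
union
select 'from 1960 to 1979', avg(price)
from summary
WHERE year between 1960 and 1979
union
select 'from 1980 to 2000', avg(price)
from summary
WHERE year between 1980 and 2000;
quit;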
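Answer 1: (score: 0)
The error you are getting seems to indicate that the variable year is a character string rather than a number. The following conversion should help: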
PROC SQL;
SELECT mean(price) as average FROM have
WHERE 1995 <= input(year,8.) <= 2000 ;
quit;
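Before adding the conversion, it may be worth confirming the column's type. A minimal sketch, assuming the table is called summary as in the question: PROC CONTENTS lists every variable together with its type (Num or Char), so you can see whether year needs input() at all.

```sas
/* prints the variable list for the table, including each variable's type */
proc contents data=summary;
run;
```

If year shows up as Num, the plain BETWEEN comparison is enough and the input() call is not needed.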
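Answer 2: (score: 0)
In SQL you can group by a case expression or by a computed variable (also known as a column). The averaging function in SQL is MEAN (PROC SQL accepts AVG as well).
Example of grouping by a computed column: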
data have;
do date = '01jan1900'd to '31dec2020'd;
year = year(date);
yearChar = put(year,4.);
price = exp ((date - '01jan1940'd) / (365*12) );
output;
end;
format date yymmdd10.;
run;
proc sql;
create table want as
select
case
when year between 1900 and 1924 then '1900 to 1924'
when year between 1925 and 1949 then '1925 to 1949'
when year between 1950 and 1974 then '1950 to 1974'
when year between 1975 and 1999 then '1975 to 1999'
when year between 2000 and 2017 then '2000 to 2017'
else 'out of range'
end
as years
, mean (price) as average_price
from have
group by years
having years not in ('out of range')
;
quit;
This creates a data set such as:
years ($12)     average_price (double)
1900 to 1924      0.120
1925 to 1949      0.967
1950 to 1974      7.777
1975 to 1999     62.546
2000 to 2017    345.873
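For the case where the year variable is of character type, you need to convert the values to numeric and use the converted values in the numeric between expressions.
Example:
YearChar is a character column containing year values. The input function converts a character string to a numeric value (where possible). The question mark (?) modifier suppresses log messages when the conversion fails, for example when the year is **** or UNKN.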
proc sql;
create table want as
select
case
when input(yearChar,?4.) between 1900 and 1924 then '1900 to 1924'
when input(yearChar,?4.) between 1925 and 1949 then '1925 to 1949'
when input(yearChar,?4.) between 1950 and 1974 then '1950 to 1974'
when input(yearChar,?4.) between 1975 and 1999 then '1975 to 1999'
when input(yearChar,?4.) between 2000 and 2017 then '2000 to 2017'
else 'out of range'
end
as years
, mean (price) as average_price
from have
group by years
having years not in ('out of range')
;
quit;
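To see what the conversion does with bad values, here is a small sketch using a throwaway data set (the name demo and the three sample values are made up for illustration). It uses the ?? modifier, which suppresses the log message like ? does and also keeps the automatic _ERROR_ variable from being set:

```sas
data demo;
  length yearChar $4;
  /* one valid year and two values that cannot be read as numbers */
  do yearChar = '1995', 'UNKN', '****';
    yearNum = input(yearChar, ?? 4.);  /* missing (.) when the conversion fails */
    output;
  end;
run;

proc print data=demo;
run;
```

The rows that fail to convert get a missing yearNum. A missing value does not satisfy any of the between conditions, so in the query above those rows land in the 'out of range' group and are dropped by the having clause.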
Answer 3: (score: 0)
I assume you mean 1920-1924, 1925-1929, and so on, so that each year is counted exactly once.
You can use group by with some arithmetic:
proc sql;
select floor(year / 5) * 5 as from_year,
avg(price) as avg_price
from summary
/* group on the selected expression itself so each bucket yields one row */
group by calculated from_year;
quit;
If you also want the end year:
proc sql;
select floor(year / 5) * 5 as from_year,
floor(year / 5) * 5 + 4 as to_year,
avg(price) as avg_price
from summary
/* group on both selected expressions so each bucket yields one row */
group by calculated from_year, calculated to_year;
quit;
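To check how the arithmetic assigns years to buckets, a quick sketch with a few hand-picked sample years (the years are arbitrary and only for illustration) writes the computed bounds to the log:

```sas
data _null_;
  do year = 1994, 1995, 1999, 2000;
    from_year = floor(year / 5) * 5;  /* round down to a multiple of 5 */
    to_year   = from_year + 4;        /* last year of the same bucket  */
    put year= from_year= to_year=;
  end;
run;
```

With these inputs, 1994 falls in the 1990-1994 bucket, 1995 and 1999 both fall in 1995-1999, and 2000 starts the 2000-2004 bucket. Buckets always start on a multiple of 5; for a different width, replace the 5 and the 4 with the width and the width minus one.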