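I have monthly data with several observations per day, and I have day, month and year variables. How can I keep only the data from the first five and last five days of each month? My data contains only weekdays, so the first and last five days change from month to month; for example, the first five days of January 2008 can be the 2nd, 3rd, 4th, 7th and 8th.
Below is a sample of the data file. I wasn't sure how to share it, so I just copied a few lines; these are from January 2, 2008.
Would a variation of first.variable and last.variable work? How can I keep the observations from the first 5 and last 5 days of each month? Thanks.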
1 AA 500 B 36.9800 NH 2 1 2008 9:10:21
2 AA 500 S 36.4500 NN 2 1 2008 9:30:41
3 AA 100 B 36.4700 NH 2 1 2008 9:30:43
4 AA 100 B 36.4700 NH 2 1 2008 9:30:48
5 AA 50 S 36.4500 NN 2 1 2008 9:30:49
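Answer 0 (score: 2)
If you want to examine the data and determine the 5 minimum and 5 maximum values, you can use PROC SUMMARY. You can then merge the result back with the data to select the records.
So if your data has YEAR, MONTH and DAY variables, you can create a new data set with the first and last five days of each month in a few simple steps.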
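/* Reduce the data to one row per distinct year/month/day */
proc sort data=HAVE (keep=year month day) nodupkey
          out=ALLDAYS;
  by year month day;
run;
/* For each month, pick the 5 smallest and the 5 largest day values */
proc summary data=ALLDAYS nway;
  class year month;
  output out=MIDDLE
    idgroup(min(day) out[5](day)=min_day)
    idgroup(max(day) out[5](day)=max_day)
    / autoname ;
run;
/* Turn the min_day and max_day columns into one row per selected day */
proc transpose data=MIDDLE out=DAYS (rename=(col1=day));
  by year month;
  var min_day: max_day: ;
run;
/* Keep only the observations that fall on a selected day.
   If a month has fewer than 10 distinct days, a day can appear in both
   lists, and its observations are then returned twice by this join. */
proc sql ;
  create table WANT as
  select a.*
  from HAVE a
  inner join DAYS b
  on a.year=b.year and a.month=b.month and a.day = b.day
  ;
quit;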
Answer 1 (score: 1)
/****
get some dates to play with
****/
data dates(keep=i thisdate);
offset = input('01Jan2015',DATE9.);
do i=1 to 100;
thisdate = offset + round(599*ranuni(1)+1); *** within 600 days from offset;
output;
end;
format thisdate date9.;
run;
/****
BTW: intnx('month',thisdate,1) = first day of next month. Deduct 1 to get the last day
of the current month.
intnx('month',thisdate,0,"BEGINNING") = first day of the current month
Note: both windows below are calendar days, not trading days.
****/
proc sql;
create table first5_last5 AS
SELECT
*
FROM
dates /* replace with name of your data set */
WHERE
/* replace all occurrences of 'thisdate' with name of your date variable */
( intnx('month',thisdate,1)-5 <= thisdate <= intnx('month',thisdate,1)-1 )
OR
( intnx('month',thisdate,0,"BEGINNING") <= thisdate <= intnx('month',thisdate,0,"BEGINNING")+4 )
ORDER BY
thisdate;
quit;
Answer 2 (score: 1)
Create some data with the desired structure:
Data inData (drop=_:); * forget all variables starting with an underscore*;
format date yymmdd10. time time8.;
_instant = datetime();
do _i = 1 to 1E5;
date = datepart(_instant);
time = timepart(_instant);
yy = year(date);
mm = month(date);
dd = day(date);
*just some more random data*;
letter = byte(rank('a') +floor(rand('uniform', 0, 26)));
*select week days*;
if weekday(date) in (2,3,4,5,6) then output;
_instant = _instant + 1E5*rand('exponential');
end;
run;
Count the days per month:
proc sql;
create view dayCounts as
select yy, mm, count(distinct dd) as _countInMonth
from inData
group by yy, mm;
quit;
Select the days:
data first_5(drop=_:) last_5(drop=_:);
merge inData dayCounts;
by yy mm;
_newDay = dif(date) ne 0;
retain _nrInMonth;
if first.mm then _nrInMonth = 1;
else if _newDay then _nrInMonth + 1;
if _nrInMonth le 5 then output first_5;
if _nrInMonth gt _countInMonth - 5 then output last_5;
run;
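Answer 3 (score: 1)
Use the INTNX() function. You can use INTNX('month',...) to find the first and last days of the month, and then use INTNX('weekday',...) to find the first 5 weekdays and the last 5 weekdays.
You can convert your month, day and year values into a date with the MDY() function. Let's assume you do that and create a variable called TODAY. Then, to test whether it falls within the first 5 weekdays or the last 5 weekdays of the month, you could do something like this: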
first5 = intnx('weekday',intnx('month',today,0,'B'),0) <= today
<= intnx('weekday',intnx('month',today,0,'B'),4) ;
last5 = intnx('weekday',intnx('month',today,0,'E'),-4) <= today
<= intnx('weekday',intnx('month',today,0,'E'),0) ;
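To show how the two expressions fit into a full step, here is a minimal sketch. It assumes a data set named HAVE with numeric DAY, MONTH and YEAR variables; the sample rows, the data set names HAVE and WANT, and the helper variables MONTH_START and MONTH_END are made up for illustration.

```sas
/* Hypothetical sample rows: day, month, year (same order as in the question) */
data have;
   input day month year;
   datalines;
2 1 2008
3 1 2008
15 1 2008
31 1 2008
;
run;

data want;
   set have;
   /* MDY takes its arguments in month, day, year order */
   today = mdy(month, day, year);
   format today date9.;
   /* Helper variables (assumed names): first and last calendar day of the month */
   month_start = intnx('month', today, 0, 'B');
   month_end   = intnx('month', today, 0, 'E');
   /* Flags are 1 when TODAY lies inside the range, 0 otherwise */
   first5 = ( intnx('weekday', month_start, 0) <= today
              <= intnx('weekday', month_start, 4) );
   last5  = ( intnx('weekday', month_end, -4) <= today
              <= intnx('weekday', month_end, 0) );
   /* Subsetting IF: keep only rows in either range */
   if first5 or last5;
   drop month_start month_end;
run;
```

The flags are kept in WANT so you can tell which end of the month each row came from.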
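Note that these ranges will include weekends, but that does not matter if your data has no such dates. It could be a problem, however, if your data skips holidays.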
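If holidays are a concern, one option is to number the days that actually appear in the data rather than rely on the calendar. The sketch below is not taken from any of the answers: it uses a double DO UNTIL loop with FIRST. and LAST. variables, and all data set and variable names (HAVE, SORTED, WANT, N_DAYS, DAY_SEQ) are assumptions. The sample data is generated only to make the example self-contained.

```sas
/* Hypothetical sample: two rows per weekday for Jan-Feb 2008, skipping Jan 1 */
data have;
   do date = '01JAN2008'd to '29FEB2008'd;
      if weekday(date) in (2,3,4,5,6) and date ne '01JAN2008'd then do;
         year  = year(date);
         month = month(date);
         day   = day(date);
         do obs = 1 to 2;
            output;
         end;
      end;
   end;
   drop date;
run;

proc sort data=have out=sorted;
   by year month day;
run;

data want;
   /* First pass over a month: count the distinct days present */
   do until (last.month);
      set sorted;
      by year month day;
      if first.month then n_days = 0;
      if first.day then n_days + 1;
   end;
   /* Second pass over the same month: keep the first 5 and last 5 days */
   do until (last.month);
      set sorted;
      by year month day;
      if first.month then day_seq = 0;
      if first.day then day_seq + 1;
      if day_seq <= 5 or day_seq > n_days - 5 then output;
   end;
   drop n_days day_seq;
run;
```

Because the ranking is based on the days present in the data, a missing holiday simply shifts the window to the next available day, and a month with fewer than ten distinct days is kept in full without duplicated rows.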