I have a sample data frame in R, shown below:
dat <- data.frame(NAME=c("SAMPLE1", "SAMPLE1", "SAMPLE1", "SAMPLE1", "SAMPLE2", "SAMPLE2", "SAMPLE2", "SAMPLE2"),
                  ID=c(33, 33, 33, 33, 253, 253, 253, 253),
                  SURVEY_YEAR=c(1959, 1960, 1961, 1965, 2002, 2007, 2010, 2014),
                  REFERENCE_YEAR=c(1959, 1959, 1960, 1963, 2002, 2004, 2009, 2011),
                  VALUE=c(0, -6, -10, -23, 0, -9, NA, -40))
dat
     NAME  ID SURVEY_YEAR REFERENCE_YEAR VALUE
1 SAMPLE1  33        1959           1959     0
2 SAMPLE1  33        1960           1959    -6
3 SAMPLE1  33        1961           1960   -10
4 SAMPLE1  33        1965           1963   -23
5 SAMPLE2 253        2002           2002     0
6 SAMPLE2 253        2007           2004    -9
7 SAMPLE2 253        2010           2009    NA
8 SAMPLE2 253        2014           2011   -40
What I want to do is expand REFERENCE_YEAR and SURVEY_YEAR and combine them into a single YEAR column, so that the resulting data frame looks like this:
NAME ID YEAR VALUE
SAMPLE1 33 1959 0 # VALUE from REFERENCE_YEAR 1959
SAMPLE1 33 1959 0 # VALUE from SURVEY_YEAR 1959
--------------------------------------------------------------------------------
SAMPLE1 33 1959 0 # for REFERENCE_YEAR 1959, take previous VALUE
SAMPLE1 33 1960 -6 # VALUE from SURVEY_YEAR 1960
--------------------------------------------------------------------------------
SAMPLE1 33 1960 -6 # for REFERENCE_YEAR 1960, take previous VALUE
SAMPLE1 33 1961 -10 # VALUE from SURVEY_YEAR 1961
--------------------------------------------------------------------------------
SAMPLE1 33 1963 -10 # for REFERENCE_YEAR 1963, take previous VALUE (-10)
SAMPLE1 33 1965 -23 # VALUE from SURVEY_YEAR 1965
--------------------------------------------------------------------------------
SAMPLE2 253 2002 0 # VALUE from REFERENCE_YEAR 2002
SAMPLE2 253 2002 0 # VALUE from SURVEY_YEAR 2002
--------------------------------------------------------------------------------
SAMPLE2 253 2004 0 # for REFERENCE_YEAR 2004, take previous VALUE (0)
SAMPLE2 253 2007 -9 # VALUE taken from SURVEY_YEAR 2007
--------------------------------------------------------------------------------
SAMPLE2 253 2009 NA # if one value is NA in a period (in this case 2009 to 2010), the whole period should be set to NA
SAMPLE2 253 2010 NA
--------------------------------------------------------------------------------
SAMPLE2 253 2011 -9 # for REFERENCE_YEAR 2011, take previous numerical VALUE (not NA, but -9)
SAMPLE2 253 2014 -40 # VALUE taken from SURVEY_YEAR 2014
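Is there a simple way to do this?
EDIT: I want the data in the structure above because I want to plot it like this (maybe the chart makes this easier to understand?). NA values were added here where the series is discontinuous (1962 in SAMPLE1, and 2003 and 2008 in SAMPLE2). That is why the structure should be kept as in the result shown above.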
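Answer 0 (score: 1)
Fundamentally, your problem is one of assigning values to years according to rules. It's not clear to me what those rules are, but as a start you could do something like this: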
dat <- data.frame(NAME=c("SAMPLE1", "SAMPLE1", "SAMPLE1", "SAMPLE1", "SAMPLE2", "SAMPLE2", "SAMPLE2", "SAMPLE2"),
                  ID=c(33, 33, 33, 33, 253, 253, 253, 253),
                  SURVEY_YEAR=c(1959, 1960, 1961, 1965, 2002, 2007, 2010, 2014),
                  REFERENCE_YEAR=c(1959, 1959, 1960, 1963, 2002, 2004, 2009, 2011),
                  VALUE=c(0, -6, -10, -23, 0, -9, NA, -40))
# Every year that occurs as a survey or reference year, sorted ascending.
# Note: years are looked up across all samples at once, not grouped by NAME.
uyear <- data.frame(UYEAR = sort(unique(c(dat$SURVEY_YEAR, dat$REFERENCE_YEAR))), val = NA)
for (i in seq_len(nrow(uyear))) {
  yr <- uyear$UYEAR[i]
  if (yr %in% dat$SURVEY_YEAR) {
    # Survey year: use the VALUE recorded for that survey
    uyear$val[i] <- dat$VALUE[which(dat$SURVEY_YEAR == yr)[1]]
  } else {
    # Reference year only: use the VALUE of the row just before the first match
    uyear$val[i] <- dat$VALUE[which(dat$REFERENCE_YEAR == yr)[1] - 1]
  }
}
That said, having "YEAR" mean two different things (start and end) without keeping a column that says which is which is a bad idea.
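To make that last point concrete, here is a minimal sketch in base R that puts both years into one YEAR column but keeps a TYPE column saying whether each row came from REFERENCE_YEAR or SURVEY_YEAR. It assumes the rules are the ones implied by the expected output in the question: the survey row keeps its own VALUE, the reference row takes the previous non-NA VALUE of the same sample, the first period of a sample reuses its own VALUE, and an NA survey value makes the whole period NA. The helper name expand_sample and the TYPE column are my own additions for illustration, not something from the original data.

```r
# Sample data from the question
dat <- data.frame(
  NAME = rep(c("SAMPLE1", "SAMPLE2"), each = 4),
  ID = rep(c(33, 253), each = 4),
  SURVEY_YEAR = c(1959, 1960, 1961, 1965, 2002, 2007, 2010, 2014),
  REFERENCE_YEAR = c(1959, 1959, 1960, 1963, 2002, 2004, 2009, 2011),
  VALUE = c(0, -6, -10, -23, 0, -9, NA, -40)
)

# Hypothetical helper: expand one sample's rows into reference/survey pairs
expand_sample <- function(d) {
  n <- nrow(d)
  start_val <- numeric(n)
  last_seen <- NA_real_  # most recent non-NA VALUE seen so far in this sample
  for (i in seq_len(n)) {
    if (is.na(d$VALUE[i])) {
      # Assumed rule: an NA survey value makes the whole period NA
      start_val[i] <- NA
    } else {
      # Assumed rule: with no earlier value available, reuse the row's own VALUE
      start_val[i] <- if (is.na(last_seen)) d$VALUE[i] else last_seen
      last_seen <- d$VALUE[i]
    }
  }
  data.frame(
    NAME = rep(d$NAME, each = 2),
    ID = rep(d$ID, each = 2),
    # rbind() then as.vector() interleaves: ref1, survey1, ref2, survey2, ...
    YEAR = as.vector(rbind(d$REFERENCE_YEAR, d$SURVEY_YEAR)),
    TYPE = rep(c("REFERENCE", "SURVEY"), times = n),
    VALUE = as.vector(rbind(start_val, d$VALUE))
  )
}

# Work per sample so values never carry over from one sample to the next
long <- do.call(rbind, lapply(split(dat, dat$NAME), expand_sample))
rownames(long) <- NULL
long
```

With TYPE in place you can still plot YEAR against VALUE as before, and something like long[long$TYPE == "SURVEY", ] gets you back the original survey rows if you need them later.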