R: Transforming and expanding a data frame in R

Time: 2014-08-20 13:49:44

Tags: r dataframe

I have a sample data frame in R that looks like this:

dat <- data.frame(NAME=c("SAMPLE1", "SAMPLE1", "SAMPLE1", "SAMPLE1", "SAMPLE2","SAMPLE2","SAMPLE2","SAMPLE2"),
                  ID=c(33,33,33,33,253,253,253,253),
                  SURVEY_YEAR=c(1959,1960,1961,1965,2002,2007,2010,2014), 
                  REFERENCE_YEAR=c(1959,1959,1960,1963,2002, 2004,2009,2011),
                  VALUE=c(0,-6,-10,-23,0,-9,NA,-40))

dat

  NAME     ID SURVEY_YEAR REFERENCE_YEAR VALUE
1 SAMPLE1  33        1959           1959     0
2 SAMPLE1  33        1960           1959    -6
3 SAMPLE1  33        1961           1960   -10
4 SAMPLE1  33        1965           1963   -23
5 SAMPLE2 253        2002           2002     0
6 SAMPLE2 253        2007           2004    -9
7 SAMPLE2 253        2010           2009    NA
8 SAMPLE2 253        2014           2011   -40

What I want to do is expand REFERENCE_YEAR and SURVEY_YEAR and combine them into a single YEAR column, so that the resulting data frame looks like this:

NAME    ID  YEAR    VALUE
SAMPLE1 33  1959    0         # VALUE from REFERENCE_YEAR 1959
SAMPLE1 33  1959    0         # VALUE from SURVEY_YEAR 1959
--------------------------------------------------------------------------------
SAMPLE1 33  1959    0         # for REFERENCE_YEAR 1959, take previous VALUE
SAMPLE1 33  1960    -6        # VALUE from SURVEY_YEAR 1960
--------------------------------------------------------------------------------
SAMPLE1 33  1960    -6        # for REFERENCE_YEAR 1960, take previous VALUE
SAMPLE1 33  1961    -10       # VALUE from SURVEY_YEAR 1961
--------------------------------------------------------------------------------
SAMPLE1 33  1963    -10       # for REFERENCE_YEAR 1963, take previous VALUE (-10)
SAMPLE1 33  1965    -23       # VALUE from SURVEY_YEAR 1965
--------------------------------------------------------------------------------
SAMPLE2 253 2002    0         # VALUE from REFERENCE_YEAR 2002
SAMPLE2 253 2002    0         # VALUE from SURVEY_YEAR 2002
--------------------------------------------------------------------------------
SAMPLE2 253 2004    0         # for REFERENCE_YEAR 2004, take previous VALUE (0)
SAMPLE2 253 2007    -9        # VALUE taken from SURVEY_YEAR 2007
--------------------------------------------------------------------------------
SAMPLE2 253 2009    NA       # if one value is NA in a period (in this case 2009 to 2010), the whole period should be set to NA
SAMPLE2 253 2010    NA
--------------------------------------------------------------------------------
SAMPLE2 253 2011    -9       # for REFERENCE_YEAR 2011, take previous numerical VALUE (not NA,but -9)
SAMPLE2 253 2014    -40      # VALUE taken from SURVEY_YEAR 2014

Is there a simple way to do this?

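Edit: I want the data in the structure above because I want to plot it like this (perhaps the charts make it easier to understand?). Here NA values are added where the series is discontinuous (1962 in SAMPLE1, and 2003 and 2008 in SAMPLE2). That is why the structure should be kept as in the result shown above.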

[Two example plots; images not available]

1 Answer:

Answer 0: (score: 1)

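Fundamentally, your problem is one of assigning values to years according to rules. It is not clear to me what those rules are, but as a start you could do something like this: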

dat <- data.frame(NAME=c("SAMPLE1", "SAMPLE1", "SAMPLE1", "SAMPLE1", "SAMPLE2","SAMPLE2","SAMPLE2","SAMPLE2"),
              ID=c(33,33,33,33,253,253,253,253),
              SURVEY_YEAR=c(1959,1960,1961,1965,2002,2007,2010,2014), 
              REFERENCE_YEAR=c(1959,1959,1960,1963,2002, 2004,2009,2011),
              VALUE=c(0,-6,-10,-23,0,-9,NA,-40))

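# Collect every distinct year from both columns, sorted ascending
uyear <- data.frame(UYEAR = sort(unique(c(dat$SURVEY_YEAR, dat$REFERENCE_YEAR))), val = NA_real_)

for (i in seq_len(nrow(uyear))) {
  yr <- uyear$UYEAR[i]
  if (yr %in% dat$SURVEY_YEAR) {
    # Survey year: take the VALUE recorded in that row
    uyear$val[i] <- dat$VALUE[which(dat$SURVEY_YEAR == yr)[1]]
  } else {
    # Reference-only year: take the VALUE of the row just before it
    uyear$val[i] <- dat$VALUE[which(dat$REFERENCE_YEAR == yr)[1] - 1]
  }
}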
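
For comparison, the same lookup can be sketched without an explicit loop using match(). This is only an illustration of the starting point above, not a full solution: it still ignores NAME and ID, and for this data it gives -9 for 2009 and NA for 2011, which is not what the question asks for in those two periods. The names years, idx and val are chosen for this sketch.

```r
dat <- data.frame(NAME = c("SAMPLE1", "SAMPLE1", "SAMPLE1", "SAMPLE1",
                           "SAMPLE2", "SAMPLE2", "SAMPLE2", "SAMPLE2"),
                  ID = c(33, 33, 33, 33, 253, 253, 253, 253),
                  SURVEY_YEAR = c(1959, 1960, 1961, 1965, 2002, 2007, 2010, 2014),
                  REFERENCE_YEAR = c(1959, 1959, 1960, 1963, 2002, 2004, 2009, 2011),
                  VALUE = c(0, -6, -10, -23, 0, -9, NA, -40))

# Every distinct year from both columns, sorted ascending
years <- sort(unique(c(dat$SURVEY_YEAR, dat$REFERENCE_YEAR)))

# Row index of the first matching survey year (NA when the year is reference-only)
s_idx <- match(years, dat$SURVEY_YEAR)
# Row just before the first matching reference year
r_idx <- match(years, dat$REFERENCE_YEAR) - 1

# Prefer the survey row; fall back to the row before the reference row
idx <- ifelse(is.na(s_idx), r_idx, s_idx)
val <- dat$VALUE[idx]

data.frame(UYEAR = years, val = val)
```

Comparing val with the table in the question shows which rules (the NA period and the "previous numerical value" rule) still have to be added.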

That said, having "YEAR" mean two different things (start and end) without keeping a column that says which is which is a bad idea.
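
One way to follow that advice is sketched below. It assumes the rules are the ones written in the comments of the question: each period contributes a reference-year row and a survey-year row, the reference-year row takes the previous non-NA VALUE within the same NAME (or its own VALUE for the first period), and a period whose VALUE is NA is NA on both rows. The function name expand_years and the column YEAR_TYPE are hypothetical names introduced here, and the rows of each group are assumed to be in chronological order already.

```r
dat <- data.frame(NAME = c("SAMPLE1", "SAMPLE1", "SAMPLE1", "SAMPLE1",
                           "SAMPLE2", "SAMPLE2", "SAMPLE2", "SAMPLE2"),
                  ID = c(33, 33, 33, 33, 253, 253, 253, 253),
                  SURVEY_YEAR = c(1959, 1960, 1961, 1965, 2002, 2007, 2010, 2014),
                  REFERENCE_YEAR = c(1959, 1959, 1960, 1963, 2002, 2004, 2009, 2011),
                  VALUE = c(0, -6, -10, -23, 0, -9, NA, -40))

# Expand one NAME group into two rows per period (hypothetical helper)
expand_years <- function(d) {
  n <- nrow(d)
  start_val <- numeric(n)
  last_seen <- d$VALUE[1]  # the first period starts at its own value
  for (i in seq_len(n)) {
    if (is.na(d$VALUE[i])) {
      # An NA period is NA at both ends; last_seen is left unchanged
      start_val[i] <- NA_real_
    } else {
      # Start the period at the previous non-NA value, then remember this one
      start_val[i] <- last_seen
      last_seen <- d$VALUE[i]
    }
  }
  # Interleave reference and survey rows: ref1, survey1, ref2, survey2, ...
  data.frame(NAME = rep(d$NAME, each = 2),
             ID = rep(d$ID, each = 2),
             YEAR = as.vector(rbind(d$REFERENCE_YEAR, d$SURVEY_YEAR)),
             YEAR_TYPE = rep(c("reference", "survey"), times = n),
             VALUE = as.vector(rbind(start_val, d$VALUE)))
}

# Apply the helper to each NAME separately and stack the pieces
result <- do.call(rbind, lapply(split(dat, dat$NAME), expand_years))
rownames(result) <- NULL
result
```

Keeping YEAR_TYPE means the table can still be plotted as requested while each row records whether its year is the start or the end of a period. If a group's first VALUE is NA, this sketch leaves the following reference rows NA until a non-NA value has been seen; whether that is the desired behaviour is not stated in the question.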