R: Add a new row and fill it with values from the row below

Date: 2014-08-04 15:16:19

Tags: r dataframe row

A new problem has come up on the way to my final product.

My input file looks like this:

NAME;       YEAR;   ID;     VALUE   
Sample1;    1998;   354;    45
Sample1;    1999;   354;    23
Sample1;    2000;   354;    66
Sample1;    2001;   354;    98
Sample1;    2002;   354;    36
Sample1;    2003;   354;    59
Sample1;    2004;   354;    64
Sample1;    2005;   354;    23
Sample1;    2006;   354;    69
Sample1;    2007;   354;    94
Sample1;    2008;   354;    24
Sample2;    1964;   1342;    7
Sample2;    1965;   1342;   24
Sample3;    2002;   859;    90
Sample3;    2003;   859;    93
Sample3;    2004;   859;    53
Sample3;    2005;   859;    98 

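What I want to do is add a row at the top of each sample group (one row for Sample1, one for Sample2, and so on) that contains all the same values as the group's initial row, except that the VALUE field should be 0 and the YEAR field should be the previous year.

My final output, for about 80,000 samples, should look like this: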

NAME;       YEAR;   ID;     VALUE
Sample1;    1997;   354;     0
Sample1;    1998;   354;    45
Sample1;    1999;   354;    23
Sample1;    2000;   354;    66
Sample1;    2001;   354;    98
Sample1;    2002;   354;    36
Sample1;    2003;   354;    59
Sample1;    2004;   354;    64
Sample1;    2005;   354;    23
Sample1;    2006;   354;    69
Sample1;    2007;   354;    94
Sample1;    2008;   354;    24
Sample2;    1963;   1342;    0
Sample2;    1964;   1342;    7
Sample2;    1965;   1342;   24
Sample3;    2001;   859;     0
Sample3;    2002;   859;    90
Sample3;    2003;   859;    93
Sample3;    2004;   859;    53
Sample3;    2005;   859;    98 

Thanks for your help!

3 answers:

Answer 0: (score: 2)

Assuming your data.frame is called df, I would do this in base R:

df <- do.call(rbind, lapply(split(df, df$NAME), function(x) {
       x <- rbind(x[1,], x); x[1,"VALUE"] <- 0; x[1, "YEAR"] <- x[1, "YEAR"] -1; 
       return(x)}))

If needed, you can change the row names back to regular numbering:

rownames(df) <- seq_len(nrow(df))
df
#      NAME YEAR   ID VALUE
#1  Sample1 1997  354     0
#2  Sample1 1998  354    45
#3  Sample1 1999  354    23
#4  Sample1 2000  354    66
#5  Sample1 2001  354    98
#6  Sample1 2002  354    36
#7  Sample1 2003  354    59
#8  Sample1 2004  354    64
#9  Sample1 2005  354    23
#10 Sample1 2006  354    69
#11 Sample1 2007  354    94
#12 Sample1 2008  354    24
#13 Sample2 1963 1342     0
#14 Sample2 1964 1342     7
#15 Sample2 1965 1342    24
#16 Sample3 2001  859     0
#17 Sample3 2002  859    90
#18 Sample3 2003  859    93
#19 Sample3 2004  859    53
#20 Sample3 2005  859    98
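
As a side note on why the reset can be needed: rbind() on the named list returned by split() builds row names from the list names, so the combined result no longer has plain 1..n row names. A minimal sketch on a made-up two-group data frame (the names `toy`, `combined` and `reset` are only for illustration):

```r
# Toy data, not the question's file: two groups with two rows each
toy <- data.frame(NAME = c("a", "a", "b", "b"),
                  YEAR = c(2000, 2001, 1990, 1991),
                  stringsAsFactors = FALSE)

# split() returns a list named by group; rbind() uses those names in the row names
combined <- do.call(rbind, split(toy, toy$NAME))
print(rownames(combined))

# Setting the row names to NULL restores the default 1..n numbering
reset <- combined
rownames(reset) <- NULL
print(rownames(reset))
```

`rownames(x) <- NULL` has the same effect here as the `seq_len(nrow(df))` assignment above.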

Here is a step-by-step breakdown of what the code above does in one go:

# split by sample
lst <- split(df, df$NAME)
# add the first row to each sample
lst <- lapply(lst, function(x) rbind(x[1,], x))
# change the YEAR and VALUE entries in each first row
lst <- lapply(lst, function(x) {x[1,"VALUE"] <- 0; x[1, "YEAR"] <- x[1, "YEAR"] -1; return(x)})
# rbind back to a data frame
df <- do.call(rbind, lst)
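
With many groups, the same result can also be sketched without split(), by picking out the first row of each sample in one step and sorting afterwards. This is only a sketch under two assumptions: the rows of each NAME are already in YEAR order (as in the sample data), and reordering the whole data frame by NAME and YEAR is acceptable. The data below is a shortened stand-in for the question's file, and `first` and `out` are illustrative names.

```r
# Shortened stand-in for the question's data (illustration only)
df <- data.frame(
  NAME  = c("Sample1", "Sample1", "Sample2", "Sample2", "Sample3"),
  YEAR  = c(1998, 1999, 1964, 1965, 2002),
  ID    = c(354, 354, 1342, 1342, 859),
  VALUE = c(45, 23, 7, 24, 90),
  stringsAsFactors = FALSE
)

# duplicated() is FALSE at the first occurrence of each NAME,
# so this keeps exactly the first row of every sample
first <- df[!duplicated(df$NAME), ]
first$YEAR  <- first$YEAR - 1   # previous year
first$VALUE <- 0                # VALUE is 0 in the added row

# Append the new rows, then sort so each one ends up on top of its group
out <- rbind(first, df)
out <- out[order(out$NAME, out$YEAR), ]
rownames(out) <- NULL
print(out)
```

This avoids one rbind() call per group, which may matter with roughly 80,000 samples, though it has not been timed here.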

Answer 1: (score: 1)

Reading in your data:

d <- read.table(text = "NAME;       YEAR;   ID;     VALUE   
Sample1;    1998;   354;    45
Sample1;    1999;   354;    23
Sample1;    2000;   354;    66
Sample1;    2001;   354;    98
Sample1;    2002;   354;    36
Sample1;    2003;   354;    59
Sample1;    2004;   354;    64
Sample1;    2005;   354;    23
Sample1;    2006;   354;    69
Sample1;    2007;   354;    94
Sample1;    2008;   354;    24
Sample2;    1964;   1342;    7
Sample2;    1965;   1342;   24
Sample3;    2002;   859;    90
Sample3;    2003;   859;    93
Sample3;    2004;   859;    53
Sample3;    2005;   859;    98 ", header = TRUE, sep = ";", stringsAsFactors = FALSE)

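For whatever reason, I felt like doing this with a for loop:

# preallocate an empty data frame with one extra row per sample
tmp <- as.factor(d$NAME)
d2 <- setNames(data.frame(matrix(nrow=(nrow(d)+nlevels(tmp)), ncol=ncol(d))),
               names(d))
s <- split(d, d$NAME)
j <- 1  # index of the next row to fill in d2
for(i in seq_along(s)) {
    # new first row: same NAME and ID, previous YEAR, VALUE 0
    # (list() rather than c(), so the numeric columns are not coerced to character)
    d2[j,] <- list(s[[i]][1,1], s[[i]][1,2]-1, s[[i]][1,3], 0)
    # copy the sample's original rows underneath
    d2[(j+1):(j + nrow(s[[i]])), ] <- s[[i]]
    j <- j + nrow(s[[i]]) + 1
}

Result: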

          NAME YEAR   ID VALUE
1      Sample1 1997  354     0
2      Sample1 1998  354    45
3      Sample1 1999  354    23
4      Sample1 2000  354    66
5      Sample1 2001  354    98
6      Sample1 2002  354    36
7      Sample1 2003  354    59
8      Sample1 2004  354    64
9      Sample1 2005  354    23
10     Sample1 2006  354    69
11     Sample1 2007  354    94
12     Sample1 2008  354    24
13     Sample2 1963 1342     0
14     Sample2 1964 1342     7
15     Sample2 1965 1342    24
16     Sample3 2001  859     0
17     Sample3 2002  859    90
18     Sample3 2003  859    93
19     Sample3 2004  859    53
20     Sample3 2005  859    98
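
One thing to watch for when filling data frame rows in a loop like this: a row built with c() from mixed types is an atomic vector, so every element becomes character, whereas a list() keeps one type per column. A minimal sketch of the difference (the names `mixed_vec`, `mixed_list`, `by_vec` and `by_list` are only for illustration):

```r
mixed_vec  <- c("Sample1", 1997, 354, 0)     # atomic vector: everything is coerced to character
mixed_list <- list("Sample1", 1997, 354, 0)  # list: each element keeps its own type

# Two identical one-row placeholders filled with NA
by_vec  <- data.frame(NAME = NA, YEAR = NA, ID = NA, VALUE = NA)
by_list <- by_vec

by_vec[1, ]  <- mixed_vec    # all four columns end up as character
by_list[1, ] <- mixed_list   # NAME is character, the other columns stay numeric

print(sapply(by_vec, class))
print(sapply(by_list, class))
```

The printed table looks the same either way, so checking with str() or sapply(x, class) is the quickest way to see which one you have.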

Answer 2: (score: 1)

You could try data.table for larger datasets:

  library(data.table)
  DT <- data.table(dat)  # dat: the input data.frame (called df and d in the other answers)
  sub <- unique(DT, by="NAME")[, c("YEAR",  "VALUE") := list(YEAR-1, 0)]
  rbindlist(list(DT, sub))[order(NAME, YEAR)]
  #       NAME YEAR   ID VALUE
  # 1: Sample1 1997  354     0
  # 2: Sample1 1998  354    45
  # 3: Sample1 1999  354    23
  # 4: Sample1 2000  354    66
  # 5: Sample1 2001  354    98
  # 6: Sample1 2002  354    36
  # 7: Sample1 2003  354    59
  # 8: Sample1 2004  354    64
  # 9: Sample1 2005  354    23
  #10: Sample1 2006  354    69
  #11: Sample1 2007  354    94
  #12: Sample1 2008  354    24
  #13: Sample2 1963 1342     0
  #14: Sample2 1964 1342     7
  #15: Sample2 1965 1342    24
  #16: Sample3 2001  859     0
  #17: Sample3 2002  859    90
  #18: Sample3 2003  859    93
  #19: Sample3 2004  859    53
  #20: Sample3 2005  859    98

As @Arun suggested, more compact code would be:

  DT[, list(YEAR=c(YEAR[1L]-1L, YEAR), VALUE=c(0,VALUE)), by=list(NAME,ID)]