Question

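I have a data.frame with multiple entries per unique ID. I need to determine which rows exceed a predefined time limit of 60 seconds. I have already appended a column filled with the term "toolong" to flag the rows whose time column I need to split. I then want to create a new row directly below each "toolong" row that keeps all the same information as the "parent row", except that the action column changes to "l" and the time column changes to the previous time minus 60. The parent row keeps all the same information, but its action column changes to "for" and its time changes to 60 seconds. The original database has 32 columns in total, so it is necessary to retain everything in the row except action and time.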

Example:

id <- c(1,1,1,1,2,2,2,2)
resting <- c("f","f","toolong","f","f","f","toolong","f")
action <- c("h","h","l","d","h","h","l","d")
time <- c(90,12,120,14,90,12,110,14)
other <- c(1,2,3,4,5,6,5,4)
# pass the vectors directly: cbind() would coerce every column to character
dat <- data.frame(id, resting, action, time, other, stringsAsFactors = FALSE)

What I want it to look like:

   id2 resting2 action2 time2 other2
1    1        f       h    90      1
2    1        f       h    12      2
3    1  toolong     for    60      3
4    1  toolong       l    60      3
5    1        f       d    14      4
6    2        f       h    90      5
7    2        f       h    12      6
8    2  toolong     for    60      5
9    2  toolong       l    50      5
10   2        f       d    14      4

Thanks, Tim

Answer 1

First, duplicate the "toolong" rows...

R>rowID <- rep(1:8, times=as.factor(resting))  # factor codes: "f" -> 1 copy, "toolong" -> 2 copies
R>dat2 <- dat[rowID,]
R>dat2
    id resting action time other
1    1       f      h   90     1
2    1       f      h   12     2
3    1 toolong      l  120     3
3.1  1 toolong      l  120     3
4    1       f      d   14     4
5    2       f      h   90     5
6    2       f      h   12     6
7    2 toolong      l  110     5
7.1  2 toolong      l  110     5
8    2       f      d   14     4

Then, for the duplicated rows, subtract 60 seconds for each previous copy of the record...

R>dups <- unlist(tapply(duplicated(rowID), rowID, cumsum))  # 0 for the original row, 1 for its copy
R>dat2$time <- dat2$time - 60*dups
R>dat2[dat2$resting == "toolong", "time"] <- pmin(60, dat2[dat2$resting == "toolong", "time"])  # cap at 60
R>dat2
    id resting action time other
1    1       f      h   90     1
2    1       f      h   12     2
3    1 toolong      l   60     3
3.1  1 toolong      l   60     3
4    1       f      d   14     4
5    2       f      h   90     5
6    2       f      h   12     6
7    2 toolong      l   60     5
7.1  2 toolong      l   50     5
8    2       f      d   14     4
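The transcript above fixes the time column but leaves both copies with action "l". A minimal sketch that also sets the parent row to "for" could look like the following; it assumes the example data with numeric time, and the names `limit`, `is_child` and `is_parent` are illustrative, not part of the answer. It also builds the row index from a logical test instead of factor codes, so it does not depend on the level order of resting.

```r
# Example data from the question; time and other stay numeric
dat <- data.frame(
  id      = c(1, 1, 1, 1, 2, 2, 2, 2),
  resting = c("f", "f", "toolong", "f", "f", "f", "toolong", "f"),
  action  = c("h", "h", "l", "d", "h", "h", "l", "d"),
  time    = c(90, 12, 120, 14, 90, 12, 110, 14),
  other   = c(1, 2, 3, 4, 5, 6, 5, 4),
  stringsAsFactors = FALSE
)

limit <- 60  # time limit in seconds

# One copy of each ordinary row, two copies of each "toolong" row
rowID <- rep(seq_len(nrow(dat)), times = 1 + (dat$resting == "toolong"))
dat2  <- dat[rowID, ]  # every other column is carried along unchanged

is_child  <- duplicated(rowID)                      # second copy of a "toolong" row
is_parent <- dat2$resting == "toolong" & !is_child  # first copy

# Child row: previous time minus the limit, action "l"
dat2$time[is_child]   <- dat2$time[is_child] - limit
dat2$action[is_child] <- "l"

# Parent row: capped at the limit, action "for"
dat2$time[is_parent]   <- limit
dat2$action[is_parent] <- "for"

rownames(dat2) <- NULL
dat2
```

Because the split is done by row indexing, the same code works unchanged for a data frame with 32 columns; only action and time are touched.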

Answer 2

dat2 <- rbind(dat, dat[ dat$resting=="toolong" , ])
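# order() is stable, so each copy lands right after its original
# (sorting rownames() as text would misplace rows once there are 10 or more)
dat2 <- dat2[order(c(seq_len(nrow(dat)), which(dat$resting == "toolong"))), ]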
dat2[duplicated(dat2), "action"] <- "l"
names(dat2) <- paste0(names(dat2), "2")
dat2
#-------
   id2 resting2 action2 time2 other2
1    1        f       h    90      1
2    1        f       h    12      2
3    1  toolong       l   120      3
31   1  toolong       l   120      3
4    1        f       d    14      4
5    2        f       h    90      5
6    2        f       h    12      6
7    2  toolong       l   110      5
71   2  toolong       l   110      5
8    2        f       d    14      4

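Another way to build the repeated rownames used as the selection vector is to use mapply and add 1 to the logical vector. This may have some advantages, because the period in the resulting rownames (e.g. "3.1") is a better indicator of the dupes.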

 dat[unlist(mapply(rep, rownames(dat), 1 + (dat$resting == "toolong"))), ]

    id resting action time other
1    1       f      h   90     1
2    1       f      h   12     2
3    1 toolong      l  120     3
3.1  1 toolong      l  120     3
4    1       f      d   14     4
5    2       f      h   90     5
6    2       f      h   12     6
7    2 toolong      l  110     5
7.1  2 toolong      l  110     5
8    2       f      d   14     4
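As a sketch of how that period could be used as the dupe flag, the following assumes the original row names contain no "." of their own (true for the default integer row names). It uses a plain rep() call, which is already vectorised over times and yields the same selection vector as the mapply version here; `sel`, `is_child` and `is_parent` are illustrative names.

```r
# Example data from the question; time and other stay numeric
dat <- data.frame(
  id      = c(1, 1, 1, 1, 2, 2, 2, 2),
  resting = c("f", "f", "toolong", "f", "f", "f", "toolong", "f"),
  action  = c("h", "h", "l", "d", "h", "h", "l", "d"),
  time    = c(90, 12, 120, 14, 90, 12, 110, 14),
  other   = c(1, 2, 3, 4, 5, 6, 5, 4),
  stringsAsFactors = FALSE
)

# Repeat each row name once, or twice for "toolong" rows
sel  <- rep(rownames(dat), times = 1 + (dat$resting == "toolong"))
dat3 <- dat[sel, ]

# Repeated row names are made unique, so each copy is named like "3.1"
is_child  <- grepl(".", rownames(dat3), fixed = TRUE)
is_parent <- dat3$resting == "toolong" & !is_child

dat3$time[is_child]    <- dat3$time[is_child] - 60  # child: previous time - 60
dat3$action[is_child]  <- "l"
dat3$time[is_parent]   <- 60                        # parent: capped at 60
dat3$action[is_parent] <- "for"
dat3
```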

Addressing the comment (modified question):

dat2$action2 <- as.character(dat2$action2)  # only needed if action2 is a factor
dat2[dat2$resting2 == "toolong" & !duplicated(dat2), "action2"] <- "for"  # parent rows
dat2
   id2 resting2 action2 time2 other2
1    1        f       h    90      1
2    1        f       h    12      2
3    1  toolong     for   120      3
31   1  toolong       l   120      3
4    1        f       d    14      4
5    2        f       h    90      5
6    2        f       h    12      6
7    2  toolong     for   110      5
71   2  toolong       l   110      5
8    2        f       d    14      4

Adding new rows in R based on a factor variable
