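My data has the following format:
- first column: indicates whether the machine is running
- second column: the total time the machine has been running
See the dataset below: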
structure(c("", "running", "running", "running", "", "", "",
"running", "running", "", "10", "15", "30", "2", "5", "17", "47",
"12", "57", "87"), .Dim = c(10L, 2L), .Dimnames = list(NULL,
c("c", "v")))
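I want to add a third column that gives the total time the machine has been running, obtained by summing all the times since the machine started running. Idle rows should show 0, and the sum should restart with each new run. See the desired output below: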
[1,] "" "10" "0"
[2,] "running" "15" "15"
[3,] "running" "30" "45"
[4,] "running" "2" "47"
[5,] "" "5" "0"
[6,] "" "17" "0"
[7,] "" "47" "0"
[8,] "running" "12" "12"
[9,] "running" "57" "69"
[10,] "" "87" "0"
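I tried to write some R code that does this in an elegant way, but my programming skills are too limited for now. Does anyone know a solution to this problem? Thanks in advance!
Answer 0 (score: 2)
First, we convert your data to a more suitable data structure that can hold mixed data types: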
m <- structure(c("", "running", "running", "running", "", "", "",
"running", "running", "", "10", "15", "30", "2", "5", "17", "47",
"12", "57", "87"), .Dim = c(10L, 2L), .Dimnames = list(NULL,
c("c", "v")))
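# Turn the character matrix into a data.frame without creating factors
DF <- as.data.frame(m, stringsAsFactors = FALSE)
# Re-parse each column so that v becomes integer while c stays character
DF[] <- lapply(DF, type.convert, as.is = TRUE)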
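Then we can do this easily with the data.table package:
library(data.table)
setDT(DF)
# Cumulative sum of v within each run of consecutive identical values of c
DF[, total := cumsum(v), by = rleid(c)]
# Idle rows do not count towards running time
DF[c == "", total := 0]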
#           c  v total
#  1:         10     0
#  2: running 15    15
#  3: running 30    45
#  4: running  2    47
#  5:          5     0
#  6:         17     0
#  7:         47     0
#  8: running 12    12
#  9: running 57    69
# 10:         87     0
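The grouping step is what does the work here: each run of consecutive identical values of c gets its own id, and the cumulative sum restarts for every id. As a minimal sketch of the same idea without data.table, the run ids can be built with base R's rle(). The vector names status, run_id and total are made up for this illustration, and v is entered directly as numbers rather than converted from the matrix.

```r
# Status and time columns from the question ("" means idle).
status <- c("", "running", "running", "running", "", "", "",
            "running", "running", "")
v <- c(10, 15, 30, 2, 5, 17, 47, 12, 57, 87)

# rle() finds runs of identical consecutive values. Repeating each run's
# index by the run's length yields one id per row.
runs <- rle(status)
run_id <- rep(seq_along(runs$lengths), runs$lengths)

# Cumulative sum within each run, then reset the idle rows to 0.
total <- ave(v, run_id, FUN = cumsum)
total[status == ""] <- 0

print(data.frame(status, v, run_id, total))
```

For this data run_id comes out as 1 2 2 2 3 3 3 4 4 5, so rows 2 to 4 and rows 8 to 9 are summed separately.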
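Answer 1 (score: 2)
Here is a simple solution using base R:
# Group by c and by a counter that increases at every idle row,
# then take the cumulative sum of v within each group
DF$total <- ave(DF$v, DF$c, cumsum(DF$c == ""), FUN = cumsum)
# Idle rows do not count towards running time
DF$total[DF$c == ""] <- 0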
> DF
         c  v total
1          10     0
2  running 15    15
3  running 30    45
4  running  2    47
5           5     0
6          17     0
7          47     0
8  running 12    12
9  running 57    69
10         87     0
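To see why ave() is given two grouping vectors, it can help to look at the counter on its own. The sketch below uses illustrative names (status, block) that are not part of the answer. The counter increases at every idle row, so the running rows that follow an idle row share that row's counter value, and adding c as a second key keeps the idle row out of their group.

```r
# Status column from the question ("" means idle).
status <- c("", "running", "running", "running", "", "", "",
            "running", "running", "")

# Counter that goes up by one at each idle row.
block <- cumsum(status == "")

# Row 1 (idle) and rows 2 to 4 (running) share block 1, so block alone
# would merge them. Grouping by status as well separates them.
print(data.frame(status, block))
```

Here block is 1 1 1 1 2 3 4 4 4 5: the two runs fall into blocks 1 and 4, each combined with c == "running".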
Answer 2 (score: 1)
We can use dplyr:
library(dplyr)
DF %>%
  group_by(cumsum(c == ''), c) %>%
  mutate(total = replace(cumsum(v), c == '', 0))
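As written, the pipeline returns a grouped tibble that also carries the helper grouping column. A possible variant, sketched here under the assumption that dplyr is installed, names the helper column (block is a made-up name), drops the grouping afterwards, and removes the helper. The data frame is rebuilt inline so the snippet runs on its own.

```r
library(dplyr)

# Same data as in the question, built directly as a data.frame.
DF <- data.frame(
  c = c("", "running", "running", "running", "", "", "",
        "running", "running", ""),
  v = c(10, 15, 30, 2, 5, 17, 47, 12, 57, 87),
  stringsAsFactors = FALSE
)

res <- DF %>%
  group_by(block = cumsum(c == ""), c) %>%            # one group per run
  mutate(total = replace(cumsum(v), c == "", 0)) %>%  # idle rows get 0
  ungroup() %>%                                       # drop the grouping
  select(-block)                                      # drop the helper column

print(res)
```

Calling ungroup() avoids surprises if res is later passed to other grouped operations.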