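I have a data.frame that I need to add rows to, but the number of rows to add (and their contents) is determined by the data.frame's existing rows. I also want a final column that enumerates the rows within each replicated group. Here is the example data: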
> A <- data.frame(veh = c("MINIVAN","HEAVY TRUCK"),age = c(2.5,3.5),rows_to_add = c(2,3))
> A
veh age rows_to_add
1 MINIVAN 2.5 2
2 HEAVY TRUCK 3.5 3
And the desired output:
> B <- rbind(do.call("rbind",replicate(n=unique(A[1,"rows_to_add"])+1,A[1,],simplify = FALSE)),
+ do.call("rbind",replicate(n=unique(A[2,"rows_to_add"])+1,A[2,],simplify = FALSE)))
> B <- cbind(B,enum = c(0:2,0:3))
> B
veh age rows_to_add enum
1 MINIVAN 2.5 2 0
2 MINIVAN 2.5 2 1
3 MINIVAN 2.5 2 2
24 HEAVY TRUCK 3.5 3 0
21 HEAVY TRUCK 3.5 3 1
22 HEAVY TRUCK 3.5 3 2
23 HEAVY TRUCK 3.5 3 3
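Clearly, the code I used here to generate the output is messy, does not scale, and is probably inefficient. I am looking for a general solution that lets me do this at a reasonable speed on a larger data.frame and that avoids loops (speeding up loop-heavy code is what motivated this question).
This question deals with a weaker version of the problem, where the number of rows added does not vary with the data's own rows and the inserted rows can contain NA, but I could not find a way to generalize the answers there.
How can I get the desired output in general?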
Answer 0 (score: 2)
A base R approach:
out <- A[rep(seq_len(nrow(A)), A$rows_to_add + 1), ]
out
# veh age rows_to_add
#1 MINIVAN 2.5 2
#1.1 MINIVAN 2.5 2
#1.2 MINIVAN 2.5 2
#2 HEAVY TRUCK 3.5 3
#2.1 HEAVY TRUCK 3.5 3
#2.2 HEAVY TRUCK 3.5 3
#2.3 HEAVY TRUCK 3.5 3
Add the new column, as suggested by @thelatemail in the comments:
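# Use the per-row counts directly: unique() would drop repeated counts,
# so enum would no longer line up with the rows of out.
out$enum <- sequence(A$rows_to_add + 1) - 1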
#out <- transform(out, enum = ave(age, rows_to_add, FUN = seq_along) - 1) # my slower attempt
# veh age rows_to_add enum
#1 MINIVAN 2.5 2 0
#1.1 MINIVAN 2.5 2 1
#1.2 MINIVAN 2.5 2 2
#2 HEAVY TRUCK 3.5 3 0
#2.1 HEAVY TRUCK 3.5 3 1
#2.2 HEAVY TRUCK 3.5 3 2
#2.3 HEAVY TRUCK 3.5 3 3
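The two steps above can be wrapped into one function. What follows is a minimal sketch rather than part of the original answer: the helper name `expand_rows`, its `count_col` argument, and the data `A2` are assumptions made for illustration. The example deliberately repeats a count (2 appears twice) to show why `enum` should be built from the count of every source row and not from the distinct counts.

```r
# Hypothetical helper: repeat each row of `df` (count + 1) times and number
# the copies of each source row starting at 0.
expand_rows <- function(df, count_col = "rows_to_add") {
  n <- df[[count_col]] + 1                      # copies per source row
  out <- df[rep(seq_len(nrow(df)), n), , drop = FALSE]
  out$enum <- sequence(n) - 1                   # 0, 1, ..., count per source row
  rownames(out) <- NULL                         # reset row names to 1..N
  out
}

# Illustrative data (assumed, not from the question): the count 2 occurs twice
A2 <- data.frame(veh = c("MINIVAN", "HEAVY TRUCK", "SEDAN"),
                 age = c(2.5, 3.5, 1.0),
                 rows_to_add = c(2, 3, 2))
res <- expand_rows(A2)
res
```

Here `res` has 3 + 4 + 3 = 10 rows, and `enum` restarts at 0 for each source row. The sketch assumes the counts are non-negative whole numbers without NA; `rep()` and `sequence()` fail on negative or missing counts.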
Using data.table:
library(data.table)
setDT(A)
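# Repeat each row (rows_to_add + 1) times, then number the copies from 0.
# enum uses the per-row counts of A; unique() would break on repeated counts.
out <- A[rep(seq_len(nrow(A)), A[, rows_to_add] + 1)
][, enum := sequence(A$rows_to_add + 1) - 1]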
out
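A grouped variant is also possible in data.table. This is a sketch under assumptions, not the answer's own method: it rebuilds the example data as `A_dt` and adds a temporary `row_id` column (both names are hypothetical) so that each source row forms its own group even when two rows are identical.

```r
library(data.table)

# Example data from the question, rebuilt as a data.table
A_dt <- data.table(veh = c("MINIVAN", "HEAVY TRUCK"),
                   age = c(2.5, 3.5),
                   rows_to_add = c(2, 3))

A_dt[, row_id := .I]                         # temporary per-row key (hypothetical name)

# One group per source row; each group expands to 0, 1, ..., rows_to_add
out_dt <- A_dt[, .(enum = 0:rows_to_add),
               by = .(row_id, veh, age, rows_to_add)]

out_dt[, row_id := NULL]                     # drop the helper key again
out_dt
```

Grouping runs one small computation per source row, so on large tables it is likely slower than the `rep()` and `sequence()` version above; its advantage is that the enumeration is tied to each row explicitly.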
Answer 1 (score: 0)
You need uncount from tidyr:
library(dplyr)
library(tidyr)
A %>%
uncount(weights = rows_to_add + 1, .id = "enum") %>%
mutate(
enum = enum - 1
)
veh age rows_to_add enum
1 MINIVAN 2.5 2 0
2 MINIVAN 2.5 2 1
3 MINIVAN 2.5 2 2
4 HEAVY TRUCK 3.5 3 0
5 HEAVY TRUCK 3.5 3 1
6 HEAVY TRUCK 3.5 3 2
7 HEAVY TRUCK 3.5 3 3