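I'm trying to add a new variable to a data frame using dplyr, but I'm finding it difficult.
The new variable should be the number of runs of length 2 (across all variable values in each row). Using apply, I would do this: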
tmp$rle = apply(tmp,1,function(x) sum(rle(x)$lengths==2))
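To make the counting concrete, here is what rle() returns for the first row of tmp, typed out by hand (X1 to X8 are 3, 2, 2, 1, 4, 3, 4, 1 in the data below). rle() collapses consecutive equal values into runs, and the expression counts the runs whose length is exactly 2:

```r
# Row 1 of tmp (X1..X8), written out as a plain numeric vector
x <- c(3, 2, 2, 1, 4, 3, 4, 1)

r <- rle(x)          # run-length encoding: consecutive equal values become one run
r$values             # 3 2 1 4 3 4 1
r$lengths            # 1 2 1 1 1 1 1
sum(r$lengths == 2)  # 1: only the run "2, 2" has length 2
```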
How can I do this with dplyr and mutate (without spelling out all the variable names)?
tmp <- structure(list(X1 = c(3, 1, 1, 4, 4, 1, 3, 2, 2, 2, 1, 3, 3,
2, 3, 1, 4, 2, 3, 2), X2 = c(2, 4, 2, 2, 3, 2, 1, 1, 3, 1, 3,
1, 4, 4, 4, 1, 3, 1, 2, 1), X3 = c(2, 4, 3, 3, 3, 2, 4, 3, 4,
4, 2, 3, 3, 3, 1, 3, 1, 4, 4, 2), X4 = c(1, 3, 3, 1, 1, 3, 2,
4, 4, 1, 4, 4, 1, 1, 1, 3, 1, 3, 1, 1), X5 = c(4, 2, 4, 2, 1,
4, 1, 2, 2, 4, 3, 4, 1, 1, 4, 4, 2, 4, 4, 3), X6 = c(3, 1, 4,
3, 4, 4, 4, 1, 1, 3, 4, 2, 2, 2, 3, 2, 3, 2, 2, 3), X7 = c(4,
2, 1, 1, 2, 1, 3, 3, 3, 3, 2, 2, 4, 4, 2, 4, 4, 3, 3, 4), X8 = c(1,
3, 2, 4, 2, 3, 2, 4, 1, 2, 1, 1, 2, 3, 2, 2, 2, 1, 1, 4)), .Names = c("X1",
"X2", "X3", "X4", "X5", "X6", "X7", "X8"), row.names = c(NA,
20L), class = "data.frame")
Answer 0 (score: 2)
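Instead of dplyr, you could consider the purrr package, recently introduced by RStudio as a complement to dplyr for working with vectors and lists. In your case, tmp is a numeric data frame and you want to treat each row as a vector. The code could look like this: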
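# by_row() was later moved out of purrr into the purrrlyr package
library(purrrlyr)
library(dplyr)  # for %>%
# each row arrives as a one-row data frame, so unlist() it into a vector before rle()
tmp <- tmp %>% by_row(..f = function(x) sum(rle(unlist(x))$lengths == 2),
                      .to = "rle", .collate = "cols")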
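Current purrr releases no longer include by_row() (it now lives in purrrlyr), so here is a sketch of the same row-wise idea using purrr's pmap_int(), which walks over the rows of a data frame and passes each row's values as arguments. This is a swapped-in function, not the one used in the answer above, and tmp3 is a hypothetical three-row excerpt of tmp:

```r
library(purrr)

# Hypothetical excerpt: the first three rows of the question's tmp
tmp3 <- data.frame(X1 = c(3, 1, 1), X2 = c(2, 4, 2), X3 = c(2, 4, 3), X4 = c(1, 3, 3),
                   X5 = c(4, 2, 4), X6 = c(3, 1, 4), X7 = c(4, 2, 1), X8 = c(1, 3, 2))

# pmap_int() calls the function once per row; `...` collects that row's X1..X8,
# so no column names have to be typed out
runs2 <- pmap_int(tmp3, function(...) sum(rle(c(...))$lengths == 2))

tmp3$rle <- runs2
tmp3$rle  # 1 1 2
```

Because the function takes `...`, it keeps working if columns are added or renamed.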
Answer 1 (score: 1)
In dplyr:
library(dplyr)
tmp <- mutate(tmp, rle = apply(tmp, 1, function(x) sum(rle(x)$lengths == 2)))
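A pure-dplyr alternative that avoids referring to tmp inside mutate() is rowwise() combined with c_across(). This is a sketch assuming dplyr 1.0.0 or later (where c_across() was introduced); tmp3 is a hypothetical three-row excerpt of tmp:

```r
library(dplyr)  # assumes dplyr >= 1.0.0 for c_across()

# Hypothetical excerpt: the first three rows of the question's tmp
tmp3 <- data.frame(X1 = c(3, 1, 1), X2 = c(2, 4, 2), X3 = c(2, 4, 3), X4 = c(1, 3, 3),
                   X5 = c(4, 2, 4), X6 = c(3, 1, 4), X7 = c(4, 2, 1), X8 = c(1, 3, 2))

out <- tmp3 %>%
  rowwise() %>%  # treat each row as its own group
  # c_across(everything()) gathers the current row's values without naming columns
  mutate(rle = sum(rle(c_across(everything()))$lengths == 2)) %>%
  ungroup()      # drop the row-wise grouping again

out$rle  # 1 1 2
```

rowwise() is convenient but evaluates once per row, so on large data frames the apply() or pmap_int() versions may be faster.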
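I had a hard time following the question because I'm not familiar with what result I should expect from the rle function. I compared the results with the apply version of your code, and it seems set.seed() might matter for reproducibility? Am I understanding this correctly?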
Here is my QA attempt (the original tmp should be exactly the same; I only wrapped the lines at the list() and structure() arguments):
library(dplyr)  # for %>% and mutate()
set.seed(1)
tmp <- structure(list(X1 = c(3, 1, 1, 4, 4, 1, 3, 2, 2, 2, 1, 3, 3, 2, 3, 1, 4, 2, 3, 2),
X2 = c(2, 4, 2, 2, 3, 2, 1, 1, 3, 1, 3, 1, 4, 4, 4, 1, 3, 1, 2, 1),
X3 = c(2, 4, 3, 3, 3, 2, 4, 3, 4, 4, 2, 3, 3, 3, 1, 3, 1, 4, 4, 2),
X4 = c(1, 3, 3, 1, 1, 3, 2, 4, 4, 1, 4, 4, 1, 1, 1, 3, 1, 3, 1, 1),
X5 = c(4, 2, 4, 2, 1, 4, 1, 2, 2, 4, 3, 4, 1, 1, 4, 4, 2, 4, 4, 3),
X6 = c(3, 1, 4, 3, 4, 4, 4, 1, 1, 3, 4, 2, 2, 2, 3, 2, 3, 2, 2, 3),
X7 = c(4, 2, 1, 1, 2, 1, 3, 3, 3, 3, 2, 2, 4, 4, 2, 4, 4, 3, 3, 4),
X8 = c(1, 3, 2, 4, 2, 3, 2, 4, 1, 2, 1, 1, 2, 3, 2, 2, 2, 1, 1, 4)),
.Names = c("X1", "X2", "X3", "X4", "X5", "X6", "X7", "X8"),
row.names = c(NA, 20L), class = "data.frame")
tmpApply <- tmp
tmpApply$rle = apply(tmp, 1, function(x) sum(rle(x)$lengths==2))
tmpDplyr <- tmp %>% mutate(rle = apply(tmp, 1, function(x) sum(rle(x)$lengths==2)))
tmpApply
tmpDplyr
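apply() and rle() do not draw random numbers, so set.seed() has no effect on this calculation and the two versions can be compared directly. A minimal sketch of that check, on a hypothetical three-row excerpt tmp3 and with a hypothetical helper count_runs2():

```r
library(dplyr)

# Hypothetical excerpt: the first three rows of the question's tmp
tmp3 <- data.frame(X1 = c(3, 1, 1), X2 = c(2, 4, 2), X3 = c(2, 4, 3), X4 = c(1, 3, 3),
                   X5 = c(4, 2, 4), X6 = c(3, 1, 4), X7 = c(4, 2, 1), X8 = c(1, 3, 2))

# Hypothetical helper: number of runs of length exactly 2 in a vector
count_runs2 <- function(x) sum(rle(x)$lengths == 2)

tmpApply <- tmp3
tmpApply$rle <- apply(tmp3, 1, count_runs2)
tmpDplyr <- tmp3 %>% mutate(rle = apply(tmp3, 1, count_runs2))

# No seed is set: repeating the computation gives the same answer,
# and the apply and dplyr versions agree
same_twice <- identical(apply(tmp3, 1, count_runs2), apply(tmp3, 1, count_runs2))
same_both <- identical(tmpApply$rle, tmpDplyr$rle)
```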