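dplyr: add a variable as a function of all variables in each row

Asked: 2016-06-15 13:16:49

Tags: r dplyr

I am trying to add a new variable to a data frame using dplyr, but I am finding it difficult.

The new variable should be the number of runs of length 2 in each row (taken over all variable values in that row). With apply I would do it like this: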

tmp$rle = apply(tmp,1,function(x) sum(rle(x)$lengths==2))

How can I do this with dplyr's mutate (without spelling out all the variable names)?

tmp <- structure(list(X1 = c(3, 1, 1, 4, 4, 1, 3, 2, 2, 2, 1, 3, 3, 
2, 3, 1, 4, 2, 3, 2), X2 = c(2, 4, 2, 2, 3, 2, 1, 1, 3, 1, 3, 
1, 4, 4, 4, 1, 3, 1, 2, 1), X3 = c(2, 4, 3, 3, 3, 2, 4, 3, 4, 
4, 2, 3, 3, 3, 1, 3, 1, 4, 4, 2), X4 = c(1, 3, 3, 1, 1, 3, 2, 
4, 4, 1, 4, 4, 1, 1, 1, 3, 1, 3, 1, 1), X5 = c(4, 2, 4, 2, 1, 
4, 1, 2, 2, 4, 3, 4, 1, 1, 4, 4, 2, 4, 4, 3), X6 = c(3, 1, 4, 
3, 4, 4, 4, 1, 1, 3, 4, 2, 2, 2, 3, 2, 3, 2, 2, 3), X7 = c(4, 
2, 1, 1, 2, 1, 3, 3, 3, 3, 2, 2, 4, 4, 2, 4, 4, 3, 3, 4), X8 = c(1, 
3, 2, 4, 2, 3, 2, 4, 1, 2, 1, 1, 2, 3, 2, 2, 2, 1, 1, 4)), .Names = c("X1", 
"X2", "X3", "X4", "X5", "X6", "X7", "X8"), row.names = c(NA, 
20L), class = "data.frame")

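2 Answers:

Answer 0: (score: 2)

Instead of dplyr, you could consider the purrr package, recently introduced by RStudio as a complement to dplyr for better handling of vectors and lists. In your case, tmp is a numeric data frame and you want to treat each row as a vector. The code could look like this: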

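library(purrr)
# Note: by_row() was later moved out of purrr into the purrrlyr package
# (as of purrr 0.2.3), so with a current purrr you need purrrlyr for by_row().
tmp <- tmp %>% by_row(..f = function(x) sum(rle(x)$lengths == 2),
                      .to = "rle", .collate = "cols")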
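Since by_row() is not available in current purrr releases, here is a minimal sketch of the same row-wise count using pmap_int(), which calls a function once per row of a data frame. The data frame `small` and its values are hypothetical stand-ins for tmp, and this is a swapped-in technique, not the approach from the answer above.

```r
library(purrr)

# Hypothetical stand-in for tmp (illustration only)
small <- data.frame(X1 = c(1, 2, 3),
                    X2 = c(1, 2, 1),
                    X3 = c(2, 2, 1),
                    X4 = c(2, 3, 3))

# pmap_int() passes the columns of each row as arguments;
# c(...) collects them back into a single vector for rle()
run_counts <- pmap_int(small, function(...) sum(rle(c(...))$lengths == 2))

# Attach the result as a new column
small$rle <- run_counts
small
```

Computing the counts before attaching the column keeps the new rle column out of the values that are passed to rle().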

Answer 1: (score: 1)

With dplyr:

tmp <- mutate(tmp, rle = apply(tmp, 1, function(x) sum(rle(x)$lengths==2)))
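The call above still relies on apply() for the row-wise work. As a possible alternative that stays inside dplyr, the sketch below uses rowwise() with c_across(everything()), which assumes dplyr 1.0.0 or later (newer than this thread). The data frame `small` is a hypothetical stand-in for tmp.

```r
library(dplyr)

# Hypothetical stand-in for tmp (illustration only)
small <- data.frame(X1 = c(1, 2, 3),
                    X2 = c(1, 2, 1),
                    X3 = c(2, 2, 1),
                    X4 = c(2, 3, 3))

result <- small %>%
  rowwise() %>%                       # evaluate mutate() one row at a time
  mutate(rle = sum(rle(c_across(everything()))$lengths == 2)) %>%
  ungroup()                           # drop the row-wise grouping again

result
```

No column names are listed explicitly; everything() selects all columns that exist at the time the new column is computed.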

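I had a hard time with this question because I am not familiar with what results I should expect from the rle function. I tried comparing the results with the apply version of your code, and it seems set.seed() might matter for reproducibility? Am I understanding that correctly?

Here is my QA attempt: (The original tmp should be exactly the same; I only wrapped the lines at the list() and structure() arguments.)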
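To make the expected output of rle() concrete, here is a small worked sketch in base R on the first row of tmp (the X1 to X8 values are copied from the question's data). rle() splits a vector into runs of consecutive equal values and reports each run's value and length.

```r
# First row of tmp, columns X1..X8
row1 <- c(3, 2, 2, 1, 4, 3, 4, 1)

runs <- rle(row1)
runs$values    # value of each run:  3 2 1 4 3 4 1
runs$lengths   # length of each run: 1 2 1 1 1 1 1

# Only the run "2 2" has length 2, so the count for this row is 1
n_runs_of_two <- sum(runs$lengths == 2)
n_runs_of_two
```

Nothing in this computation draws random numbers, so the count is the same on every run.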

library(dplyr)  # provides mutate() and %>%
set.seed(1)
tmp <- structure(list(X1 = c(3, 1, 1, 4, 4, 1, 3, 2, 2, 2, 1, 3, 3, 2, 3, 1, 4, 2, 3, 2),
                      X2 = c(2, 4, 2, 2, 3, 2, 1, 1, 3, 1, 3, 1, 4, 4, 4, 1, 3, 1, 2, 1),
                      X3 = c(2, 4, 3, 3, 3, 2, 4, 3, 4, 4, 2, 3, 3, 3, 1, 3, 1, 4, 4, 2),
                      X4 = c(1, 3, 3, 1, 1, 3, 2, 4, 4, 1, 4, 4, 1, 1, 1, 3, 1, 3, 1, 1),
                      X5 = c(4, 2, 4, 2, 1, 4, 1, 2, 2, 4, 3, 4, 1, 1, 4, 4, 2, 4, 4, 3),
                      X6 = c(3, 1, 4, 3, 4, 4, 4, 1, 1, 3, 4, 2, 2, 2, 3, 2, 3, 2, 2, 3),
                      X7 = c(4, 2, 1, 1, 2, 1, 3, 3, 3, 3, 2, 2, 4, 4, 2, 4, 4, 3, 3, 4),
                      X8 = c(1, 3, 2, 4, 2, 3, 2, 4, 1, 2, 1, 1, 2, 3, 2, 2, 2, 1, 1, 4)),
                 .Names = c("X1", "X2", "X3", "X4", "X5", "X6", "X7", "X8"), 
                 row.names = c(NA, 20L), class = "data.frame")
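# Base R version: add the column with apply() over rows
tmpApply <- tmp
tmpApply$rle = apply(tmp, 1, function(x) sum(rle(x)$lengths==2))
# dplyr version: the same apply() call wrapped in mutate()
tmpDplyr <- tmp %>% mutate(rle = apply(tmp, 1, function(x) sum(rle(x)$lengths==2))) 

# Print both results for a side-by-side comparison
tmpApply            
tmpDplyr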
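Rather than comparing the two printed data frames by eye, the new columns can be compared programmatically. The sketch below does this on a hypothetical stand-in data frame `small`, with a helper name `count_runs_of_two` that is made up for illustration. It deliberately sets different seeds before each version to probe whether the seed has any influence on the result.

```r
library(dplyr)

# Hypothetical stand-in for tmp (illustration only)
small <- data.frame(X1 = c(1, 2, 3),
                    X2 = c(1, 2, 1),
                    X3 = c(2, 2, 1),
                    X4 = c(2, 3, 3))

# Hypothetical helper: number of runs of length 2 in a vector
count_runs_of_two <- function(x) sum(rle(x)$lengths == 2)

set.seed(1)
res_apply <- apply(small, 1, count_runs_of_two)

set.seed(2)  # a different seed on purpose
res_dplyr <- mutate(small, rle = apply(small, 1, count_runs_of_two))$rle

# TRUE when both versions give exactly the same counts
same_result <- identical(res_apply, res_dplyr)
same_result
```

Because neither version generates random numbers, the seed should not affect the comparison.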