I have a data frame:
df_1 <- data.frame(
  x = replicate(4, runif(30, 20, 100)),
  y = sample(1:3, 30, replace = TRUE)
)
The following works:
library(tidyverse)
df_1 %>%
  select(-y) %>%
  rowwise() %>%
  mutate(var = sum(c(x.1, x.3)))
However, the following attempts to sum across all variables do not work correctly.
With . (the dot placeholder):
df_1 %>%
  select(-y) %>%
  rowwise() %>%
  mutate(var = sum(.))
With select_if:
df_1 %>%
  select(-y) %>%
  rowwise() %>%
  mutate(var = sum(select_if(., is.numeric)))
Both approaches return:
Source: local data frame [30 x 5]
Groups: <by row>
# A tibble: 30 x 5
x.1 x.2 x.3 x.4 var
<dbl> <dbl> <dbl> <dbl> <dbl>
1 32.7 42.7 50.1 20.8 7091.
2 75.9 71.3 83.6 77.6 7091.
3 49.6 28.7 97.0 59.7 7091.
4 47.4 96.1 31.9 79.7 7091.
5 54.2 47.1 81.7 41.6 7091.
6 27.9 58.1 97.4 25.9 7091.
7 61.8 78.3 52.6 67.7 7091.
8 85.4 51.3 38.8 82.0 7091.
9 27.9 72.6 68.9 25.2 7091.
10 87.2 42.1 27.6 73.9 7091.
# ... with 20 more rows
7091 is not the correct row sum.
How can I fix this?
Answer 0 (score: 2)
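This can be done with purrr::pmap, which passes a list of arguments to a function that accepts "dots". Since most functions such as mean, sd, etc. work on vectors, you need to pair the call with a domain lifter: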
df_1 %>% select(-y) %>% mutate( var = pmap(., lift_vd(mean)) )
# x.1 x.2 x.3 x.4 var
# 1 70.12072 62.99024 54.00672 86.81358 68.48282
# 2 49.40462 47.00752 21.99248 78.87789 49.32063
df_1 %>% select(-y) %>% mutate( var = pmap(., lift_vd(sd)) )
# x.1 x.2 x.3 x.4 var
# 1 70.12072 62.99024 54.00672 86.81358 13.88555
# 2 49.40462 47.00752 21.99248 78.87789 23.27958
The sum function accepts dots directly, so you do not need to lift its domain:
df_1 %>% select(-y) %>% mutate( var = pmap(., sum) )
# x.1 x.2 x.3 x.4 var
# 1 70.12072 62.99024 54.00672 86.81358 273.9313
# 2 49.40462 47.00752 21.99248 78.87789 197.2825
Everything fits into standard dplyr data processing, so all three can be combined as separate arguments to mutate:
df_1 %>% select(-y) %>%
  mutate( v1 = pmap(., lift_vd(mean)),
          v2 = pmap(., lift_vd(sd)),
          v3 = pmap(., sum) )
# x.1 x.2 x.3 x.4 v1 v2 v3
# 1 70.12072 62.99024 54.00672 86.81358 68.48282 13.88555 273.9313
# 2 49.40462 47.00752 21.99248 78.87789 49.32063 23.27958 197.2825
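As a variation on the approach above, here is a minimal sketch that avoids lift_vd (which, to my knowledge, is deprecated in purrr 1.0.0 and later) by wrapping each vector function in a small helper that collects the dots with c(...). It also uses pmap_dbl, so the new columns are numeric vectors rather than lists. The helper names row_mean and row_sd and the set.seed(1) call are assumptions added for illustration and are not part of the original answer.

```r
library(dplyr)
library(purrr)

set.seed(1)  # assumption: fixed seed so the sketch is reproducible
df_1 <- data.frame(
  x = replicate(4, runif(30, 20, 100)),
  y = sample(1:3, 30, replace = TRUE)
)

# Hypothetical helpers: gather the per-row arguments into one vector,
# then apply the ordinary vector function.
row_mean <- function(...) mean(c(...))
row_sd   <- function(...) sd(c(...))

num <- select(df_1, -y)   # only the columns to aggregate

res <- num %>%
  mutate(
    v1 = pmap_dbl(num, row_mean),  # row-wise mean
    v2 = pmap_dbl(num, row_sd),    # row-wise standard deviation
    v3 = pmap_dbl(num, sum)        # sum accepts dots directly
  )

head(res)
```

Passing num explicitly instead of the dot keeps it obvious which columns feed each row-wise call.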
Answer 1 (score: 1)
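I think this is tricky because the scoped variants of mutate (mutate_at, mutate_all, mutate_if) are generally meant to apply a function to specific columns, rather than to build one operation that uses all columns.
The simplest solution I can come up with is basically to create a vector of column names (cols) and then use it to perform the summary operation: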
library(dplyr)
library(purrr)
df_1 <- data.frame(
  x = replicate(4, runif(30, 20, 100)),
  y = sample(1:3, 30, replace = TRUE)
)
# create vector of columns to operate on
cols <- names(df_1)
cols <- cols[map_lgl(df_1, is.numeric)]
cols <- cols[! cols %in% c("y")]
cols
#> [1] "x.1" "x.2" "x.3" "x.4"
df_1 %>%
  select(-y) %>%
  rowwise() %>%
  mutate(
    var = sum(!!!map(cols, as.name), na.rm = TRUE)
  )
#> Source: local data frame [30 x 5]
#> Groups: <by row>
#>
#> # A tibble: 30 x 5
#> x.1 x.2 x.3 x.4 var
#> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 46.1 28.9 28.9 50.7 155.
#> 2 26.8 68.0 67.1 26.5 188.
#> 3 35.2 63.8 62.5 28.5 190.
#> 4 31.3 44.9 67.3 68.2 212.
#> 5 52.6 23.9 83.2 43.4 203.
#> 6 55.7 92.8 86.3 57.2 292.
#> 7 56.9 50.0 77.6 25.6 210.
#> 8 95.0 82.6 86.1 22.7 286.
#> 9 62.7 26.5 61.0 88.9 239.
#> 10 65.2 23.1 25.5 51.0 165.
#> # … with 20 more rows
Created on 2019-04-30 by the reprex package (v0.2.1)
Note: if you are not familiar with purrr, you can also use something like lapply.
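For instance, the column vector and the spliced symbols could be built with base R only. This is a sketch under the same setup as above: vapply and lapply stand in for map_lgl and map, and set.seed(1) is an assumption added for reproducibility.

```r
library(dplyr)

set.seed(1)  # assumption: fixed seed so the sketch is reproducible
df_1 <- data.frame(
  x = replicate(4, runif(30, 20, 100)),
  y = sample(1:3, 30, replace = TRUE)
)

# Numeric columns, minus the one we do not want to aggregate
is_num <- vapply(df_1, is.numeric, logical(1))
cols <- setdiff(names(df_1)[is_num], "y")

res <- df_1 %>%
  select(-y) %>%
  rowwise() %>%
  # lapply() builds a list of symbols; !!! splices them in as arguments to sum()
  mutate(var = sum(!!!lapply(cols, as.name), na.rm = TRUE)) %>%
  ungroup()

head(res)
```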
You can read more about these kinds of more advanced dplyr operations (!!, !!!, etc.) in the dplyr and rlang documentation.
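In dplyr 1.0.0 and later, c_across offers another route inside rowwise(). The sketch below is an alternative technique rather than the method of this answer, and it assumes a dplyr version that provides c_across. It also helps show why sum(.) in the question returned the same value on every row: there, the dot is the whole data frame supplied by the pipe, so every row receives the grand total.

```r
library(dplyr)

set.seed(1)  # assumption: fixed seed so the sketch is reproducible
df_1 <- data.frame(
  x = replicate(4, runif(30, 20, 100)),
  y = sample(1:3, 30, replace = TRUE)
)

# What sum(.) effectively computed in the question: one total for the whole table
grand_total <- sum(select(df_1, -y))

res <- df_1 %>%
  rowwise() %>%
  # c_across() gathers the selected columns of the current row into a vector
  mutate(var = sum(c_across(starts_with("x")))) %>%
  ungroup()

head(res)
grand_total
```

The per-row values in var add up to grand_total, which is the single number the question saw repeated on every row.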
Answer 2 (score: 1)
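A few approaches I have used in the past:
- an existing row-wise function (e.g. rowSums)
- reduce (does not work for all functions)
- transposing the data
- a custom function with pmap
set.seed(1)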
df_1 <- data.frame(
  x = replicate(4, runif(30, 20, 100)),
  y = sample(1:3, 30, replace = TRUE)
)
library(tidyverse)
# rowSums
df_1 %>%
  mutate(var = rowSums(select(., -y))) %>%
  head()
#> x.1 x.2 x.3 x.4 y var
#> 1 41.24069 58.56641 93.03007 39.17035 3 232.0075
#> 2 49.76991 67.96527 43.48827 24.71475 2 185.9382
#> 3 65.82827 59.48330 56.72526 71.38306 2 253.4199
#> 4 92.65662 34.89741 46.59157 90.10154 1 264.2471
#> 5 36.13455 86.18987 72.06964 82.31317 3 276.7072
#> 6 91.87117 73.47734 40.64134 83.78471 2 289.7746
# reduce
df_1 %>%
  mutate(var = reduce(select(., -y), `+`)) %>%
  head()
#> x.1 x.2 x.3 x.4 y var
#> 1 41.24069 58.56641 93.03007 39.17035 3 232.0075
#> 2 49.76991 67.96527 43.48827 24.71475 2 185.9382
#> 3 65.82827 59.48330 56.72526 71.38306 2 253.4199
#> 4 92.65662 34.89741 46.59157 90.10154 1 264.2471
#> 5 36.13455 86.18987 72.06964 82.31317 3 276.7072
#> 6 91.87117 73.47734 40.64134 83.78471 2 289.7746
# transposing
df_1 %>%
  mutate(var = select(., -y) %>% as.matrix %>% t %>% as.data.frame %>% map_dbl(var)) %>%
  head()
#> x.1 x.2 x.3 x.4 y var
#> 1 41.24069 58.56641 93.03007 39.17035 3 620.95228
#> 2 49.76991 67.96527 43.48827 24.71475 2 318.37221
#> 3 65.82827 59.48330 56.72526 71.38306 2 43.17011
#> 4 92.65662 34.89741 46.59157 90.10154 1 878.50087
#> 5 36.13455 86.18987 72.06964 82.31317 3 520.72241
#> 6 91.87117 73.47734 40.64134 83.78471 2 506.16785
# custom function with pmap
my_var <- function(...){
  vec <- c(...)
  var(vec)
}
df_1 %>%
  mutate(var = select(., -y) %>% pmap(my_var)) %>%
  head()
#> x.1 x.2 x.3 x.4 y var
#> 1 41.24069 58.56641 93.03007 39.17035 3 620.9523
#> 2 49.76991 67.96527 43.48827 24.71475 2 318.3722
#> 3 65.82827 59.48330 56.72526 71.38306 2 43.17011
#> 4 92.65662 34.89741 46.59157 90.10154 1 878.5009
#> 5 36.13455 86.18987 72.06964 82.31317 3 520.7224
#> 6 91.87117 73.47734 40.64134 83.78471 2 506.1679
Created on 2019-04-30 by the reprex package (v0.2.1)
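To illustrate the remark that reduce does not suit every function: reduce combines the columns two at a time, so it fits operations that can be built up pairwise, such as + or pmax, but not statistics like var that need the whole row at once. A minimal sketch, assuming the same df_1 as above (row_sum and row_max are illustrative names):

```r
library(dplyr)
library(purrr)

set.seed(1)
df_1 <- data.frame(
  x = replicate(4, runif(30, 20, 100)),
  y = sample(1:3, 30, replace = TRUE)
)

num <- select(df_1, -y)

# Pairwise-combinable operations work column by column:
row_sum <- reduce(num, `+`)   # ((x.1 + x.2) + x.3) + x.4
row_max <- reduce(num, pmax)  # element-wise maximum, folded over the columns

head(data.frame(row_sum, row_max))
```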
Answer 3 (score: 1)
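This is a tricky problem because dplyr operates column-wise for many operations. I originally used apply from base R to work across rows, but apply is problematic when handling character and numeric types.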
Instead, we can do this simply with the (aging) plyr package and its adply function:
df_1 %>% select(-y) %>% plyr::adply(1, function(df) c(v1 = sd(df[1, ])))
Note that some functions, such as var, do not work on a single-row data frame, so we need to convert the row to a vector with as.numeric.
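Both caveats can be seen in a small base R sketch. The data frames df_mixed and row1 are made up for illustration. apply converts a data frame with as.matrix first, so mixed columns are coerced to a common type; and var treats a data frame as a set of variables and returns a covariance matrix, which is why a single row is converted with as.numeric.

```r
# Hypothetical data frame with one numeric and one character column
df_mixed <- data.frame(a = c(1.5, 2.5), b = c("u", "v"),
                       stringsAsFactors = FALSE)

# apply() works on as.matrix(df), and a matrix holds a single type
m <- as.matrix(df_mixed)
typeof(m)

# Hypothetical single-row data frame, like the one adply passes to its function
row1 <- data.frame(x.1 = 41.2, x.2 = 58.6, x.3 = 93.0, x.4 = 39.2)

cov_mat <- var(row1)              # covariance matrix across columns, not a row variance
dim(cov_mat)

row_var <- var(as.numeric(row1))  # variance of the four values in the row
row_var
```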