Question

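I'm trying to create a new column that holds the result of a calculation done row-wise over a subset of a tibble's columns, and add this new column to the existing tibble. Like so:

df <- tibble(
  ID = c("one", "two", "three"),
  A1 = c(1, 1, 1),
  A2 = c(2, 2, 2),
  A3 = c(3, 3, 3)
)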

What I effectively want is a dplyr equivalent of this base R code:

df$SumA <- rowSums(df[,grepl("^A", colnames(df))])

My problem is that this doesn't work:

df %>%
  select(starts_with("A")) %>%
  mutate(SumA = rowSums(.))
  # some code here

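...because I dropped the "ID" column so that mutate could run rowSums over the other (numeric) columns. I tried cbind or bind_cols in the pipe after the mutate, but that doesn't work. None of the mutate variants work either, because they operate in place (within each cell of the tibble, not across columns, even with rowwise).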

This does work, but it doesn't strike me as an elegant solution:

df %>%
  mutate(SumA = rowSums(.[, grepl("^A", colnames(df))]))

Is there a tidyverse-based solution that doesn't need grepl or square brackets, only more standard dplyr verbs and parameters?

My expected output is this:

df_out <- tibble(
  ID = c("one", "two", "three"),
  A1 = c(1, 1, 1),
  A2 = c(2, 2, 2),
  A3 = c(3, 3, 3),
  SumA = c(6, 6, 6)
)

Best, kJ

Answer 1

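Here is one way to approach row-wise computation in the tidyverse using purrr::pmap. It is best used with functions that actually need to run row by row; simple addition can probably be done faster another way. Basically, we use select to supply the input list to pmap, which lets us use select helpers such as starts_with, or matches if you need a regex.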

library(tidyverse)
df <- tibble(
  ID = c("one", "two", "three"),
  A1 = c(1, 1, 1),
  A2 = c(2, 2, 2),
  A3 = c(3, 3, 3)
)

df %>%
  mutate(
    SumA = pmap_dbl(
      .l = select(., starts_with("A")),
      .f = function(...) sum(...)
    )
  )
#> # A tibble: 3 x 5
#>   ID       A1    A2    A3  SumA
#>   <chr> <dbl> <dbl> <dbl> <dbl>
#> 1 one       1     2     3     6
#> 2 two       1     2     3     6
#> 3 three     1     2     3     6

Created on 2019-01-30 by the reprex package (v0.2.1)
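As an illustration of a calculation that really does need each row's values together, here is a minimal sketch that computes the per-row spread (max minus min) with the same pmap_dbl pattern. The tibble `df_var` and the column name `SpreadA` are made up for this example and are not part of the original answer.

```r
library(dplyr)
library(purrr)

# Hypothetical data whose values differ from row to row
df_var <- tibble(
  ID = c("one", "two", "three"),
  A1 = c(1, 4, 2),
  A2 = c(2, 2, 2),
  A3 = c(3, 3, 9)
)

out_spread <- df_var %>%
  mutate(
    SpreadA = pmap_dbl(
      .l = select(., starts_with("A")),
      # each call receives one row's A1, A2, A3 as arguments
      .f = function(...) {
        vals <- c(...)
        max(vals) - min(vals)
      }
    )
  )
```

Because pmap calls the function once per row, this is likely to be slower than a vectorised alternative on large tibbles.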

Answer 2

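Here is a different approach that does not move row by row but instead exploits the vectorised nature of addition and the fact that addition commutes. That lets us apply + repeatedly with purrr::reduce.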

library(tidyverse)
df <- tibble(
  ID = c("one", "two", "three"),
  A1 = c(1, 1, 1),
  A2 = c(2, 2, 2),
  A3 = c(3, 3, 3)
)

df %>%
  mutate(
    SumA = reduce(
      .x = select(., starts_with("A")),
      .f = `+`
    )
  )
#> # A tibble: 3 x 5
#>   ID       A1    A2    A3  SumA
#>   <chr> <dbl> <dbl> <dbl> <dbl>
#> 1 one       1     2     3     6
#> 2 two       1     2     3     6
#> 3 three     1     2     3     6

Created on 2019-01-30 by the reprex package (v0.2.1)
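To make the folding explicit, the sketch below applies reduce to three plain vectors and compares it with the hand-written sum. The vectors, including the missing value in `a3`, are hypothetical; they are there to show that `+` propagates NA, whereas rowSums can be told to skip it.

```r
library(purrr)

a1 <- c(1, 1, 1)
a2 <- c(2, 2, 2)
a3 <- c(3, 3, NA)  # hypothetical missing value

# reduce() folds left to right: (a1 + a2) + a3, each step vectorised
sum_reduce <- reduce(list(a1, a2, a3), `+`)

# The same fold written out by hand
sum_manual <- (a1 + a2) + a3

# rowSums() can skip missing values; `+` cannot
sum_rowsums <- rowSums(cbind(a1, a2, a3), na.rm = TRUE)
```

If the columns may contain missing values, this difference is worth checking before choosing between the two.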

Answer 3

1) To do this with rowSums, try nesting a second pipeline inside mutate, like this:

library(dplyr)

df %>% mutate(Sum = select(., starts_with("A")) %>% rowSums)

giving:

# A tibble: 3 x 5
  ID       A1    A2    A3   Sum
  <chr> <dbl> <dbl> <dbl> <dbl>
1 one       1     2     3     6
2 two       1     2     3     6
3 three     1     2     3     6
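On newer dplyr releases the same idea can probably be written without the nested pipe. The sketch below assumes dplyr 1.0.0 or later, where across() is available inside mutate; releases from 1.1.0 on also provide pick(), which can be used the same way. This is not part of the original answer.

```r
library(dplyr)

df <- tibble(
  ID = c("one", "two", "three"),
  A1 = c(1, 1, 1),
  A2 = c(2, 2, 2),
  A3 = c(3, 3, 3)
)

# across() inside mutate() returns the selected columns as a tibble,
# which rowSums() accepts directly (assumes dplyr >= 1.0.0)
out_across <- df %>%
  mutate(Sum = rowSums(across(starts_with("A"))))
```

The ID column is kept because the selection happens inside mutate rather than in a preceding select.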

2) An alternative is to reshape to long form and then summarize:

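library(dplyr)
library(purrr)
library(tidyr)

# Caution: summarize() returns one row per ID in sorted group order
# ("one", "three", "two"), so the pulled sums line up with df only when
# df is already in that order. Here every sum is 6, so it does not show.
df %>%
  mutate(Sum = gather(., key, value, -ID) %>% 
               group_by(., ID) %>%
               summarize(sum = sum(value)) %>%
               ungroup %>%
               pull(sum))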

giving:

# A tibble: 3 x 5
  ID       A1    A2    A3   Sum
  <chr> <dbl> <dbl> <dbl> <dbl>
1 one       1     2     3     6
2 two       1     2     3     6
3 three     1     2     3     6
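The long-form approach relies on the summarized rows coming back in the same order as the original tibble. A minimal sketch that avoids that dependency joins the sums back by ID instead. It assumes tidyr 1.0.0 or later for pivot_longer and that ID values are unique; the data `df_var` is hypothetical, chosen so that a misaligned result would be visible.

```r
library(dplyr)
library(tidyr)

# Hypothetical data whose row sums differ, so ordering mistakes would show
df_var <- tibble(
  ID = c("one", "two", "three"),
  A1 = c(1, 2, 3),
  A2 = c(10, 20, 30),
  A3 = c(100, 200, 300)
)

# Long form: one row per ID/column pair, then one sum per ID
sums <- df_var %>%
  pivot_longer(starts_with("A"), names_to = "key", values_to = "value") %>%
  group_by(ID) %>%
  summarize(Sum = sum(value))

# Join by ID so each sum lands on its own row, whatever order summarize() used
out_long <- left_join(df_var, sums, by = "ID")
```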

Answer 4

[upd] I hadn't noticed that @Calum used almost the same approach.

Another possible way to do it:

library(dplyr)
library(purrr)

dat %>%
  mutate(SumA = pmap_dbl(select(., contains('A')), sum))

Data:

# dat <- tibble(
#   ID = c("one", "two", "three"),
#   A1 = c(1, 1, 1),
#   A2 = c(2, 2, 2),
#   A3 = c(3, 3, 3)
# )

Output:

# # A tibble: 3 x 5
#   ID       A1    A2    A3  SumA
#   <chr> <dbl> <dbl> <dbl> <dbl>
# 1 one       1     2     3     6
# 2 two       1     2     3     6
# 3 three     1     2     3     6
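One thing to keep in mind with contains('A') is that it matches the letter anywhere in a column name and ignores case by default, so it can pick up more columns than starts_with. The sketch below shows the difference on a hypothetical tibble `dat_extra` with an extra `Rank` column that is not in the original data.

```r
library(dplyr)

# Hypothetical tibble with an extra numeric column whose name contains "a"
dat_extra <- tibble(
  ID = c("one", "two", "three"),
  Rank = c(10, 20, 30),
  A1 = c(1, 1, 1),
  A2 = c(2, 2, 2)
)

# contains() matches anywhere in the name, case-insensitively by default
cols_contains <- names(select(dat_extra, contains("A")))

# starts_with() only matches the beginning of the name
cols_starts <- names(select(dat_extra, starts_with("A")))
```

With the data in this question both helpers select the same three columns, so the answer's result is unaffected.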

Answer 5

You can nest and then use rowSums on the nested column:

library(tidyverse)
df %>% nest(-ID) %>%
  mutate(SumA = map_dbl(data,rowSums)) %>%
  unnest

# # A tibble: 3 x 5
#      ID  SumA    A1    A2    A3
#   <chr> <dbl> <dbl> <dbl> <dbl>
# 1   one     6     1     2     3
# 2   two     6     1     2     3
# 3 three     6     1     2     3

Or this variant of the pmap approach:

df %>% mutate(SumA = pmap_dbl(.[-1],sum))
# # A tibble: 3 x 5
#      ID    A1    A2    A3  SumA
#   <chr> <dbl> <dbl> <dbl> <dbl>
# 1   one     1     2     3     6
# 2   two     1     2     3     6
# 3 three     1     2     3     6

And to show that base R is sometimes easier:

df$SumA <- rowSums(df[-1])
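The question mentions rowwise; in newer dplyr releases (1.0.0 and later, which is an assumption about the reader's setup) rowwise can be combined with c_across to select columns by helper rather than by position. A minimal sketch, not taken from any of the answers:

```r
library(dplyr)

df <- tibble(
  ID = c("one", "two", "three"),
  A1 = c(1, 1, 1),
  A2 = c(2, 2, 2),
  A3 = c(3, 3, 3)
)

out_rowwise <- df %>%
  rowwise() %>%                                       # treat each row as its own group
  mutate(SumA = sum(c_across(starts_with("A")))) %>%  # combine that row's A columns
  ungroup()                                           # drop the row-wise grouping again
```

Unlike .[-1] or df[-1], this does not depend on ID being the first column, though it evaluates once per row and may be slower than rowSums on large data.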

Using mutate row-wise over a subset of columns