Question

I want to sum over a subset of the categories contained in a single variable, with the data organized as tidy data in R.

It seems like it should be simple, but every way I can think of takes many lines of code.

Here is an example:

df = data.frame(food = c("carbs", "protein", "apple", "pear"), value = c(10, 12, 4, 3))
df
     food value
1   carbs    10
2 protein    12
3   apple     4
4    pear     3

I want the data frame to look like this (with apple and pear combined into fruit):

     food value
1   carbs    10
2 protein    12
3   fruit     7

The approach I came up with is:

library(dplyr)
library(tidyr)

df %>%
  spread(key = "food", value = "value") %>%
  mutate(fruit = apple + pear) %>%
  select(-c(apple, pear)) %>%
  gather(key = "food", value = "value")

     food value
1   carbs    10
2 protein    12
3   fruit     7

This seems far too long for something so simple. I could also subset the data, sum the rows, and then rbind the result back, but that seems cumbersome.
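For reference, the subset-and-rbind alternative could look roughly like the sketch below in base R. The names `fruit`, `is_fruit`, and `result` are illustrative assumptions, and `stringsAsFactors = FALSE` is set so that `food` is a character column regardless of R version.

```r
# Same example data as above; food kept as character (assumption)
df <- data.frame(food = c("carbs", "protein", "apple", "pear"),
                 value = c(10, 12, 4, 3),
                 stringsAsFactors = FALSE)

# Categories to merge into a single "fruit" row (illustrative name)
fruit <- c("apple", "pear")
is_fruit <- df$food %in% fruit

# Keep the untouched rows, then append one row holding the summed value
result <- rbind(
  df[!is_fruit, ],
  data.frame(food = "fruit",
             value = sum(df$value[is_fruit]),
             stringsAsFactors = FALSE)
)
rownames(result) <- NULL  # reset row names after binding
result
```

It works, but it needs an intermediate index and a hand-built row, which is the overhead I would like to avoid.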

Is there a quicker option?

Answer 1

Factors can be recoded with forcats::fct_recode, although this is not necessarily shorter.

library(dplyr)
library(forcats)

df %>%
  mutate(food = fct_recode(food, fruit = 'apple', fruit = 'pear')) %>%
  group_by(food) %>%
  summarise(value = sum(value))
## A tibble: 3 x 2
#  food    value
#  <fct>   <dbl>
#1 fruit       7
#2 carbs      10
#3 protein    12

Edit.

I am posting the code from this comment here, because comments are deleted more often than answers. The result is the same as above.

df %>%
  group_by(food = fct_recode(food, fruit = 'apple', fruit = 'pear')) %>%
  summarise(value = sum(value))

Answer 2

How about:

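df %>%
  # as.character() guards against food being a factor (the data.frame default before R 4.0),
  # which if_else() would reject as a type mismatch with "fruit"
  group_by(food = if_else(food %in% c("apple", "pear"), "fruit", as.character(food))) %>%
  summarise_all(sum)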

  food    value
  <chr>   <dbl>
1 carbs      10
2 fruit       7
3 protein    12
