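I want to sum a subset of the categories held in a single variable in R, keeping the result as tidy data.
It seems like it should be simple, but every approach I can think of takes many lines of code.
Here is an example: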
df <- data.frame(food = c("carbs", "protein", "apple", "pear"),
                 value = c(10, 12, 4, 3))
df
     food value
1   carbs    10
2 protein    12
3   apple     4
4    pear     3
I would like the data frame to look like this (with apple and pear combined into fruit):
     food value
1   carbs    10
2 protein    12
3   fruit     7
The approach I came up with is:
library(dplyr)
library(tidyr)
df %>%
  spread(key = "food", value = "value") %>%
  mutate(fruit = apple + pear) %>%
  select(-c(apple, pear)) %>%
  gather(key = "food", value = "value")
     food value
1   carbs    10
2 protein    12
3   fruit     7
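This seems far too long for something so simple. I could also subset the data, sum the rows, and then rbind the result back, but that seems cumbersome too.
Is there a quicker option?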
Answer 0 (score: 2)
Factor levels can be recoded with forcats::fct_recode, although this is not necessarily shorter.
library(dplyr)
library(forcats)
df %>%
  mutate(food = fct_recode(food, fruit = 'apple', fruit = 'pear')) %>%
  group_by(food) %>%
  summarise(value = sum(value))
## A tibble: 3 x 2
#  food    value
#  <fct>   <dbl>
#1 fruit       7
#2 carbs      10
#3 protein    12
Edit.
I am posting the code from this comment here, since comments are deleted more often than answers. The result is the same as above.
df %>%
  group_by(food = fct_recode(food, fruit = 'apple', fruit = 'pear')) %>%
  summarise(value = sum(value))
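When several levels map to the same new level, forcats::fct_collapse lets you list them once instead of repeating the new name. The following is only a sketch of that variant, not part of the original comment; it rebuilds the example df from the question so it runs on its own, and the name result is introduced here purely for illustration.

```r
library(dplyr)
library(forcats)

# Example data from the question, rebuilt so this sketch is self-contained
df <- data.frame(food = c("carbs", "protein", "apple", "pear"),
                 value = c(10, 12, 4, 3))

# `result` is a hypothetical name used only in this sketch
result <- df %>%
  # collapse the levels "apple" and "pear" into a single level "fruit"
  group_by(food = fct_collapse(food, fruit = c("apple", "pear"))) %>%
  # sum the values within each (possibly collapsed) level
  summarise(value = sum(value))

result
```

This should give the same three groups as above, with fruit summing to 7.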
Answer 1 (score: 1)
What about:
df %>%
  group_by(food = if_else(food %in% c("apple", "pear"), "fruit", food)) %>%
  summarise_all(sum)
  food    value
  <chr>   <dbl>
1 carbs      10
2 fruit       7
3 protein    12
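The same relabel-then-sum idea can also be sketched in base R, without any extra packages. This is an illustration rather than anything from the answers above; it rebuilds the example df, and the names grp and result are made up for the sketch. The as.character() call is there on the assumption that food might be a factor (the default in R versions before 4.0).

```r
# Example data from the question, rebuilt so this sketch is self-contained
df <- data.frame(food = c("carbs", "protein", "apple", "pear"),
                 value = c(10, 12, 4, 3))

# `grp` is a hypothetical helper: relabel apple and pear as "fruit",
# keep every other label unchanged
grp <- ifelse(df$food %in% c("apple", "pear"), "fruit", as.character(df$food))

# Sum `value` within each relabelled group
result <- aggregate(value ~ food, data = transform(df, food = grp), FUN = sum)

result
```

aggregate() returns an ordinary data frame with one row per group, so the result stays in the same long, tidy shape as the input.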