I have a data frame like this:
Year Category1 Category2 Value
1990 A X 5
1990 B X 4
1990 A Y 3
1990 B Y 1
1990 A Z 4
1990 B Z 2
1991 A X 3
1991 B X 2
1991 A Y 8
...
I want to merge the observations X and Y in Category2 into a single new observation by summing the Value column, while keeping the Year and Category1 groups:
Year Category1 Category2 Value
1990 A X+Y 8
1990 A Z 4
1990 B Z 2
1990 B X+Y 5
1991 A X+Y 11
...
Answer 0: (score: 3)
Assuming these are the only values in Category2:
library(dplyr)

# Rebuild the sample data from the question
df <- data.frame(
  Year = c(rep(1990, 6), rep(1991, 3)),
  Category1 = c("A", "B", "A", "B", "A", "B", "A", "B", "A"),
  Category2 = c("X", "X", "Y", "Y", "Z", "Z", "X", "X", "Y"),
  Value = c(5, 4, 3, 1, 4, 2, 3, 2, 8)
)

df %>%
  # Relabel everything that is not "Z" as "X+Y", then sum within each group
  mutate(Category2 = ifelse(Category2 == "Z", "Z", "X+Y")) %>%
  group_by(Year, Category1, Category2) %>%
  summarise(Value = sum(Value))
# A tibble: 6 x 4
# Groups: Year, Category1 [4]
Year Category1 Category2 Value
<dbl> <fct> <chr> <dbl>
1 1990 A X+Y 8
2 1990 A Z 4
3 1990 B X+Y 5
4 1990 B Z 2
5 1991 A X+Y 11
6 1991 B X+Y 2
Answer 1: (score: 1)
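You can summarise by grouping on Year, Category1, and a temporary logical variable that is TRUE when Category2 equals X or Y. Some cleanup is needed afterwards, but it gives the desired result.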
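library(dplyr)

df %>%
  # temp is TRUE for rows whose Category2 is X or Y
  group_by(Year, Category1, temp = Category2 %in% c("X", "Y")) %>%
  summarise(Category2 = paste(Category2, collapse = "+"),
            Value = sum(Value)) %>%
  select(-temp) %>%
  # Drops groups where only one of X or Y was present (1991 B in this data)
  filter(!Category2 %in% c("X", "Y"))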
# A tibble: 5 x 4
# Groups: Year, Category1 [3]
Year Category1 Category2 Value
<int> <fct> <chr> <int>
1 1990 A Z 4
2 1990 A X+Y 8
3 1990 B Z 2
4 1990 B X+Y 5
5 1991 A X+Y 11
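The output above has five rows because the final filter() removes any group in which only one of X or Y occurs (1991 B, which has only X). A minimal sketch of a variant that keeps such groups, assuming the sample df from answer 0 and assuming that a lone X or Y should still be labelled X+Y, as in the other answers' output:

```r
library(dplyr)

# Sample data from the question (stringsAsFactors = FALSE keeps Category2 as character)
df <- data.frame(
  Year = c(rep(1990, 6), rep(1991, 3)),
  Category1 = c("A", "B", "A", "B", "A", "B", "A", "B", "A"),
  Category2 = c("X", "X", "Y", "Y", "Z", "Z", "X", "X", "Y"),
  Value = c(5, 4, 3, 1, 4, 2, 3, 2, 8),
  stringsAsFactors = FALSE
)

result <- df %>%
  # temp is TRUE for the rows that should be merged
  group_by(Year, Category1, temp = Category2 %in% c("X", "Y")) %>%
  summarise(Category2 = paste(unique(Category2), collapse = "+"),
            Value = sum(Value)) %>%
  ungroup() %>%
  # Label every merged group "X+Y", even when only X or only Y was present
  mutate(Category2 = if_else(temp, "X+Y", Category2)) %>%
  select(-temp)

print(result)
```

With this data the result has six rows, including 1991 B with a Value of 2. Like the original, this sketch lumps all non-X/Y values of a group together, so it only suits data where Z is the sole other value.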
Answer 2: (score: 1)
I'll piggyback on mfidino's answer. If Category2 has values other than X, Y, and Z, you can use Category2 %in% c('X', 'Y') instead, as follows:
library(dplyr)
library(tibble)  # tribble()

df <- tribble(
  ~Year, ~Category1, ~Category2, ~Value,
  1990, 'A', 'X', 5,
  1990, 'B', 'X', 4,
  1990, 'A', 'Y', 3,
  1990, 'B', 'Y', 1,
  1990, 'A', 'Z', 4,
  1990, 'B', 'Z', 2,
  1991, 'A', 'X', 3,
  1991, 'B', 'X', 2,
  1991, 'A', 'Y', 8
)

df %>%
  # Relabel only X and Y; every other Category2 value is left unchanged
  mutate(
    Category2 = if_else(Category2 %in% c('X', 'Y'), 'X+Y', Category2)
  ) %>%
  group_by(Year, Category1, Category2) %>%
  summarise(
    Value = sum(Value)
  )
# A tibble: 6 x 4
# Groups: Year, Category1 [4]
Year Category1 Category2 Value
<dbl> <chr> <chr> <dbl>
1 1990 A X+Y 8
2 1990 A Z 4
3 1990 B X+Y 5
4 1990 B Z 2
5 1991 A X+Y 11
6 1991 B X+Y 2
Answer 3: (score: 1)
Alternatively, this should also work:
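library(dplyr)
library(tidyr)  # spread() and gather() come from tidyr

df %>%
  # One column per Category2 value; missing combinations are filled with 0
  spread(Category2, Value, fill = 0) %>%
  mutate(`X+Y` = X + Y) %>%
  select(-X, -Y) %>%
  gather(Category2, Value, -Year, -Category1) %>%
  group_by(Year, Category1, Category2) %>%
  summarise(Value = sum(Value, na.rm = TRUE))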
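Since tidyr 1.0.0, spread() and gather() are superseded by pivot_wider() and pivot_longer(). A minimal sketch of the same reshape-and-add idea with those functions, assuming the sample df from answer 0. Using coalesce() in place of fill = 0 and dropping NA rows on the way back are my own choices, made so that combinations absent from the data (such as Z in 1991) do not come back as zero-valued rows:

```r
library(dplyr)
library(tidyr)

# Sample data from the question
df <- data.frame(
  Year = c(rep(1990, 6), rep(1991, 3)),
  Category1 = c("A", "B", "A", "B", "A", "B", "A", "B", "A"),
  Category2 = c("X", "X", "Y", "Y", "Z", "Z", "X", "X", "Y"),
  Value = c(5, 4, 3, 1, 4, 2, 3, 2, 8),
  stringsAsFactors = FALSE
)

result <- df %>%
  # One column per Category2 value; missing combinations become NA
  pivot_wider(names_from = Category2, values_from = Value) %>%
  # Treat a missing X or Y as 0 when adding the two columns
  mutate(`X+Y` = coalesce(X, 0) + coalesce(Y, 0)) %>%
  select(-X, -Y) %>%
  # Back to long format, dropping combinations that were never observed
  pivot_longer(
    cols = -c(Year, Category1),
    names_to = "Category2",
    values_to = "Value",
    values_drop_na = TRUE
  )

print(result)
```

With this data the result has six rows. One caveat of the sketch: a Year/Category1 group with neither X nor Y would still get an X+Y row with a Value of 0.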