How do I combine two observations within a group into a new observation with dplyr?

Posted: 2019-09-26 13:11:52

Tags: r dplyr

I have a data frame like this:

Year   Category1   Category2   Value
1990   A           X           5
1990   B           X           4
1990   A           Y           3
1990   B           Y           1
1990   A           Z           4
1990   B           Z           2
1991   A           X           3
1991   B           X           2
1991   A           Y           8
...

I want to combine the observations X and Y of Category2 into one new observation by summing the Value column, while keeping the Year and Category1 groups:

Year   Category1   Category2   Value
1990   A           X+Y         8
1990   A           Z           4
1990   B           Z           2
1990   B           X+Y         5
1991   A           X+Y         11
...

4 answers:

Answer 0 (score: 3)

Assuming X, Y and Z are the only values in Category2:

library(dplyr)

df <- data.frame(
  Year = c(rep(1990, 6), rep(1991, 3)),
  Category1 = c("A", "B", "A", "B", "A", "B", "A", "B", "A"),
  Category2 = c("X", "X", "Y", "Y", "Z", "Z", "X", "X", "Y"),
  Value = c(5, 4, 3, 1, 4, 2, 3, 2, 8)
)

# Relabel everything that is not "Z" as "X+Y", then sum Value per group
df %>% 
  mutate(Category2 = ifelse(Category2 == "Z", "Z", "X+Y")) %>% 
  group_by(Year, Category1, Category2) %>% 
  summarise(Value = sum(Value))

# A tibble: 6 x 4
# Groups:   Year, Category1 [4]
   Year Category1 Category2 Value
  <dbl> <fct>     <chr>     <dbl>
1  1990 A         X+Y           8
2  1990 A         Z             4
3  1990 B         X+Y           5
4  1990 B         Z             2
5  1991 A         X+Y          11
6  1991 B         X+Y           2
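By default `summarise()` only drops the last grouping variable, which is why the result above is still grouped by Year and Category1. If you want a plain, ungrouped tibble, a minimal variant of the same pipeline, assuming dplyr >= 1.0.0 for the `.groups` argument and, as above, that X, Y and Z are the only values of Category2, might look like this:

```r
library(dplyr)

# Same toy data as above; stringsAsFactors = FALSE keeps the labels as character
df <- data.frame(
  Year = c(rep(1990, 6), rep(1991, 3)),
  Category1 = c("A", "B", "A", "B", "A", "B", "A", "B", "A"),
  Category2 = c("X", "X", "Y", "Y", "Z", "Z", "X", "X", "Y"),
  Value = c(5, 4, 3, 1, 4, 2, 3, 2, 8),
  stringsAsFactors = FALSE
)

res <- df %>%
  # Assumes X, Y and Z are the only values: anything other than "Z" becomes "X+Y"
  mutate(Category2 = ifelse(Category2 == "Z", "Z", "X+Y")) %>%
  group_by(Year, Category1, Category2) %>%
  # .groups = "drop" (dplyr >= 1.0.0) returns an ungrouped tibble
  summarise(Value = sum(Value), .groups = "drop")

print(res)
```

The summed values are the same as in the grouped output; only the grouping metadata is gone.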

Answer 1 (score: 1)

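You can summarise by Year, Category1 and a temporary logical variable that is TRUE when Category2 is X or Y. Some cleanup is needed afterwards, but it gives the desired result.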

library(dplyr)

df %>%
  group_by(Year, Category1, temp = Category2 %in% c("X", "Y")) %>%
  summarise(Category2 = paste(Category2, collapse = "+"),
            Value = sum(Value)) %>%
  select(-temp) %>%
  filter(!Category2 %in% c("X", "Y"))

# A tibble: 5 x 4
# Groups:   Year, Category1 [3]
   Year Category1 Category2 Value
  <int> <fct>     <chr>     <int>
1  1990 A         Z             4
2  1990 A         X+Y           8
3  1990 B         Z             2
4  1990 B         X+Y           5
5  1991 A         X+Y          11
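With this sample data, 1991 B has only an X row, so `paste()` produces "X" for that group and the final `filter()` removes it; that is why the output has 5 rows instead of 6. A minimal variant that keeps such single-member groups labels the temporary group directly instead of filtering. It assumes, as in answer 0, that X, Y and Z are the only values of Category2 (otherwise all non-X/Y values in a group would be pasted together, e.g. "Z+W"), and dplyr >= 1.0.0 for `.groups`:

```r
library(dplyr)

df <- data.frame(
  Year = c(rep(1990, 6), rep(1991, 3)),
  Category1 = c("A", "B", "A", "B", "A", "B", "A", "B", "A"),
  Category2 = c("X", "X", "Y", "Y", "Z", "Z", "X", "X", "Y"),
  Value = c(5, 4, 3, 1, 4, 2, 3, 2, 8),
  stringsAsFactors = FALSE
)

res <- df %>%
  # temp is TRUE for the rows that should be merged into "X+Y"
  group_by(Year, Category1, temp = Category2 %in% c("X", "Y")) %>%
  summarise(Category2 = paste(unique(Category2), collapse = "+"),
            Value = sum(Value),
            .groups = "drop") %>%
  # Label every merged group "X+Y", even if only X or only Y was present
  mutate(Category2 = if_else(temp, "X+Y", Category2)) %>%
  select(-temp)

print(res)
```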

Answer 2 (score: 1)

I'll build on mfidino's answer. If Category2 contains values other than X, Y and Z, you can use

Category2 %in% c('X', 'Y')

as shown below:

library(dplyr)

df <- tribble(
  ~Year,   ~Category1,   ~Category2,   ~Value,
  1990,   'A',           'X',           5,
  1990,   'B',           'X',           4,
  1990,   'A',           'Y',           3,
  1990,   'B',           'Y',           1,
  1990,   'A',           'Z',           4,
  1990,   'B',           'Z',           2,
  1991,   'A',           'X',           3,
  1991,   'B',           'X',           2,
  1991,   'A',           'Y',           8
)

df %>% 
  mutate(
    Category2 = if_else(Category2 %in% c('X', 'Y'), 'X+Y', Category2)
  ) %>% 
  group_by(Year, Category1, Category2) %>% 
  summarise(
    Value = sum(Value)
  )

# A tibble: 6 x 4
# Groups:   Year, Category1 [4]
   Year Category1 Category2 Value
  <dbl> <chr>     <chr>     <dbl>
1  1990 A         X+Y           8
2  1990 A         Z             4
3  1990 B         X+Y           5
4  1990 B         Z             2
5  1991 A         X+Y          11
6  1991 B         X+Y           2
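To see why `%in%` matters, here is a small, made-up example with an extra category W that should stay separate. The `ifelse(Category2 == "Z", ...)` approach from answer 0 folds W into X+Y, while the `%in%` version keeps W as its own row. The data are hypothetical, and `.groups = "drop"` assumes dplyr >= 1.0.0:

```r
library(dplyr)

# Hypothetical data with an extra category "W" that should NOT be merged
df_w <- tribble(
  ~Year, ~Category1, ~Category2, ~Value,
  1990,  "A",        "X",        5,
  1990,  "A",        "Y",        3,
  1990,  "A",        "Z",        4,
  1990,  "A",        "W",        7
)

# Answer 0 style: anything that is not "Z" is relabelled "X+Y", W included
by_ifelse <- df_w %>%
  mutate(Category2 = ifelse(Category2 == "Z", "Z", "X+Y")) %>%
  group_by(Year, Category1, Category2) %>%
  summarise(Value = sum(Value), .groups = "drop")

# %in% style: only X and Y are relabelled, W keeps its own row
by_in <- df_w %>%
  mutate(Category2 = if_else(Category2 %in% c("X", "Y"), "X+Y", Category2)) %>%
  group_by(Year, Category1, Category2) %>%
  summarise(Value = sum(Value), .groups = "drop")

print(by_ifelse)
print(by_in)
```

Here `by_ifelse` reports X+Y as 15 (5 + 3 + 7), whereas `by_in` reports X+Y as 8 and W as 7.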

Answer 3 (score: 1)

Alternatively, this should also work:

library(dplyr)
library(tidyr)  # spread() and gather() come from tidyr, not dplyr
df %>%
  spread(Category2,Value, fill = 0) %>%
  mutate("X+Y" = X+Y) %>%
  select(-X,-Y) %>%
  gather(Category2,Value,-Year,-Category1) %>%
  group_by(Year,Category1,Category2) %>%
  summarise(Value = sum(Value, na.rm = TRUE))
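`spread()` and `gather()` are superseded in current tidyr. A sketch of the same reshaping idea with `pivot_wider()` and `pivot_longer()`, assuming tidyr >= 1.1.0 (for a scalar `values_fill`), is below. After reshaping, each Year/Category1/Category2 combination occurs exactly once, so the final `group_by()`/`summarise()` step is not needed. Note that filling missing cells with 0 also creates Z rows with Value 0 for 1991, which the original data did not contain; the `spread(..., fill = 0)` version above does the same.

```r
library(dplyr)
library(tidyr)

df <- data.frame(
  Year = c(rep(1990, 6), rep(1991, 3)),
  Category1 = c("A", "B", "A", "B", "A", "B", "A", "B", "A"),
  Category2 = c("X", "X", "Y", "Y", "Z", "Z", "X", "X", "Y"),
  Value = c(5, 4, 3, 1, 4, 2, 3, 2, 8),
  stringsAsFactors = FALSE
)

res <- df %>%
  # One column per Category2 value; missing combinations are filled with 0
  pivot_wider(names_from = Category2, values_from = Value, values_fill = 0) %>%
  mutate(`X+Y` = X + Y) %>%
  select(-X, -Y) %>%
  # Back to long format: one row per Year / Category1 / Category2
  pivot_longer(-c(Year, Category1), names_to = "Category2", values_to = "Value")

print(res)
```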