Partially summing two data frames

Time: 2019-01-30 15:38:16

Tags: r

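I have two data frames. Some rows of df1 have a matching row in df2 (matched via df1$id_df2 and df2$id). For those rows, certain columns of df1 should be updated so that they contain the sum of their own value and the corresponding value in df2.

In the example below, the columns "count1" and "count2" should be summed, but the column "type" should not. An NA is treated as 0 in the sum.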

df1 <- data.frame(id = c("one_a", "two_a", "three_a", "four_a"), type = c(8,7,6,5), count1 = c(1,2,1,NA), count2 = c(NA,0,1,0), id_df2 = c("one", "two", "three", "four"))
df2 <- data.frame(id = c("one", "two", "four"), type = c(8,7,5), count1 = c(0,1,1), count2 = c(0,0,1))
result <- data.frame(id = c("one_a", "two_a", "three_a", "four_a"), type = c(8,7,6,5), count1 = c(1,3,1,1), count2 = c(0,0,1,1))

> df1
       id type count1 count2 id_df2
1   one_a    8      1     NA     one
2   two_a    7      2      0     two
3 three_a    6      1      1   three
4  four_a    5     NA      0    four

> df2
    id type count1 count2
1  one    8      0      0
2  two    7      1      0
3 four    5      1      1

> result
       id type count1 count2
1   one_a    8      1      0
2   two_a    7      3      0
3 three_a    6      1      1
4  four_a    5      1      1

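There are similar questions, and I have tried to find a solution by splitting the data frames apart and merging them again afterward. I'm just wondering whether there is a more elegant way to do this. My original data set has about 300 columns, so I'm looking for a scalable solution.

Thanks in advance, Chuck Morris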

2 answers:

Answer 0: (score: 1)

You can do:

library(dplyr)

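df1 %>% select(-id_df2) %>%
  bind_rows(df2) %>%                      # stack the rows of both data frames
  mutate(id = gsub("_.*", "", id)) %>%    # drop the "_a" suffix so the ids match
  replace(., is.na(.), 0) %>%             # treat NA as 0
  group_by(id, type) %>%
  # funs() is deprecated since dplyr 0.8; passing sum directly is equivalent
  summarise_at(vars(contains("count")), sum)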

The output is:

# A tibble: 4 x 4
# Groups:   id [?]
  id     type count1 count2
  <chr> <dbl>  <dbl>  <dbl>
1 four      5      1      1
2 one       8      1      0
3 three     6      1      1
4 two       7      3      0

And:

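df1 %>% select(-id_df2) %>%
  bind_rows(df2) %>%
  # append "_a" to the df2 ids instead of stripping it from the df1 ids
  mutate(id = ifelse(grepl("_", id), id, paste0(id, "_a"))) %>%
  replace(., is.na(.), 0) %>%
  group_by(id, type) %>%
  # funs() is deprecated since dplyr 0.8; passing sum directly is equivalent
  summarise_at(vars(contains("count")), sum)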

if you are interested in keeping the _a part.
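With dplyr 1.0 or later, the same bind-and-summarise idea can be sketched with across(), which supersedes summarise_at(). This is only an illustration, not part of the original answer: it assumes every id in df2 maps to the df1 id by appending "_a", that the columns to sum all start with "count", and the name `out` is made up for the example.

```r
library(dplyr)

df1 <- data.frame(id = c("one_a", "two_a", "three_a", "four_a"),
                  type = c(8, 7, 6, 5),
                  count1 = c(1, 2, 1, NA), count2 = c(NA, 0, 1, 0),
                  id_df2 = c("one", "two", "three", "four"),
                  stringsAsFactors = FALSE)
df2 <- data.frame(id = c("one", "two", "four"), type = c(8, 7, 5),
                  count1 = c(0, 1, 1), count2 = c(0, 0, 1),
                  stringsAsFactors = FALSE)

out <- df1 %>%
  select(-id_df2) %>%
  # assumption: df2 ids become df1 ids by appending "_a"
  bind_rows(mutate(df2, id = paste0(id, "_a"))) %>%
  group_by(id, type) %>%
  # sum every count* column per group, ignoring NA
  summarise(across(starts_with("count"), ~ sum(.x, na.rm = TRUE)),
            .groups = "drop")

print(out)
```

Because starts_with("count") selects by name, the call does not change when there are hundreds of count columns.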

Another approach is to use a join, convert the result to long format, and then spread it back to wide format, for example:

library(tidyverse)

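df1 %>%
  left_join(df2, by = c("id_df2" = "id")) %>%   # shared columns get .x/.y suffixes
  gather(var, val, -id) %>%                     # long format: one row per id/column
  mutate(var = gsub("\\..*", "", var)) %>%      # strip the .x/.y suffixes
  # distinct() keeps "type" from being counted twice; caution: it also
  # collapses two equal count values (e.g. 1 and 1) into one
  distinct(id, var, val) %>%
  filter(var != "id_df2") %>%
  group_by(id, var) %>%
  summarise(val = sum(as.numeric(val), na.rm = TRUE)) %>%
  spread(var, val)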

which gives:

# A tibble: 4 x 4
# Groups:   id [4]
  id      count1 count2  type
  <fct>    <dbl>  <dbl> <dbl>
1 four_a       1      1     5
2 one_a        1      0     8
3 three_a      1      1     6
4 two_a        3      0     7

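This is useful if the _a ending has a particular purpose, for example if there are also groups ending in _b, _c, and so on (in that case the approaches above would fail).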
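For a wide data set like the roughly 300 columns mentioned in the question, a base R sketch that aligns the rows with match() may also be worth considering. It is not taken from the answer above: the names `cols`, `idx` and `res` are made up for the example, and it assumes that the ids in df2 are unique and that the columns to sum can be picked by the name prefix "count".

```r
df1 <- data.frame(id = c("one_a", "two_a", "three_a", "four_a"),
                  type = c(8, 7, 6, 5),
                  count1 = c(1, 2, 1, NA), count2 = c(NA, 0, 1, 0),
                  id_df2 = c("one", "two", "three", "four"))
df2 <- data.frame(id = c("one", "two", "four"), type = c(8, 7, 5),
                  count1 = c(0, 1, 1), count2 = c(0, 0, 1))

# columns to sum, selected by name (assumption: they all start with "count")
cols <- grep("^count", names(df1), value = TRUE)

# for each row of df1, the position of its partner row in df2 (NA if none)
idx <- match(df1$id_df2, df2$id)

res <- df1[c("id", "type")]
for (col in cols) {
  # pair each df1 value with its df2 partner and add them, counting NA as 0
  res[[col]] <- rowSums(cbind(df1[[col]], df2[[col]][idx]), na.rm = TRUE)
}

print(res)
```

Because the rows are aligned through id_df2 instead of by editing the id strings, this does not depend on what the _a suffix looks like.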

Answer 1: (score: 0)

Slightly less elegant, but it still works.
